
My professor asked my class to make a neural network to try to predict if breast cancer is benign or malignant. To do this I'm using the Breast Cancer Wisconsin (Diagnostic) Data Set.

As a tip, my professor said that not all 30 attributes need to be used as inputs (there are 32 columns, but the first 2 are the ID and the diagnosis). What I want to ask is: how am I supposed to take those 30 inputs (which would create 100+ weights, depending on how many neurons I use) and reduce them to a smaller number?

I've already found how to "prune" a neural net, but I don't think that's what I want. I'm not trying to eliminate unnecessary neurons but to shrink the input itself.

PS: Sorry for any English errors, it's not my native language.

1 Answer


Your problem is known as feature selection (or, more broadly, dimensionality reduction), and there are established techniques for it. One is Principal Component Analysis (PCA), which reduces the dimensionality of your dataset by projecting it onto the few linear combinations of the original features that retain the most variance. Another thing you can do is check for highly correlated variables: if two inputs are highly correlated, they carry almost the same information, so one of them can be removed without hurting the performance of your classifier much. A third option is deep learning, which tries to learn the features that are later fed to your classifier.
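To make the correlation check concrete, here is a minimal sketch in plain Python. The helper names (`pearson`, `drop_correlated`), the 0.95 threshold, and the toy values are assumptions made up for illustration; they are not taken from the real dataset, where you would pass in the 30 attribute columns instead.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def drop_correlated(columns, threshold=0.95):
    """Keep a column only if it is not highly correlated with one already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Made-up toy values, for illustration only (not from the real dataset).
toy = {
    "radius":    [10.0, 12.0, 14.0, 16.0, 18.0],
    "perimeter": [63.0, 75.5, 88.0, 100.4, 113.1],  # roughly 2 * pi * radius
    "texture":   [20.0, 14.0, 25.0, 17.0, 19.0],
}
selected = drop_correlated(toy)  # "perimeter" is dropped as redundant with "radius"
```

Which column of a correlated pair survives depends only on column order here; a more careful version might keep the one that correlates better with the diagnosis.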

There is also a procedure for feature selection that uses only neural networks.

Split your dataset into three groups:

  • Training data, used for supervised training

  • Validation data, used to verify that the neural network can generalize

  • Accuracy-test data, used to test which of the features are required

The steps:

  • Train a network on your training and validation sets, just as you normally would.

  • Measure the accuracy of the network on the third dataset.

  • Find the variable whose removal causes the smallest drop in that accuracy (removing a variable here means always feeding zero as its input signal).

  • Remove that variable and retrain your network on the remaining features.

  • Repeat until the network fails to train or only one variable is left.
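The steps above can be sketched as a short loop. This is only an outline under stated assumptions: `train` and `accuracy` are hypothetical callables standing in for your own network code, and treating "fails to train" as "accuracy falls below `min_accuracy`" is my interpretation, not part of the procedure as stated.

```python
def backward_eliminate(features, train, accuracy, min_accuracy=0.0):
    """Greedy backward elimination of input features.

    train(active) returns a model trained on the active features.
    accuracy(model, active, zeroed) returns the accuracy on the third
    (accuracy-test) split with feature `zeroed` fed as zero (None = nothing zeroed).
    Both callables are hypothetical placeholders for real network code.
    """
    active = list(features)
    history = []
    while True:
        model = train(active)
        baseline = accuracy(model, active, None)
        if baseline < min_accuracy:  # assumed meaning of "fails to train"
            break
        history.append((list(active), baseline))
        if len(active) == 1:
            break
        # Pick the feature whose zeroing costs the least accuracy, then drop it.
        least_useful = min(active, key=lambda f: baseline - accuracy(model, active, f))
        active.remove(least_useful)
    return history

# Stand-in "network" so the loop runs on its own: each feature has a fixed,
# made-up contribution to accuracy. Replace both functions with real training code.
contribution = {"a": 0.30, "b": 0.05, "c": 0.15}

def fake_train(active):
    return None  # a real version would fit a network on the training/validation data

def fake_accuracy(model, active, zeroed):
    return 0.5 + sum(contribution[f] for f in active if f != zeroed)

history = backward_eliminate(["a", "b", "c"], fake_train, fake_accuracy, min_accuracy=0.7)
```

Each entry in `history` pairs a feature subset with its accuracy, so you can pick the smallest subset whose accuracy is still acceptable instead of always running down to one variable.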

...