0 votes
2 views
in Machine Learning by (19k points)

I have the following data

         feat_1    feat_2 ... feat_n   label

gene_1   100.33     10.2  ... 90.23    great

gene_2   13.32      87.9  ... 77.18    soso

....

gene_m   213.32     63.2  ... 12.23    quitegood

M is large (~30K rows), while N is much smaller (~10 columns). My question is: what is an appropriate deep learning architecture to train on and evaluate against data like the above?

At prediction time, the user will supply a vector of genes with their expression values:

gene_1   989.00

gene_2   77.10

...

gene_N   100.10

and the system should predict which label applies to each gene, e.g. great, soso, etc.

By structure I mean one of these:

  • Convolutional Neural Network (CNN)
  • Autoencoder
  • Deep Belief Network (DBN)
  • Restricted Boltzmann Machine (RBM)

1 Answer

0 votes
by (33.1k points)

You might want to read a bit more about how each of these networks actually works before picking one. For a plain feature matrix like yours, a simple fully connected feed-forward classifier is a sensible starting point, and you can grid-search its depth and width.

For example:

import numpy as np

from sklearn import preprocessing

from keras.models import Sequential

from keras.layers import Dense, Activation, Dropout

# Create some random data

np.random.seed(42)

X = np.random.random((10, 50))
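
# Note (an assumption, not used in the run below): real expression values span very
# different ranges, so with real data it usually helps to scale the features first,
# e.g. X = preprocessing.StandardScaler().fit_transform(X)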

# Similar labels

labels = ['good', 'bad', 'soso', 'amazeballs', 'good']

labels += labels

labels = np.array(labels)

np.random.shuffle(labels)

# Change the labels to the required format

numericalLabels = preprocessing.LabelEncoder().fit_transform(labels)

numericalLabels = numericalLabels.reshape(-1, 1)

y = preprocessing.OneHotEncoder(sparse=False).fit_transform(numericalLabels)  # scikit-learn >= 1.2 renames this argument to sparse_output
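
# Equivalent shortcut (assumes Keras 2+, not used below):
# from keras.utils import to_categorical
# y = to_categorical(numericalLabels.ravel())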

# Simple Keras model builder

def buildModel(nFeatures, nClasses, nLayers=3, nNeurons=10, dropout=0.2):

    model = Sequential()

    model.add(Dense(nNeurons, input_dim=nFeatures))

    model.add(Activation('sigmoid'))

    model.add(Dropout(dropout))

    for i in range(nLayers - 1):

        model.add(Dense(nNeurons))

        model.add(Activation('sigmoid'))

        model.add(Dropout(dropout))

    model.add(Dense(nClasses))

    model.add(Activation('softmax'))

    model.compile(loss='categorical_crossentropy', optimizer='sgd')

    return model
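
# Quick sanity check (optional aside): build one model and inspect its layer shapes
# buildModel(X.shape[1], y.shape[1]).summary()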

for nLayers in range(2, 4):

    for nNeurons in range(5, 8):

        model = buildModel(X.shape[1], y.shape[1], nLayers, nNeurons)

        # "epochs" replaces the older "nb_epoch" argument in Keras 2+
        modelHist = model.fit(X, y, batch_size=32, epochs=10,
                              validation_split=0.3, shuffle=True, verbose=0)

        minLoss = min(modelHist.history['val_loss'])

        epochNum = modelHist.history['val_loss'].index(minLoss)

        print('{0} layers, {1} neurons best validation at epoch {2} loss = {3:.2f}'.format(nLayers, nNeurons, epochNum, minLoss))

Output:

2 layers, 5 neurons best validation at epoch 0 loss = 1.18

2 layers, 6 neurons best validation at epoch 0 loss = 1.21

2 layers, 7 neurons best validation at epoch 8 loss = 1.49

3 layers, 5 neurons best validation at epoch 9 loss = 1.83

3 layers, 6 neurons best validation at epoch 9 loss = 1.91

3 layers, 7 neurons best validation at epoch 9 loss = 1.65
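
Once you settle on a configuration and fit the model, labelling a new expression vector (like the one in the question) is just model.predict plus mapping the predicted class index back to its name. A minimal sketch, assuming the LabelEncoder above had been kept in a variable (say labelEncoder) rather than used inline:

newSample = np.random.random((1, X.shape[1]))  # one new sample with the same features

probs = model.predict(newSample)               # softmax class probabilities

classIndex = np.argmax(probs, axis=1)          # index of the most likely class

# predictedLabel = labelEncoder.inverse_transform(classIndex)  # back to 'good', 'soso', ...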

Hope this answer helps you! For more insights, study the Machine Learning Online Course. Also, go through the Deep Learning Tutorial for more details on this.
