in Data Science by (17.6k points)

From the various examples I've found online, I still don't quite understand how to create embedding layers from my categorical data for neural network models, especially when I have a mix of numerical and categorical data. For example, take the following data set:

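import numpy as np
import pandas as pd

# three numerical columns with values 0-99
numerical_df = pd.DataFrame(np.random.randint(0, 100, size=(100, 3)), columns=['num_1', 'num_2', 'num_3'])
# three categorical columns, already encoded as integer codes 0-4
cat_df = pd.DataFrame(np.random.randint(0, 5, size=(100, 3)), columns=['cat_1', 'cat_2', 'cat_3'])

df = numerical_df.join(cat_df)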

I want to create embedding layers for my categorical data and use them together with my numerical data, but in all the examples I've seen it's almost as if the model just passes the entire dataset through the embedding layer, which is confusing.

To illustrate my confusion, here is an example from the Keras documentation on Sequential models. It's as though they just add the embedding step as the first layer and fit it to the entirety of x_train.

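from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Embedding
from keras.layers import LSTM

max_features = 1024

model = Sequential()
model.add(Embedding(max_features, output_dim=256))
model.add(LSTM(128))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# x_train, y_train, x_test and y_test are not defined in this snippet
model.fit(x_train, y_train, batch_size=16, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=16)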

So, ultimately, when it comes to creating embedding matrices, is there one per categorical variable, or one shared by all categorical variables? And how do I reconcile this with my other data that doesn't need an embedding matrix?

1 Answer

by (41.4k points)

The model you are developing should use multiple inputs via the functional API so that it can combine the categorical data with the numerical data: one input for the numerical data and one for the categorical data.

Only the categorical input goes through an Embedding layer; the numerical input does not. You then concatenate everything together and continue with the rest of the model on top of the merged tensor. How you design that remaining part depends entirely on your problem.
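An Embedding layer is essentially a trainable lookup table: a matrix with one row per category level, whose forward pass replaces each integer code with the matching row. The numpy-only sketch below is not Keras itself (the matrix is random here, whereas Keras would learn it during training); it just traces the shapes for the question's data when a single 5 x 3 matrix is shared by all three categorical columns, roughly what Embedding(input_dim=5, output_dim=3) followed by Flatten and concatenation does. Only the categorical columns are looked up; the numerical columns skip the embedding and are simply concatenated.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
numerical_df = pd.DataFrame(rng.integers(0, 100, size=(100, 3)),
                            columns=['num_1', 'num_2', 'num_3'])
cat_df = pd.DataFrame(rng.integers(0, 5, size=(100, 3)),
                      columns=['cat_1', 'cat_2', 'cat_3'])

# One row per category level (5) and one column per embedding dimension (3).
# Random values stand in for the weights Keras would learn.
embedding_matrix = rng.normal(size=(5, 3))

# The embedding "forward pass" is a row lookup: each integer code
# is replaced by its row of the matrix.
cat_codes = cat_df.to_numpy()                   # (100, 3) integer codes
embedded = embedding_matrix[cat_codes]          # (100, 3, 3)
flattened = embedded.reshape(len(cat_df), -1)   # (100, 9), like Flatten()

# Numerical columns bypass the embedding and are concatenated as-is
merged = np.concatenate([numerical_df.to_numpy(dtype=float), flattened], axis=1)
print(merged.shape)  # (100, 12)
```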

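from keras.layers import Input, Embedding, Flatten, Dense, concatenate
from keras.models import Model

numerical_in = Input(shape=(3,))   # the three numerical columns
cat_in = Input(shape=(3,))         # the three categorical columns as integer codes 0-4

# Embed every categorical code into a 3-dimensional vector, then flatten (3, 3) -> (9,)
embedded_layer = Embedding(input_dim=5, output_dim=3, input_length=3)(cat_in)
embedded_layer = Flatten()(embedded_layer)

# Put the numerical features and the flattened embeddings side by side
merged_layer = concatenate([numerical_in, embedded_layer])

# rest_of_your_model: any stack of layers you like; for example:
hidden = Dense(16, activation='relu')(merged_layer)
output = Dense(1)(hidden)

model = Model(inputs=[numerical_in, cat_in], outputs=[output])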

 ...[numerical_df, cat_df], y=[expected_out])
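On the question of one embedding matrix per categorical variable: a single Embedding layer applied to all three categorical columns means that, for example, code 2 in cat_1 and code 2 in cat_2 share the same vector. If the columns are unrelated, a common alternative is one Input and one Embedding per column. The following is a minimal sketch of that variant under some assumptions: a randomly generated regression target y (hypothetical, for illustration only), an embedding size of 3, a 16-unit hidden layer, and category codes that are already integers starting at 0. Real string categories would first need to be mapped to such codes (for example with pandas' .cat.codes).

```python
import numpy as np
import pandas as pd
from keras.layers import Input, Embedding, Flatten, Dense, Concatenate
from keras.models import Model

# Same toy data as in the question
numerical_df = pd.DataFrame(np.random.randint(0, 100, size=(100, 3)),
                            columns=['num_1', 'num_2', 'num_3'])
cat_df = pd.DataFrame(np.random.randint(0, 5, size=(100, 3)),
                      columns=['cat_1', 'cat_2', 'cat_3'])
y = np.random.rand(100, 1)  # hypothetical target, for illustration only

numerical_in = Input(shape=(3,), name='numerical')
cat_inputs, cat_embeddings = [], []
for col in cat_df.columns:
    # Assumes codes run from 0 to n_levels - 1
    n_levels = int(cat_df[col].max()) + 1
    inp = Input(shape=(1,), dtype='int32', name=col)        # one code per row
    emb = Embedding(input_dim=n_levels, output_dim=3)(inp)  # (batch, 1, 3)
    cat_inputs.append(inp)
    cat_embeddings.append(Flatten()(emb))                   # (batch, 3)

# Numerical features plus one 3-d embedding per column: 3 + 3 * 3 = 12
merged = Concatenate()([numerical_in] + cat_embeddings)
hidden = Dense(16, activation='relu')(merged)
output = Dense(1)(hidden)

model = Model(inputs=[numerical_in] + cat_inputs, outputs=output)
model.compile(optimizer='adam', loss='mse')

# One array per Input, in the same order as the `inputs` list above
x = [numerical_df.to_numpy().astype('float32')] + \
    [cat_df[[c]].to_numpy().astype('int32') for c in cat_df.columns]
model.fit(x, y, epochs=1, batch_size=16, verbose=0)
preds = model.predict(x, verbose=0)
```

Each array passed to fit has to line up with the list of inputs given to Model, in the same order. In practice the numerical columns (0-99 here) would usually be scaled before training.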
