Despite the various examples I've found online, I still don't quite understand how to create embedding layers from categorical data for neural network models, especially when I have a mix of numerical and categorical data. For example, take the data set below:
import numpy as np
import pandas as pd

numerical_df = pd.DataFrame(np.random.randint(0, 100, size=(100, 3)), columns=['num_1', 'num_2', 'num_3'])
cat_df = pd.DataFrame(np.random.randint(0, 5, size=(100, 3)), columns=['cat_1', 'cat_2', 'cat_3'])
df = numerical_df.join(cat_df)
I want to create embedding layers for my categorical data and use them together with my numerical data, but in all the examples I've seen it's almost as if the model just passes the entire dataset through the embedding layer, which is confusing.
To illustrate my confusion, here is an example from the Keras documentation on Sequential models. It looks as though they just add the embedding as the first layer and fit it on the entirety of x_train.
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Embedding
from keras.layers import LSTM
max_features = 1024
model = Sequential()
model.add(Embedding(max_features, output_dim=256))
model.add(LSTM(128))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=16, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=16)
So ultimately, when it comes to creating embedding matrices, is there one per categorical variable, or one shared by all categorical variables? And how do I reconcile this with my other data that doesn't need an embedding matrix?
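To make the question concrete, here is the mental picture I have of the "one matrix per categorical variable" option, written in plain NumPy rather than Keras. It is only a sketch of what I assume happens, not something I've seen confirmed: the embedding size of 2 and the random lookup tables are placeholders I made up (in a real model they would presumably be trained weights), and the arrays just mirror the shapes of the data frames above.

```python
import numpy as np

rng = np.random.default_rng(0)

# emb_dim is an arbitrary placeholder; n_categories matches randint(0, 5) above.
n_rows, n_categories, emb_dim = 100, 5, 2

# Same shapes as numerical_df and cat_df, built directly as arrays.
numerical = rng.integers(0, 100, size=(n_rows, 3)).astype(float)
categorical = rng.integers(0, n_categories, size=(n_rows, 3))

# Assumption: one lookup table per categorical column.
# Random values stand in for what would be learned weights.
tables = [rng.normal(size=(n_categories, emb_dim))
          for _ in range(categorical.shape[1])]

# An embedding lookup is row indexing: category id -> row of that column's table.
embedded = [tables[j][categorical[:, j]] for j in range(categorical.shape[1])]

# What the rest of the network would see:
# 3 numerical features + 3 * emb_dim embedded features per row.
features = np.concatenate([numerical] + embedded, axis=1)
print(features.shape)  # (100, 9)
```

If this picture is right, only the categorical columns ever go through an embedding, and the numerical columns simply bypass it and are joined on afterwards. What I can't work out is how to express that in Keras, or whether a single shared matrix is the more usual choice.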