I am working on a sentence classification problem and try to solve using Keras. The total unique words in the vocabulary are 36.
In this case, the total vocab is [W1, W2, W3....W36]
So, if I have a sentence with words as [W1 W2 W6 W7 W9], if I encode it, I get a numpy array which is like below
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]
[0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]
[0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
and the shape is (5,36)
I am stuck from here. All, I have generated is 20000 numpy arrays with varying shapes i.e. (N,36) Where N is the number of words in a sentence. So, I have 20,000 sentences for training and 100 for test and all the sentences are labeled with (1,36) one-hot encoding
I have x_train, x_test, y_train, and y_test
x_test and y_test are of dimension (1,36)
Can anyone please advise how do I do it?
I did some of the below coding
model = Sequential()
model.add(Dense(512, input_shape=(??????))),
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
Any help would be much appreciated.