I'm learning how to create convolutional neural networks using Keras, and I'm trying to get high accuracy on the MNIST dataset.

Apparently categorical_crossentropy is meant for more than two classes and binary_crossentropy for exactly two. Since there are 10 digits, I should be using categorical_crossentropy. However, after training and testing dozens of models, binary_crossentropy consistently and significantly outperforms categorical_crossentropy.
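To make sure I understand the two losses, here's a small numpy sketch of what each one computes on a single one-hot label (the prediction values are made up):

```python
import numpy as np

# Made-up one-hot target for digit "3" and a softmax-like prediction.
y_true = np.zeros(10)
y_true[3] = 1.0
y_pred = np.full(10, 0.05)
y_pred[3] = 0.55

# categorical_crossentropy: -sum(y_true * log(y_pred)).
# With a one-hot target, only the true class contributes.
cce = -np.sum(y_true * np.log(y_pred))

# binary_crossentropy on the same vectors treats each of the 10 outputs
# as an independent yes/no problem and averages over all 10.
bce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(cce)  # ~0.598, the loss from the true class alone
print(bce)  # ~0.106, averaged over 10 per-class binary losses
```

So the two losses are computed quite differently even on identical one-hot targets, which is part of why I don't know how to compare the results.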

On Kaggle, I got over 99% accuracy using binary_crossentropy and 10 epochs. Meanwhile, I can't get above 97% using categorical_crossentropy, even with 30 epochs (which isn't many, but I don't have a GPU, so training takes forever).

Here's what my model looks like now:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()

model.add(Conv2D(100, (5, 5), padding='valid', input_shape=(28, 28, 1),
                 kernel_initializer='glorot_uniform', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(100, (3, 3), kernel_initializer='glorot_uniform', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(100, kernel_initializer='glorot_uniform', activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(100, kernel_initializer='glorot_uniform', activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(10, kernel_initializer='glorot_uniform', activation='softmax'))

model.compile(loss='binary_crossentropy', optimizer='adamax', metrics=['accuracy'])
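One thing I noticed while investigating: the 'accuracy' metric itself may not mean the same thing in the two setups. Here's a numpy sketch (with made-up predictions) of categorical accuracy (argmax match) versus per-output binary accuracy (each of the 10 outputs thresholded at 0.5 independently):

```python
import numpy as np

# Made-up batch of 2 one-hot labels and softmax-like predictions.
y_true = np.array([[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
                   [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=float)
y_pred = np.array([[0.05, 0.05, 0.05, 0.40, 0.05, 0.05, 0.05, 0.10, 0.10, 0.10],
                   [0.30, 0.25, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.10]])

# Categorical accuracy: does the argmax of each prediction hit the true class?
# Sample 1 is right (argmax 3), sample 2 is wrong (argmax 0, true class 1).
cat_acc = np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1))

# Binary accuracy: threshold every one of the 10 outputs at 0.5 independently,
# then average correctness over all 20 entries. Since most outputs are
# (correctly) near 0, this number comes out much higher.
bin_acc = np.mean((y_pred > 0.5) == (y_true > 0.5))

print(cat_acc)  # 0.5  (1 of 2 samples classified correctly)
print(bin_acc)  # 0.9  (18 of 20 individual outputs "correct")
```

I'm not sure whether this is what's happening in my case, but it would mean the two accuracy numbers aren't directly comparable.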