I would like to calculate a neural network's certainty/confidence in its predictions (see What my deep model doesn't know): when the network tells me an image represents "8", I would like to know how certain it is. Is my model 99% certain it is an "8", or only 51% certain, so that it could just as well be a "6"? Some digits are quite ambiguous, and I would like to know for which images the model is just "flipping a coin".

I have found some theoretical writing about this, but I have trouble putting it into code. If I understand correctly, I should evaluate a test image multiple times while "killing off" different neurons each time (using dropout) and then...?
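My best guess at the missing step is to average the softmax outputs over the stochastic passes and look at how much they spread. Below is a minimal sketch of that idea in plain Python, not a definitive implementation. The names `mc_dropout_summary` and `make_toy_predictor` are hypothetical, and the toy predictor (fixed logits plus Gaussian noise) only stands in for a real network evaluated with dropout still active. How to keep dropout active at prediction time depends on the Keras version; I am assuming that something like calling the model with `training=True` would do it.

```python
import math
import random


def mc_dropout_summary(stochastic_predict, image, n_samples=100):
    """Aggregate n_samples stochastic forward passes for one image."""
    # Each call must return class probabilities computed with dropout ACTIVE.
    samples = [stochastic_predict(image) for _ in range(n_samples)]
    n_classes = len(samples[0])

    # Predictive mean: average the softmax outputs over all passes.
    mean = [sum(s[c] for s in samples) / n_samples for c in range(n_classes)]
    predicted = max(range(n_classes), key=lambda c: mean[c])

    # Spread of the winning class probability across passes.
    var = sum((s[predicted] - mean[predicted]) ** 2 for s in samples) / n_samples

    # Entropy of the mean distribution: 0 means certain,
    # log(n_classes) means completely undecided.
    entropy = -sum(p * math.log(p) for p in mean if p > 0)

    return {
        "class": predicted,
        "mean_prob": mean[predicted],
        "std": math.sqrt(var),
        "entropy": entropy,
    }


def make_toy_predictor(logits, noise, seed=0):
    """Hypothetical stand-in for a dropout network: fixed logits plus noise."""
    rng = random.Random(seed)

    def predict(_image):
        noisy = [z + rng.gauss(0.0, noise) for z in logits]
        top = max(noisy)
        exps = [math.exp(z - top) for z in noisy]  # numerically stable softmax
        total = sum(exps)
        return [e / total for e in exps]

    return predict


# Toy "clear 8": class 8 dominates the logits.
clear = mc_dropout_summary(
    make_toy_predictor([0.0] * 8 + [6.0, 0.0], noise=0.5), image=None
)

# Toy "6 or 8?": classes 6 and 8 have nearly equal logits.
ambiguous = mc_dropout_summary(
    make_toy_predictor([0.0] * 6 + [3.0, 0.0, 3.1, 0.0], noise=1.0), image=None
)

print(clear)
print(ambiguous)
```

With a real model, `stochastic_predict` would be replaced by a function that runs one forward pass with dropout enabled and returns the 10 softmax values. A low `mean_prob`, or a high `std` or `entropy`, would then flag the images on which the model is close to "flipping a coin".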

Working on the MNIST dataset, I am running the following model:

from keras.models import Sequential
from keras.layers import Dense, Activation, Conv2D, Flatten, Dropout

model = Sequential()
model.add(Conv2D(128, kernel_size=(7, 7),
                 activation='relu',
                 input_shape=(28, 28, 1,)))
model.add(Dropout(0.20))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Dropout(0.20))
model.add(Flatten())
model.add(Dense(units=64, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(units=10, activation='softmax'))
model.summary()

model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

model.fit(train_data, train_labels, batch_size=100, epochs=30, validation_data=(test_data, test_labels,))

Question: how should I predict with this model so that I get its certainty about predictions too? I would appreciate some practical examples (preferably in Keras, but any will do).

I am looking for an example of how to get certainty using the method outlined by Yarin Gal (or an explanation of why some other method yields better results).