The accuracy reported by the Keras "evaluate" method is misleading when you use binary_crossentropy with more than 2 labels. With that loss, the "accuracy" metric is computed per output element rather than per sample, so every correctly predicted 0 in a label vector counts as a hit. You can verify this by recomputing the accuracy yourself: first call the Keras "predict" function, then count how many of the predictions it returns match the true labels. You will get the true accuracy, which is much lower than the one "evaluate" reports.
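As a rough illustration of the gap, here is a minimal NumPy sketch. The arrays are made-up data, and the two helper functions are hypothetical stand-ins for what the element-wise and per-sample accuracy computations do; in a real check, y_prob would be the output of "predict" on your test set.

```python
import numpy as np

# Made-up one-hot labels for 4 samples and 3 classes (illustrative only).
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [1, 0, 0]])

# Made-up predicted probabilities; in practice: y_prob = model.predict(x_test)
y_prob = np.array([[0.4, 0.3, 0.3],
                   [0.3, 0.3, 0.4],
                   [0.4, 0.3, 0.3],
                   [0.3, 0.4, 0.3]])

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Element-wise accuracy: each output unit is scored independently."""
    y_pred = (y_prob > threshold).astype(int)
    return float(np.mean(y_pred == y_true))

def categorical_accuracy(y_true, y_prob):
    """Per-sample accuracy: the predicted class must equal the true class."""
    return float(np.mean(np.argmax(y_prob, axis=1) == np.argmax(y_true, axis=1)))

print("element-wise accuracy:", binary_accuracy(y_true, y_prob))
print("per-sample accuracy:  ", categorical_accuracy(y_true, y_prob))
```

In this toy data only one of the four samples is classified correctly, yet the element-wise score stays high because most entries of a one-hot vector are 0 and are trivially matched.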