I am trying to train a sequence tagging model (LSTM), where the sequence labels are either **1** (first class), **2** (second class), or **0** (don't care).

I tried to write my own loss function that ignores the zeros:

    import keras.backend as K

    def my_loss(y_true, y_pred):
        """(sum([(t-p)**2 for t,p in zip(y_true, y_pred)])/n_nonzero)**0.5"""
        return K.sqrt(K.sum(K.square(y_pred*K.cast(y_true>0, "float32") - y_true), axis=-1) / K.sum(K.cast(y_true>0, "float32")))

This essentially computes the root mean squared error over the nonzero labels only.
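To make the intent concrete, here is a minimal plain-Python sketch of the formula in the docstring, operating on ordinary lists rather than Keras tensors. The name `masked_rmse` and the sample values are made up for illustration. Returning NaN when a sequence has no nonzero labels is my own assumption about how to handle the 0/0 case, not something I have confirmed to be the cause of the problem.

```python
import math

def masked_rmse(y_true, y_pred):
    """RMSE over positions whose label is nonzero (0 means "don't care").

    Hypothetical helper for illustration only; works on plain Python lists.
    """
    # Keep only the (label, prediction) pairs that carry a real label.
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    if not pairs:
        # No labeled positions: the formula would be 0/0, so return NaN explicitly.
        return float("nan")
    squared_error = sum((t - p) ** 2 for t, p in pairs)
    return math.sqrt(squared_error / len(pairs))

labels = [1, 2, 0, 0]
predictions = [1.5, 1.0, 0.7, 3.2]
print(masked_rmse(labels, predictions))  # sqrt((0.25 + 1.0) / 2) = sqrt(0.625)
```

The predictions at the two "don't care" positions (0.7 and 3.2) have no effect on the result, which is the behavior I want from the Keras loss.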

However, I get **loss=nan** when training the model.

What am I doing wrong?

What is the standard way to ignore certain labels during training?