First of all: I understand, from a methodological standpoint, why a loss function should depend on the output of the network. This question comes from an experiment I've been running while trying to understand Keras and TensorFlow a bit better. Consider the following:

import numpy as np
from keras.layers import Input, Dense
from keras.models import Model
from keras import backend as K

input_1 = Input((5,))
hidden_a = Dense(2)(input_1)
output = Dense(1)(hidden_a)

m3 = Model(input_1, output)

def myLoss(y_true, y_pred):
    return K.sum(hidden_a)                         # (A)
    # return K.sum(hidden_a) + 0*K.sum(y_pred)     # (B)

m3.compile(optimizer='adam', loss=myLoss)

x = np.random.random(size=(10, 5))
y = np.random.random(size=(10, 1))

m3.fit(x, y, epochs=25)

This code raises:

ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

but it runs fine if you swap line (A) for line (B), even though nothing has changed numerically.

Case (A) seems like it should be perfectly fine to me: the computation graph is well defined, and the loss is differentiable with respect to everything upstream of hidden_a. But it seems that Keras requires y_pred to appear in the loss function somehow, regardless of whether it actually has any effect.
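To illustrate what I think is going on (this is a toy sketch of my mental model, not the actual Keras/TensorFlow source): the framework asks for d(loss)/d(w) for every trainable weight, and a weight with no path to the loss gets gradient None rather than zero. In case (A), the second Dense layer's kernel has no path to K.sum(hidden_a), so its gradient is None; the 0*K.sum(y_pred) term in (B) creates a (zero-valued) path and avoids the error. The Node class and grad function below are hypothetical helpers I wrote for this sketch:

```python
class Node:
    """Toy computation-graph node that tracks which nodes it depends on."""
    def __init__(self, name, parents=()):
        self.name = name
        self.parents = tuple(parents)

    def depends_on(self, other):
        if self is other:
            return True
        return any(p.depends_on(other) for p in self.parents)

def grad(loss, weight):
    """Return a (symbolic) gradient if a path exists, else None --
    mirroring the None gradient that makes Keras raise the ValueError."""
    return "d%s/d%s" % (loss.name, weight.name) if loss.depends_on(weight) else None

# Graph mirroring the model: hidden = Dense(W1)(x); output = Dense(W2)(hidden)
x = Node("x")
W1 = Node("W1"); hidden = Node("hidden", [x, W1])
W2 = Node("W2"); output = Node("output", [hidden, W2])

loss_a = Node("loss_a", [hidden])          # case (A): sum(hidden)
loss_b = Node("loss_b", [hidden, output])  # case (B): sum(hidden) + 0*sum(output)

print(grad(loss_a, W1))  # a path exists through hidden
print(grad(loss_a, W2))  # None -- no path, which is what triggers the error
print(grad(loss_b, W2))  # a path exists via output, so training proceeds
```

Under this reading, (A) and (B) are numerically identical but differ in graph connectivity, which is all the gradient check looks at.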

Thanks!