
I was running TensorFlow and something in my graph is yielding a NaN. I'd like to find out what it is, but I don't know how. In a "normal" procedural program I would just add a print statement right before the operation executes. With TensorFlow I can't do that, because I first declare (define) the graph, so adding print statements to the graph definition doesn't help. Are there any rules, advice, or heuristics for tracking down what might be causing the NaN?

In this case, I know more precisely what line to look at because I have the following:

Delta_tilde = 2.0*tf.matmul(x,W) - tf.add(WW, XX)
# note: this quantity should always be positive
# because it is a pairwise Euclidean distance
Z = tf.sqrt(Delta_tilde)
Z = Transform(Z) # potentially some transform, currently I have it return Z for debugging (the identity)
Z = tf.pow(Z, 2.0)
A = tf.exp(Z)

When this line is present, my summary writers report NaN. Why is this? Is there a way to at least inspect the value of Z after the square root is taken?

For the specific example I posted, I tried tf.Print(0, [Z]), but it printed nothing. As in:

Delta_tilde = 2.0*tf.matmul(x,W) - tf.add(WW, XX) # note: this quantity should always be positive because it is a pairwise Euclidean distance
Z = tf.sqrt(Delta_tilde)
tf.Print(0,[Z]) # <-------- TF PRINT STATEMENT
Z = Transform(Z) # potentially some transform, currently I have it return Z for debugging (the identity)
Z = tf.pow(Z, 2.0)
A = tf.exp(Z)

I actually don't understand what tf.Print is supposed to do. Why does it need two arguments? If I want to print 1 tensor why would I need to pass 2? It seems bizarre to me.


You might get NaN values for the following reasons:

1. A learning rate that is too large

2. Corrupt data in your input queue

3. Taking the log of 0 (this gives -inf, which turns into NaN in later operations)
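For the snippet in the question, a fourth candidate is worth checking: the square root of a negative number, which tf.sqrt returns as NaN. The sketch below is plain Python (no TensorFlow) and assumes that WW and XX hold the squared norms of W's columns and x's rows; the question does not show how they are defined, so this is only a guess. Under that assumption, the expression as written equals minus the squared distance, so it is never positive. The helper names delta_tilde and squared_distance are made up for illustration.

```python
import math
import random

def delta_tilde(x, w):
    """2*<x, w> - (||w||^2 + ||x||^2), mirroring the expression in the question."""
    xw = sum(a * b for a, b in zip(x, w))
    ww = sum(b * b for b in w)  # assumed meaning of WW
    xx = sum(a * a for a in x)  # assumed meaning of XX
    return 2.0 * xw - (ww + xx)

def squared_distance(x, w):
    """||x - w||^2 computed directly."""
    return sum((a - b) ** 2 for a, b in zip(x, w))

random.seed(0)
x = [random.gauss(0, 1) for _ in range(5)]
w = [random.gauss(0, 1) for _ in range(5)]

d = delta_tilde(x, w)
print("Delta_tilde:", d)
print("-||x - w||^2:", -squared_distance(x, w))
print("equal up to rounding:", math.isclose(d, -squared_distance(x, w), abs_tol=1e-12))
```

If that is what is happening, flipping the sign (tf.add(WW, XX) - 2.0*tf.matmul(x, W)) and clamping rounding error with tf.maximum(..., 0.0) before tf.sqrt would be one way to avoid the NaN.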

If you use tf.Print as an op while building the graph, the actual values are printed when the graph is executed.

You are not using tf.Print correctly. It is an op, not a statement: the first argument is the tensor to pass through, the second is the list of tensors to print, and the return value is a tensor identical to the first argument. You need to use that returned tensor in the rest of the graph. Otherwise nothing depends on the op, so it is never executed and nothing is printed. Try this:

Z = tf.sqrt(Delta_tilde)
Z = tf.Print(Z,[Z], message="my Z-values:")
Z = Transform(Z) # potentially some transform, currently I have it return Z for debugging (the identity)
Z = tf.pow(Z, 2.0)
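To try this in isolation, here is a minimal self-contained sketch. It assumes the TF1-style graph API reached through tensorflow.compat.v1 (on TensorFlow 1.x the same calls live directly under tf), and it feeds a placeholder instead of the real Delta_tilde. The function name run_sqrt_with_print is made up for illustration, and tf.check_numerics is an extra that is not part of the answer above: it makes the run fail at the first op whose output contains NaN or Inf, which can help narrow down the source.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # build the graph first, run it later (TF1 style)

def run_sqrt_with_print(values):
    """Build sqrt -> Print -> check_numerics and evaluate it on `values`."""
    graph = tf.Graph()
    with graph.as_default():
        delta = tf.placeholder(tf.float32, shape=[None], name="delta")
        z = tf.sqrt(delta)
        # Print returns its first argument unchanged; the list is what gets logged
        # (to stderr) each time the op runs.
        z = tf.Print(z, [delta, z], message="delta, sqrt(delta): ")
        # Raises InvalidArgumentError at run time if z contains NaN or Inf.
        z = tf.check_numerics(z, "Z has NaN/Inf after sqrt")
        with tf.Session(graph=graph) as sess:
            return sess.run(z, feed_dict={delta: values})

print(run_sqrt_with_print([0.0, 4.0]))

try:
    run_sqrt_with_print([-1.0, 4.0])  # sqrt(-1) is NaN, so the check fires
except tf.errors.InvalidArgumentError as err:
    print("caught:", err.message.splitlines()[0])
```

The printing only happens because the tensor returned by tf.Print is the one the rest of the graph consumes; calling tf.Print and discarding the result leaves the op disconnected.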
