+1 vote

I have seen a few different mean squared error loss functions in various posts for regression models in TensorFlow:

loss = tf.reduce_sum(tf.pow(prediction - Y,2))/(n_instances)

loss = tf.reduce_mean(tf.squared_difference(prediction, Y))

loss = tf.nn.l2_loss(prediction - Y)

What are the differences between these?

+1 vote
by (33.1k points)

Your first and second statements are nearly identical, but the third will give a different output.

The third equation just returns 1/2 of the squared Euclidean norm, that is, half the sum of the element-wise squares of the input x = prediction - Y. It does not divide by the number of samples anywhere, so with a very large number of samples the computation may overflow.

In the first two, you are computing the mean of the element-wise squared x tensor. While the documentation does not specify this explicitly, it is very likely that reduce_mean uses an algorithm adapted to avoid overflowing with a very large number of samples. In other words, it likely does not sum everything first and then divide by N, but uses some kind of rolling mean that can adapt to an arbitrary number of samples without necessarily causing an overflow.
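
Written out, the three expressions from the question look like this. This is only a sketch of the definitions: p_i and y_i stand for the elements of prediction and Y, and n is assumed to be the total number of elements (which is what n_instances has to equal for the first and second forms to agree).

```latex
\begin{align*}
L_1 &= \frac{1}{n} \sum_{i=1}^{n} (p_i - y_i)^2
  && \text{reduce\_sum of squares, divided by } n \\
L_2 &= \frac{1}{n} \sum_{i=1}^{n} (p_i - y_i)^2
  && \text{reduce\_mean of squared\_difference} \\
L_3 &= \frac{1}{2} \sum_{i=1}^{n} (p_i - y_i)^2 = \frac{n}{2}\, L_1
  && \text{tf.nn.l2\_loss}
\end{align*}
```

So the first two are the same quantity on paper, and the third differs from them by a factor of n/2.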


+1 vote
by (6.8k points)

The first and the second loss functions calculate the same thing, but in a slightly different way. The third function calculates something completely different. You can see this by executing this code:

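import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.random_normal)
from functools import reduce  # reduce is not a builtin in Python 3

# shape_obj = (5, 5)  # a smaller shape works as well
shape_obj = (100, 6, 12)

Y1 = tf.random_normal(shape=shape_obj)
Y2 = tf.random_normal(shape=shape_obj)

# Sum of squares divided by the total number of elements
loss1 = tf.reduce_sum(tf.pow(Y1 - Y2, 2)) / (reduce(lambda x, y: x*y, shape_obj))
# Mean of the element-wise squared difference
loss2 = tf.reduce_mean(tf.squared_difference(Y1, Y2))
# Half the sum of squares, with no division by the element count
loss3 = tf.nn.l2_loss(Y1 - Y2)

with tf.Session() as sess:
    print(sess.run([loss1, loss2, loss3]))

# when I ran it, I got: [2.0291963, 2.0291963, 7305.1069]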

Now you can verify that the first and the second calculate the same thing (in theory) by noticing that tf.pow(a - b, 2) is the same as tf.squared_difference(a, b). Also, reduce_mean is the same as reduce_sum / number_of_elements. The catch is that computers cannot calculate everything precisely. To see what numerical instabilities can do to your calculations, take a look at this:

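import tensorflow as tf  # TensorFlow 1.x API (tf.Session)
from functools import reduce  # reduce is not a builtin in Python 3

# 250 million float32 values per tensor (about 1 GB each)
shape_obj = (5000, 5000, 10)

Y1 = tf.zeros(shape=shape_obj)
Y2 = tf.ones(shape=shape_obj)

# Every squared difference is exactly 1, so both losses should be 1
loss1 = tf.reduce_sum(tf.pow(Y1 - Y2, 2)) / (reduce(lambda x, y: x*y, shape_obj))
loss2 = tf.reduce_mean(tf.squared_difference(Y1, Y2))

with tf.Session() as sess:
    print(sess.run([loss1, loss2]))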

It is easy to see that the answer should be 1, but you will get something like this: [1.0, 0.26843545].
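
One way to make sense of that second number, offered as a sketch and not as a statement about TensorFlow's internals: 0.26843545 is very close to 2^26 / 250,000,000, which is what you would see if the running float32 sum stopped growing partway through. The plain-Python snippet below shows the underlying float32 effect (adding 1 to 2^24 changes nothing). The helper to_float32 is a name made up for this illustration, and the idea that the reduction accumulates in float32 is an assumption.

```python
import struct

def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# float32 has a 24-bit significand, so 2**24 + 1 cannot be represented
# and the addition rounds back to 2**24.
limit = float(2 ** 24)
after_adding_one = to_float32(limit + 1.0)
stalled = after_adding_one == limit

# Number of elements in the (5000, 5000, 10) tensors from the example.
n_elements = 5000 * 5000 * 10

# Assumption: the sum stalled at 2**26 before being divided by n_elements.
ratio = 2 ** 26 / n_elements

print(stalled)
print(ratio)
```

If that assumption holds, the wrong result comes from lost precision in the accumulator, not from an overflow to infinity.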

Regarding your last function, the documentation says that:

Computes half the L2 norm of a tensor without the sqrt: output = sum(t ** 2) / 2

So if you want it to calculate the same thing (in theory) as the first one, you need to scale it appropriately:

loss3 = tf.nn.l2_loss(Y1 - Y2) * 2 / (reduce(lambda x, y: x*y, shape_obj))
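
The scaling can be checked on a tiny example without TensorFlow. This is a minimal plain-Python sketch; the function names and the sample values are invented for the illustration, and half_sum_of_squares only mimics the documented formula sum(t ** 2) / 2.

```python
from statistics import mean

def mse_sum_then_divide(pred, target):
    """First form: sum of squared differences divided by the element count."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def mse_mean(pred, target):
    """Second form: mean of the squared differences."""
    return mean((p - t) ** 2 for p, t in zip(pred, target))

def half_sum_of_squares(values):
    """Third form: sum(t ** 2) / 2, with no division by the element count."""
    return sum(v ** 2 for v in values) / 2

# Sample values chosen so every intermediate result is exact.
pred = [1.0, 2.0, 4.0, 8.0]
target = [0.0, 4.0, 4.0, 5.0]
diff = [p - t for p, t in zip(pred, target)]

loss1 = mse_sum_then_divide(pred, target)
loss2 = mse_mean(pred, target)
loss3 = half_sum_of_squares(diff)
loss3_rescaled = loss3 * 2 / len(diff)

print(loss1, loss2, loss3, loss3_rescaled)
```

Here loss1, loss2 and loss3_rescaled agree, while the raw loss3 is larger by a factor of n/2 (with n = 4 elements, a factor of 2).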