The huge gradients coming from a big multiplication target will probably push the net almost immediately into some horrifying state where all of its hidden nodes have zero gradient. We can use two approaches:
1) Divide by a constant. We simply divide the targets by a constant before training and multiply the predictions back after (see the first sketch below this list).
2) Use log-normalization. Taking logs turns multiplication into addition (see the second sketch below this list):
m = x*y => ln(m) = ln(x) + ln(y).
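
A minimal sketch of approach 1, assuming a PyTorch setup; the constant SCALE, the toy data range, and the tiny MLP are all illustrative choices, not the original setup:

```python
import torch
import torch.nn as nn

SCALE = 10_000.0  # assumed constant covering the largest target product

# toy data: learn m = x * y for x, y in [0, 100], so m can reach 10_000
inputs = torch.rand(1024, 2) * 100
targets = (inputs[:, 0] * inputs[:, 1]).unsqueeze(1)

net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for _ in range(2000):
    opt.zero_grad()
    # divide everything by the constant before learning ...
    loss = nn.functional.mse_loss(net(inputs), targets / SCALE)
    loss.backward()
    opt.step()

# ... and multiply the prediction back after
with torch.no_grad():
    print(net(torch.tensor([[30.0, 70.0]])) * SCALE)  # should be near 2100
```

Keeping the targets in a small range keeps the loss and its gradients small, so the hidden units are far less likely to be knocked into the zero-gradient regime.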
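A minimal sketch of approach 2 (log-normalization), under the same illustrative assumptions: the net sees ln(x) and ln(y) and is trained to predict ln(m) = ln(x) + ln(y), which is then exponentiated to recover the product:

```python
import torch
import torch.nn as nn

# toy data: strictly positive inputs so the log is defined
inputs = torch.rand(1024, 2) * 100 + 1e-3
targets = (inputs[:, 0] * inputs[:, 1]).unsqueeze(1)

log_inputs = inputs.log()
log_targets = targets.log()  # ln(m) = ln(x) + ln(y), a much easier function to fit

net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(log_inputs), log_targets)
    loss.backward()
    opt.step()

with torch.no_grad():
    test = torch.tensor([[30.0, 70.0]])
    print(net(test.log()).exp())  # should be near 30 * 70 = 2100
```

Note that this trick only applies directly to positive inputs; zeros or negative factors need separate handling (e.g. tracking the sign outside the net).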