in AI and Deep Learning

I have implemented a two-layer neural network in CUDA (2 neurons per layer). I'm trying to make it learn two simple polynomial functions using backpropagation.

But instead of converging, it is diverging (the output is becoming infinity)

Here are some more details about what I've tried:

I initially set the weights to 0, but since the network was diverging I have randomized them

I read that a neural network might diverge if the learning rate is too high, so I reduced the learning rate to 0.000001

The two functions I am trying to get it to learn are 3*i + 7*j + 9 and i*i + j*j + 24 (I am giving the layer i and j as input)

I had implemented it as a single layer previously and that could approximate the polynomial functions better

I am thinking of implementing momentum in this network but I'm not sure it would help it learn

I am using a linear (as in no) activation function

There is oscillation in the beginning, but the output starts diverging the moment any of the weights becomes greater than 1

I have checked and rechecked my code but there doesn't seem to be any kind of issue with it.

So here's my question: what is going wrong here?

Any pointer will be appreciated.

1 Answer


First, note that with a linear (identity) activation, stacking layers adds no expressive power: the composition of linear layers is itself just one linear map. So your two-layer network can never represent the quadratic target i*i + j*j + 24 any better than a single layer can, which is consistent with your observation that the single-layer version approximated the functions better. The divergence itself, though, points to a bug in the training update rather than the architecture.
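A minimal NumPy sketch (the weights and inputs here are made-up values, not from your code) showing that two linear layers collapse into a single linear map:

```python
import numpy as np

# Two "layers" with linear (identity) activation: y = W2 @ (W1 @ x)
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2))
W2 = rng.normal(size=(2, 2))

x = np.array([3.0, 5.0])       # example inputs i = 3, j = 5

two_layer = W2 @ (W1 @ x)      # forward pass through both layers

# The composition is exactly one linear map W = W2 @ W1,
# so the second layer adds nothing a single layer couldn't do:
W = W2 @ W1
one_layer = W @ x

assert np.allclose(two_layer, one_layer)
```

Because of this, a quadratic target needs either a nonlinear activation or nonlinear input features (e.g. feeding i*i and j*j as extra inputs).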

The most common reason for a backpropagation implementation to diverge is a missing negative sign in the weight-update expression: the weights must move against the gradient, not along it.
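A hypothetical 1-D sketch of this failure mode (the loss function and constants are illustrative, not from your code): with the correct sign, gradient descent on loss(w) = (w - 4)**2 converges to the minimum; with the sign flipped, it walks uphill and blows up, much like the divergence you describe.

```python
# Minimize loss(w) = (w - 4)**2 with gradient descent.
# dloss/dw = 2 * (w - 4); the update must SUBTRACT lr * grad.

def descend(sign, lr=0.1, steps=100):
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 4.0)
        w = w + sign * lr * grad   # sign = -1 is correct descent
    return w

good = descend(-1)             # converges toward the minimum at w = 4
bad = descend(+1, steps=40)    # flipped sign: deviation grows every step
```

Here `good` ends up within 1e-6 of 4.0, while `bad` has magnitude in the thousands after only 40 steps.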

Another possible cause is a mistake in the error expression used to compute the gradients themselves.
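A standard way to catch a wrong gradient expression is a finite-difference check. This sketch (with made-up inputs; the linear unit and squared-error loss are assumptions for illustration) compares the analytic gradient against a numerical estimate:

```python
import numpy as np

# Gradient check for a linear unit y = w . x + b under squared error.
# Analytic: dL/dw = 2 * (y - t) * x  for L = (y - t)**2.

rng = np.random.default_rng(1)
w = rng.normal(size=2)
b = 0.5
x = np.array([2.0, -1.0])
t = 3.0

def loss(w_vec):
    return (w_vec @ x + b - t) ** 2

analytic = 2.0 * (w @ x + b - t) * x

# Central finite differences, one weight at a time:
eps = 1e-6
numeric = np.array([
    (loss(w + eps * np.eye(2)[k]) - loss(w - eps * np.eye(2)[k])) / (2 * eps)
    for k in range(2)
])

assert np.allclose(analytic, numeric, atol=1e-4)
```

If the analytic and numerical gradients disagree, the error expression (or its derivative) in your backprop code is the likely culprit.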

