I have implemented a neural network (using CUDA) with 2 layers. (2 Neurons per layer). I'm trying to make it learn 2 simple quadratic polynomial functions using backpropagation.
But instead of converging, it is diverging (the output is becoming infinity)
Here are some more details about what I've tried:
I had set the initial weights to 0, but since it was diverging I have randomized the initial weights
I read that a neural network might diverge if the learning rate is too high so I reduced the learning rate to 0.000001
The two functions I am trying to get it to add are: 3 *i + 7 * j+9 and j*j+ i*i + 24 (I am giving the layer i and j as input)
I had implemented it as a single layer previously and that could approximate the polynomial functions better
I am thinking of implementing momentum in this network but I'm not sure it would help it learn
I am using a linear (as in no) activation function
There is oscillation in the beginning but the output starts diverging the moment any of weights become greater than 1
I have checked and rechecked my code but there doesn't seem to be any kind of issue with it.
So here's my question: what is going wrong here?
Any pointer will be appreciated.