in AI and Deep Learning by (20.3k points)

My neural network is a standard feed-forward network trained with backpropagation. It has 10 outputs, which should form a vector where one output is 1 and the rest are 0, e.g. [0,0,0,0,1,0,0,0,0,0]. So an untrained output I would expect is something like this:

[0.21332215, 0.13782996, 0.13548511, 0.09321094, 0.16769843, 0.20333131, 0.06613014, 0.10699013, 0.10622562, 0.09809167]

and ideally once trained, this:

[0.21332215, 0.13782996, 0.93548511, 0.09321094, 0.9676984, 0.20333131, 0.06613014, 0.1069901, 0.10622562, 0.09809167]

When I have 30 neurons in the hidden layer and a learning rate between 0.1 and 1, I get results like the above. However, when I have 100 hidden neurons and a learning rate of 0.01, I get results like this:

[1.75289110e-05, 1.16433042e-04, 2.83848791e-01, 4.47291309e-02, 1.63011592e-01, 8.12974408e-05, 1.06284533e-03, 2.95174797e-02, 7.54112632e-05, 1.33177529e-03]

Why is this? Is this what overfitting looks like?

Then, when I change the learning rate to 0.0001 with 100 hidden neurons, the results look normal again.

So my question is: how should the hidden layer size affect the choice of learning rate? Should bigger hidden layers mean lower learning rates?
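For anyone wanting to reproduce the experiment, a minimal sketch of the setup described above (assuming a single hidden layer with sigmoid activations and a squared-error loss; the input size of 4 and the weight initialization are arbitrary choices, not taken from the question):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target, W1, W2, lr):
    # forward pass through one hidden layer
    h = sigmoid(x @ W1)   # hidden activations
    y = sigmoid(h @ W2)   # output activations
    # backward pass for squared-error loss
    delta_out = (y - target) * y * (1 - y)
    dW2 = np.outer(h, delta_out)
    delta_hidden = (delta_out @ W2.T) * h * (1 - h)
    dW1 = np.outer(x, delta_hidden)
    # plain gradient-descent update
    W1 -= lr * dW1
    W2 -= lr * dW2
    return y

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 30, 10   # try n_hidden=100 with lr=0.01 to compare
W1 = rng.normal(0, 0.1, (n_in, n_hidden))
W2 = rng.normal(0, 0.1, (n_hidden, n_out))

x = rng.normal(size=n_in)
target = np.zeros(n_out)
target[2] = 1.0                      # one-hot target, as in the question

for _ in range(500):
    y = train_step(x, target, W1, W2, lr=0.5)
# after training on this single example, y[2] should dominate the output vector
```

Varying `n_hidden` and `lr` in this sketch is enough to see how the two interact on a toy problem.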

1 Answer

by (44.6k points)

In general, you should not need to change the learning rate when you change the number of hidden units; the difference this makes is usually negligible.

Adaptive optimization algorithms like Adagrad and Adadelta also compute a different effective learning rate for each weight, which makes the global learning rate less critical.
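The per-weight adaptation in Adagrad can be sketched as follows (a minimal illustration of the update rule, not a full optimizer; the gradient values are made up for the demo):

```python
import numpy as np

def adagrad_update(w, grad, cache, base_lr=0.01, eps=1e-8):
    # accumulate the squared gradient history per weight
    cache += grad ** 2
    # each weight is scaled by its own history: weights that have seen
    # large gradients get a smaller effective learning rate
    w -= base_lr * grad / (np.sqrt(cache) + eps)
    return w, cache

w = np.array([1.0, 1.0])
cache = np.zeros_like(w)
# two weights receiving gradients that differ by a factor of 100
w, cache = adagrad_update(w, np.array([10.0, 0.1]), cache)
# despite the 100x gradient difference, both weights take a step of
# roughly base_lr, because each is normalized by its own gradient history
```

This per-weight normalization is why the choice of a single global learning rate matters less with these optimizers.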

I say ‘negligible’ rather than ‘zero’ because a more complicated optimization surface may need a smaller learning rate (‘complicated’ meaning sharper gradients and more waviness), and adding hidden units can make the surface more complicated.

So there is a slight relationship between the hidden unit count and the learning rate. More importantly, increasing the hidden unit count gives you a more heavily parameterized model with higher capacity, and such a model is more prone to overfitting on the same training set.

...