My neural network is a normal feed-forward network trained with backpropagation. It has 10 outputs, which should be a vector where one of the outputs is 1 and the rest are 0, something like [0,0,0,0,1,0,0,0,0,0]. So an output I would expect is something like this:
[0.21332215, 0.13782996, 0.13548511, 0.09321094, 0.16769843, 0.20333131, 0.06613014, 0.10699013, 0.10622562, 0.09809167]
and ideally once trained, this:
[0.21332215, 0.13782996, 0.13548511, 0.09321094, **0.9676984**, 0.20333131, 0.06613014, 0.1069901, 0.10622562, 0.09809167]
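
To make the encoding concrete, here is a minimal NumPy sketch of what I mean by the target vector and the predicted class (illustrative only; the class index 4 and the array values are just taken from the examples above):

```python
import numpy as np

# One-hot target for class 4 out of 10 (the "1 in the right place, 0 elsewhere" vector)
def one_hot(label, n_classes=10):
    t = np.zeros(n_classes)
    t[label] = 1.0
    return t

target = one_hot(4)            # [0,0,0,0,1,0,0,0,0,0]

# A raw network output like the ones above; the predicted class is just the argmax
output = np.array([0.21332215, 0.13782996, 0.13548511, 0.09321094, 0.9676984,
                   0.20333131, 0.06613014, 0.1069901, 0.10622562, 0.09809167])
predicted = np.argmax(output)  # 4, matching the target once trained
```
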
When I have 30 neurons in the hidden layer and a learning rate between 0.1 and 1, I get these results. However, when I have 100 neurons in the hidden layer and a learning rate of 0.01, I get results like this:
[1.75289110e-05, 1.16433042e-04, 2.83848791e-01, 4.47291309e-02, 1.63011592e-01, 8.12974408e-05, 1.06284533e-03, 2.95174797e-02, 7.54112632e-05, 1.33177529e-03]
Why is this? Is this what over-learning (over-fitting) looks like?
Then, when I change the learning rate to 0.0001 with 100 neurons in the hidden layer, I get normal-looking results again.
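
For context, the kind of network I am describing is roughly the following (a minimal NumPy sketch with one sigmoid hidden layer and squared-error loss; the input size, weight initialization, and exact learning rates are placeholders, and the two constructor calls just mirror the configurations I compared):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class Net:
    """Plain feed-forward net: one sigmoid hidden layer, sigmoid outputs, squared-error loss."""
    def __init__(self, n_in, n_hidden, n_out, lr):
        self.lr = lr
        # small random weights; the 0.1 scale is an arbitrary placeholder
        self.W1 = np.random.randn(n_in, n_hidden) * 0.1
        self.W2 = np.random.randn(n_hidden, n_out) * 0.1

    def forward(self, x):
        self.x = x
        self.h = sigmoid(x @ self.W1)        # hidden activations
        self.y = sigmoid(self.h @ self.W2)   # 10 outputs, each in (0, 1)
        return self.y

    def backward(self, target):
        # gradient of the squared error pushed back through both sigmoids
        d_out = (self.y - target) * self.y * (1.0 - self.y)
        d_hid = (d_out @ self.W2.T) * self.h * (1.0 - self.h)
        self.W2 -= self.lr * np.outer(self.h, d_out)
        self.W1 -= self.lr * np.outer(self.x, d_hid)

# The two configurations I compared (n_in=64 is just a placeholder for my input size):
net_30  = Net(n_in=64, n_hidden=30,  n_out=10, lr=0.5)    # 30 hidden neurons, lr between 0.1 and 1
net_100 = Net(n_in=64, n_hidden=100, n_out=10, lr=0.01)   # 100 hidden neurons, lr = 0.01
```
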
So my question is: how should the hidden layer size affect the choice of learning rate? Should bigger hidden layers mean lower learning rates?