I have written a neural network program. It works for logic gates, but when I try to use it for recognizing handwritten digits, it simply does not learn.

Please find the code below:

typedef struct SingleNeuron
{
    double outputValue;
    std::vector<double> weight;
    std::vector<double> deltaWeight;
    double gradient;
    double sum;
} SingleNeuron;

Then I initialize the net. I set the weights to random values between -0.5 and +0.5, sum to 0, and deltaWeight to 0.
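To make that concrete, here is a minimal, self-contained sketch of the kind of initialization I mean (the helper names `RandomWeight` and `MakeNet` are illustrative, not my exact code; the struct is repeated so the sketch compiles on its own):

```cpp
#include <cstdlib>
#include <vector>

typedef struct SingleNeuron
{
    double outputValue;
    std::vector<double> weight;      // one weight per neuron in the NEXT layer
    std::vector<double> deltaWeight; // last weight change, kept for momentum
    double gradient;
    double sum;
} SingleNeuron;

// Illustrative helper: uniform random value in [-0.5, +0.5].
double RandomWeight()
{
    return (double)std::rand() / RAND_MAX - 0.5;
}

// Illustrative construction: weights random, sum/deltaWeight zeroed.
std::vector<std::vector<SingleNeuron>> MakeNet(const std::vector<unsigned>& topology)
{
    std::vector<std::vector<SingleNeuron>> net(topology.size());
    for (unsigned layer = 0; layer < topology.size(); ++layer)
    {
        net[layer].resize(topology[layer]);
        // Neurons in the last layer have no outgoing weights.
        unsigned nextSize = (layer + 1 < topology.size()) ? topology[layer + 1] : 0;
        for (SingleNeuron& n : net[layer])
        {
            n.outputValue = 0.0;
            n.gradient = 0.0;
            n.sum = 0.0;
            n.deltaWeight.assign(nextSize, 0.0);
            n.weight.resize(nextSize);
            for (double& w : n.weight)
                w = RandomWeight();
        }
    }
    return net;
}
```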

Then comes the FeedForward:

for (unsigned i = 0; i < inputValues.size(); ++i)
{
    neuralNet[0][i].outputValue = inputValues[i];
    neuralNet[0][i].sum = 0.0;
    // std::cout << "o/p Val = " << neuralNet[0][i].outputValue << std::endl;
}

for (unsigned i = 1; i < neuralNet.size(); ++i)
{
    std::vector<SingleNeuron> prevLayerNeurons = neuralNet[i - 1];
    unsigned j = 0;
    double thisNeuronOPVal = 0;
    // std::cout << std::endl;

    for (j = 0; j < neuralNet[i].size() - 1; ++j)
    {
        double sum = 0;
        for (unsigned k = 0; k < prevLayerNeurons.size(); ++k)
        {
            sum += prevLayerNeurons[k].outputValue * prevLayerNeurons[k].weight[j];
        }
        neuralNet[i][j].sum = sum;
        neuralNet[i][j].outputValue = TransferFunction(sum);
        // std::cout << neuralNet[i][j].outputValue << "\t";
    }
    // std::cout << std::endl;
}

My transfer function and its derivative are given at the end.
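(So the snippet above is self-contained while you read, here is a typical logistic-sigmoid pair. This is only an assumption for illustration; my actual definitions are at the end.)

```cpp
#include <cmath>

// Typical logistic-sigmoid transfer function (an assumption for
// illustration; the post's actual functions appear at the end).
double TransferFunction(double x)
{
    return 1.0 / (1.0 + std::exp(-x));
}

// Its derivative, written in terms of the sigmoid's own output.
double TransferFunctionDerivative(double x)
{
    double s = TransferFunction(x);
    return s * (1.0 - s);
}
```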

After this I back-propagate the error and update the weights.

After looking at the results you might feel the network is simply stuck in a local minimum, but please wait and read through:

Input = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]

Output = 0.0910903, 0.105674, 0.064575, 0.0864824, 0.128682, 0.0878434, 0.0946296, 0.154405, 0.0678767, 0.0666924

Input = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Output = 0.0916106, 0.105958, 0.0655508, 0.086579, 0.126461, 0.0884082, 0.110953, 0.163343, 0.0689315, 0.0675822

Input = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

Output = 0.105344, 0.105021, 0.0659517, 0.0858077, 0.123104, 0.0884107, 0.116917, 0.161911, 0.0693426, 0.0675156

Input = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]

Output = 0.107113, 0.101838, 0.0641632, 0.0967766, 0.117149, 0.085271, 0.11469, 0.153649, 0.0672772, 0.0652416

Above are the outputs of epochs #996, #997, #998 and #999.

So the network is simply not learning. For this example I used ALPHA = 0.4, ETA = 0.7, 10 hidden layers of 100 neurons each, and the average is over 10 epochs. If you are worried about the learning rate being 0.4, or about having so many hidden layers, I have already tried variations of both. For example, with a learning rate of 0.1 and 4 hidden layers of 16 neurons each:

Input = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]

Output = 0.0883238, 0.0983253, 0.0613749, 0.0809751, 0.124972, 0.0897194, 0.0911235, 0.179984, 0.0681346, 0.0660039

Input = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Output = 0.0868767, 0.0966924, 0.0612488, 0.0798343, 0.120353, 0.0882381, 0.111925, 0.169309, 0.0676711, 0.0656819

Input = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

Output = 0.105252, 0.0943837, 0.0604416, 0.0781779, 0.116231, 0.0858496, 0.108437, 0.1588, 0.0663156, 0.0645477

Input = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]

Output = 0.102023, 0.0914957, 0.059178, 0.09339, 0.111851, 0.0842454, 0.104834, 0.149892, 0.0651799, 0.063558

I am so damn sure I have missed something, but I am not able to figure out what. I have read Tom Mitchell's algorithm many times, and whatever example I solve by hand works! (Please don't ask me to solve the MNIST images by hand ;) ) I do not know where to change the code or what to do. Please help out.