
I am trying to implement backpropagation with recursion for academic purposes, but it seems I have gone wrong somewhere. I have been tinkering with it for a while now but either get no learning at all or no learning on the second pattern.

Please let me know where I've gone wrong. (This is JavaScript syntax.) Note: errors are reset to null before every learning cycle.

this.backpropagate = function(oAnn, aTargetOutput, nLearningRate) {
    nLearningRate = nLearningRate || 1;
    var oNode,
        sNodeId,
        n = 0;
    // Seed each output node with its raw error (target minus actual output).
    for (sNodeId in oAnn.getOutputGroup().getNodes()) {
        oNode = oAnn.getOutputGroup().getNodes()[sNodeId];
        oNode.setError(aTargetOutput[n] - oNode.getOutputValue());
        n++;
    }
    // Recurse from every input node toward the outputs.
    for (sNodeId in oAnn.getInputGroup().getNodes()) {
        this.backpropagateNode(oAnn.getInputGroup().getNodes()[sNodeId], nLearningRate);
    }
};

this.backpropagateNode = function(oNode, nLearningRate) {
    var nError = oNode.getError(),
        oOutputNodes,
        oConn,
        nWeight,
        nOutputError,
        nDerivative = oNode.getOutputValue() * (1 - oNode.getOutputValue()), // Derivative for sigmoid activation function
        nInputValue = oNode.getInputValue(),
        n;
    if (nError === null /* Don't do the same node twice */ && oNode.hasOutputs()) {
        nDerivative = nDerivative || 0.000000000000001;
        nInputValue = nInputValue || 0.000000000000001;
        oOutputNodes = oNode.getOutputNodes();
        for (n = 0; n < oOutputNodes.length; n++) {
            nOutputError = this.backpropagateNode(oOutputNodes[n], nLearningRate);
            oConn   = oAnn.getConnection(oNode, oOutputNodes[n]);
            nWeight = oConn.getWeight();
            oConn.setWeight(nWeight + nLearningRate * nOutputError * nDerivative * nInputValue);
            nError += nOutputError * nWeight;
        }
        // Store the accumulated error so this node is not processed again.
        oNode.setError(nError);
    }
    return oNode.getError();
};


1 Answer


Consider the error function of a single unit with two weights, a constant threshold, and four input-output patterns in the training set. Such an error function can contain a valley, and if gradient descent is started there, the algorithm will not converge to the global minimum.
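To make that setup concrete, here is a minimal sketch of how such an error function can be evaluated over the two weights. The training set (logical OR), the names `unitError` and `sampleSurface`, and the grid range are all assumptions chosen for illustration; this does not reproduce any particular surface with a local minimum, it only shows what is being plotted.

```javascript
// Sigmoid activation, matching the derivative used in the question.
function sigmoid(x) {
    return 1 / (1 + Math.exp(-x));
}

// Hypothetical training set: four input-output patterns (logical OR).
var aPatterns = [
    { input: [0, 0], target: 0 },
    { input: [0, 1], target: 1 },
    { input: [1, 0], target: 1 },
    { input: [1, 1], target: 1 }
];

// Sum-of-squares error of one unit as a function of its two weights.
// The threshold is held constant, so the error is a surface over (w1, w2).
function unitError(nW1, nW2, nThreshold) {
    var nTotal = 0, i, oP, nOut;
    for (i = 0; i < aPatterns.length; i++) {
        oP = aPatterns[i];
        nOut = sigmoid(nW1 * oP.input[0] + nW2 * oP.input[1] - nThreshold);
        nTotal += 0.5 * Math.pow(oP.target - nOut, 2);
    }
    return nTotal;
}

// Sample the surface on a coarse grid of weight values.
function sampleSurface(nMin, nMax, nStep, nThreshold) {
    var aRows = [], w1, w2, aRow;
    for (w1 = nMin; w1 <= nMax; w1 += nStep) {
        aRow = [];
        for (w2 = nMin; w2 <= nMax; w2 += nStep) {
            aRow.push(unitError(w1, w2, nThreshold));
        }
        aRows.push(aRow);
    }
    return aRows;
}

var aSurface = sampleSurface(-4, 4, 2, 0.5);
```

Gradient descent walks downhill on this surface from wherever the initial weights place it, which is why the starting point matters.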

In many cases, local minima appear because the targets for the outputs of the computing units are values other than 0 or 1. If a network for computing XOR is trained to produce 0.9 at the inputs (0,1) and (1,0), the surface of the error function develops protuberances where local minima can arise.

Lower-dimensional networks are possibly more likely to get stuck in local minima. One way to see this is that higher-dimensional networks are less likely to settle into any minimum at all, even a global one.

So re-initializing the weights to random values between -0.5 and 0.5 and running multiple training sessions should eventually get you past all of them.
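The restart idea can be sketched roughly as follows. This is not the asker's `oAnn` API: it assumes a fixed 2-2-1 sigmoid network stored as a flat array of 9 weights, and every name here (`makeRandom`, `randomWeights`, `train`, `trainWithRestarts`) is hypothetical. The small seeded generator is only there to make runs reproducible; `Math.random` would work as well.

```javascript
function sigmoid(x) { return 1 / (1 + Math.exp(-x)); }

// Deterministic generator in [0, 1) so runs are reproducible (an assumption).
function makeRandom(nSeed) {
    var nState = nSeed;
    return function () {
        nState = (nState * 1664525 + 1013904223) % 4294967296;
        return nState / 4294967296;
    };
}

// Random weights in [-0.5, 0.5), as suggested above.
function randomWeights(nCount, fnRandom) {
    var aW = [], i;
    for (i = 0; i < nCount; i++) { aW.push(fnRandom() - 0.5); }
    return aW;
}

var aXor = [
    { input: [0, 0], target: 0 }, { input: [0, 1], target: 1 },
    { input: [1, 0], target: 1 }, { input: [1, 1], target: 0 }
];

// Forward pass of the assumed 2-2-1 network; aW[2], aW[5], aW[8] are biases.
function forward(aW, aIn) {
    var h1 = sigmoid(aW[0] * aIn[0] + aW[1] * aIn[1] + aW[2]);
    var h2 = sigmoid(aW[3] * aIn[0] + aW[4] * aIn[1] + aW[5]);
    var out = sigmoid(aW[6] * h1 + aW[7] * h2 + aW[8]);
    return { h1: h1, h2: h2, out: out };
}

// Sum-of-squares error over all four patterns.
function totalError(aW) {
    var nSum = 0, i;
    for (i = 0; i < aXor.length; i++) {
        nSum += 0.5 * Math.pow(aXor[i].target - forward(aW, aXor[i].input).out, 2);
    }
    return nSum;
}

// One training session: plain online backpropagation, updating aW in place.
function train(aW, nEpochs, nRate) {
    var e, i, oP, o, dOut, dH1, dH2;
    for (e = 0; e < nEpochs; e++) {
        for (i = 0; i < aXor.length; i++) {
            oP = aXor[i];
            o = forward(aW, oP.input);
            // Deltas are computed before any weight is changed.
            dOut = (oP.target - o.out) * o.out * (1 - o.out);
            dH1 = dOut * aW[6] * o.h1 * (1 - o.h1);
            dH2 = dOut * aW[7] * o.h2 * (1 - o.h2);
            aW[6] += nRate * dOut * o.h1;
            aW[7] += nRate * dOut * o.h2;
            aW[8] += nRate * dOut;
            aW[0] += nRate * dH1 * oP.input[0];
            aW[1] += nRate * dH1 * oP.input[1];
            aW[2] += nRate * dH1;
            aW[3] += nRate * dH2 * oP.input[0];
            aW[4] += nRate * dH2 * oP.input[1];
            aW[5] += nRate * dH2;
        }
    }
    return totalError(aW);
}

// Run several sessions from fresh random weights and keep the best one.
function trainWithRestarts(nRestarts, nEpochs, nRate, fnRandom) {
    var oBest = null, aErrors = [], r, aW, nErr;
    for (r = 0; r < nRestarts; r++) {
        aW = randomWeights(9, fnRandom);
        nErr = train(aW, nEpochs, nRate);
        aErrors.push(nErr);
        if (oBest === null || nErr < oBest.error) {
            oBest = { weights: aW, error: nErr };
        }
    }
    oBest.errors = aErrors;
    return oBest;
}

var oResult = trainWithRestarts(5, 2000, 0.5, makeRandom(42));
```

Comparing the entries of `oResult.errors` shows how much the outcome depends on the starting weights. The epoch count and learning rate here are arbitrary and may need tuning before any single session converges.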

