**The backpropagation algorithm** computes the partial derivative of the loss function with respect to each weight in the network, using the chain rule.

**Equation:**

**∂E/∂w[i,j] = delta[j] * o[i]**

In the above equation, w[i,j] is the weight of the connection between neurons i and j, where j is one layer higher in the network than i; delta[j] is the error term of neuron j; and o[i] is the output of i.

These values can then be used in weight updates.

**For example:**

# gradient descent update rule

w[i,j] -= gamma * o[i] * delta[j]

where gamma is the learning rate.
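As a minimal vectorized sketch of this update (names like `o`, `delta`, and the example values are assumptions, not from any particular library), the gradient for all connections between two layers is the outer product of the lower layer's outputs and the upper layer's error terms:

```python
import numpy as np

gamma = 0.1                      # learning rate (assumed value)
o = np.array([0.5, 0.8, 0.2])    # outputs o[i] of the lower layer (example data)
delta = np.array([0.1, -0.3])    # error terms delta[j] of the upper layer (example data)

# Gradient matrix: grad[i, j] = delta[j] * o[i], i.e. an outer product
grad = np.outer(o, delta)

# Weight matrix w[i, j]; the gradient descent step subtracts the scaled gradient
w = np.zeros((3, 2))
w -= gamma * grad
```

Here each entry `w[i, j]` moves opposite to its own partial derivative, which is exactly the per-weight rule applied to the whole layer at once.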

The same rule applies for updating bias weights, except that a bias has no input from a previous layer. Instead, bias is conceptually caused by input from a neuron with a fixed activation of 1.

The update rule for bias weights is

bias[j] -= gamma_bias * 1 * delta[j]

where bias[j] is the weight of the bias on neuron j. The multiplication by 1 can obviously be omitted, and gamma_bias may be set to gamma or to a different value. If I recall correctly, lower values are preferred, though I'm not sure about the theoretical justification for that.
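A minimal sketch of the bias update, under the same assumptions as before (the names and values are illustrative): because the bias neuron's activation is fixed at 1, the "input" term drops out and the update reduces to scaling the error terms:

```python
import numpy as np

gamma_bias = 0.05                 # bias learning rate (assumed; may equal gamma)
delta = np.array([0.1, -0.3])     # error terms delta[j] (example data)
bias = np.zeros_like(delta)       # one bias weight per neuron j

# bias[j] -= gamma_bias * 1 * delta[j]; the factor of 1 is omitted
bias -= gamma_bias * delta
```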

Hope this answer helps.