The backpropagation algorithm computes the partial derivative of the loss function E with respect to each weight:
∂E/∂w[i,j] = delta[j] * o[i]
In the above equation, w[i,j] is the weight of the connection between neurons i and j, with j one layer higher in the network than i; o[i] is the output (activation) of i, and delta[j] is the error term of neuron j.
These values can then be used in weight updates.
# gradient descent update rule
w[i,j] -= gamma * o[i] * delta[j]
where gamma is the learning rate.
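For a whole layer at once, the gradient and the weight step can be written with an outer product. This is only a minimal NumPy sketch: the function names (weight_gradient, update_weights), the array shapes and the sample numbers are my own assumptions for illustration, and it assumes delta has already been computed elsewhere.

```python
import numpy as np

def weight_gradient(o, delta):
    """Return dE/dw, where entry [i, j] equals o[i] * delta[j]."""
    return np.outer(o, delta)

def update_weights(w, o, delta, gamma):
    """One gradient descent step: w[i, j] -= gamma * o[i] * delta[j]."""
    return w - gamma * weight_gradient(o, delta)

# Hypothetical values: 2 neurons in the lower layer, 3 in the upper layer.
o = np.array([0.5, 1.0])            # outputs of the lower layer
delta = np.array([0.1, -0.2, 0.4])  # error terms of the upper layer (assumed given)
w = np.zeros((2, 3))                # w[i, j] connects lower neuron i to upper neuron j

w_new = update_weights(w, o, delta, gamma=0.1)
```

Note the minus sign: the step goes against the gradient, which is what makes it gradient descent.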
The rule for bias weights is very similar, except that there is no input from a previous layer. Instead, a bias can be thought of as the weight on an input from a neuron with a fixed activation of 1.
The update rule for bias weights is therefore
bias[j] -= gamma_bias * 1 * delta[j]
where bias[j] is the weight of the bias on neuron j, the multiplication by 1 can obviously be omitted, and gamma_bias may be set to gamma or to a different value. If I recall correctly, lower values are preferred, though I'm not sure about the theoretical justification for that.
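To see that the bias rule really is the ordinary weight rule with a constant input of 1, here is a small check in NumPy. Again this is a sketch under my own assumptions: update_layer and all the sample values are made up for illustration, and gamma_bias simply defaults to gamma.

```python
import numpy as np

def update_layer(w, bias, o, delta, gamma, gamma_bias=None):
    """Apply one gradient descent step to the weights and biases of a layer."""
    if gamma_bias is None:
        gamma_bias = gamma                 # assumption: same rate unless told otherwise
    w_new = w - gamma * np.outer(o, delta)
    bias_new = bias - gamma_bias * delta   # the factor 1 is omitted
    return w_new, bias_new

# Hypothetical values for illustration.
o = np.array([0.5, 1.0])
delta = np.array([0.1, -0.2, 0.4])
w = np.full((2, 3), 0.3)
bias = np.array([0.0, 0.1, -0.1])
gamma = 0.1

w_new, bias_new = update_layer(w, bias, o, delta, gamma)

# Same step, but with the bias treated as the weight row of an extra
# neuron whose activation is fixed at 1.
o_aug = np.append(o, 1.0)
w_aug = np.vstack([w, bias])
w_aug_new = w_aug - gamma * np.outer(o_aug, delta)
```

The last row of w_aug_new matches bias_new and the other rows match w_new, as long as gamma_bias equals gamma. With a separate gamma_bias the two formulations differ only in the step size for that last row.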
Hope this answer helps.