in Machine Learning by (19k points)

Could someone please explain to me how to update the bias during backpropagation?

I've read quite a few books, but can't find bias updating!

I understand that bias is an extra input of 1 with a weight attached to it (for each neuron). There must be a formula.

Thank you,

Most interesting. Thank you, I think two good points are:

1. "The 'universal approximation' property of multilayer perceptrons with most commonly-used hidden-layer activation functions does not hold if you omit the bias terms. But Hornik (1993) shows that a sufficient condition for the universal approximation property without biases is that no derivative of the activation function vanishes at the origin, which implies that with the usual sigmoid activation functions, a fixed nonzero bias term can be used instead of a trainable bias."

2. "The bias terms can be learned just like other weights."

So I will either add in a 'constant weight' or train this weight like all the others using gradient descent.

Am I understanding this correctly?

1 Answer

by (33.1k points)

The backpropagation algorithm uses the chain rule to compute the partial derivative of the error E with respect to each weight in the network.

Equation:

∂E/∂w[i,j] = delta[j] * o[i]

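In the above equation, w[i,j] is the weight of the connection between neurons i and j, with j being one layer higher in the network than i, o[i] is the output (activation) of i, and delta[j] is the error term of neuron j, that is, the derivative of E with respect to the weighted input of j.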
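To make the equation concrete, here is a minimal sketch in plain Python. It assumes a single layer of sigmoid neurons and a squared error E = 0.5 * sum((o[j] - t[j])^2), neither of which is stated in the question, so treat the delta formula as one possible case. All function names and the numbers are made up for illustration. The gradient from delta[j] * o[i] is compared against a finite-difference estimate.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(o_prev, w, bias):
    """Outputs of the neurons j, given the outputs o_prev of the neurons i."""
    return [
        sigmoid(sum(o_prev[i] * w[i][j] for i in range(len(o_prev))) + bias[j])
        for j in range(len(bias))
    ]

def error(o_prev, w, bias, target):
    """Squared error E (an assumption of this sketch)."""
    out = forward(o_prev, w, bias)
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))

def output_deltas(out, target):
    """delta[j] = dE/dnet[j] for squared error with a sigmoid output."""
    return [(o - t) * o * (1.0 - o) for o, t in zip(out, target)]

def weight_gradients(o_prev, delta):
    """dE/dw[i][j] = delta[j] * o[i]"""
    return [[delta[j] * o_prev[i] for j in range(len(delta))]
            for i in range(len(o_prev))]

def numeric_gradient(o_prev, w, bias, target, i, j, eps=1e-6):
    """Central finite-difference estimate of dE/dw[i][j], used as a check."""
    original = w[i][j]
    w[i][j] = original + eps
    e_plus = error(o_prev, w, bias, target)
    w[i][j] = original - eps
    e_minus = error(o_prev, w, bias, target)
    w[i][j] = original
    return (e_plus - e_minus) / (2 * eps)

# Made-up example values: 3 neurons in layer i, 2 neurons in layer j.
o_prev = [0.5, -1.0, 0.25]
w = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
bias = [0.05, -0.1]
target = [1.0, 0.0]

out = forward(o_prev, w, bias)
delta = output_deltas(out, target)
grad = weight_gradients(o_prev, delta)

print("analytic dE/dw[1][0]:", grad[1][0])
print("numeric  dE/dw[1][0]:", numeric_gradient(o_prev, w, bias, target, 1, 0))
```

The two printed numbers should agree to several decimal places. For hidden layers delta[j] is computed differently (it is propagated back from the layer above), but the product delta[j] * o[i] stays the same.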

These values can then be used in weight updates.

For example:

# gradient descent update rule

w[i,j] -= gamma * o[i] * delta[j]

where gamma is the learning rate.
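As a rough illustration of a single gradient descent step, the sketch below subtracts gamma * o[i] * delta[j] from each weight and compares the error before and after. It assumes linear output neurons with a squared error, so that delta[j] is simply out[j] - target[j]. The values and helper names are invented for the example.

```python
def forward(o, w):
    """Linear neurons j: weighted sum of the outputs o[i] (no bias here)."""
    return [sum(o[i] * w[i][j] for i in range(len(o))) for j in range(len(w[0]))]

def error(out, target):
    """Squared error E (an assumption of this sketch)."""
    return 0.5 * sum((y - t) ** 2 for y, t in zip(out, target))

gamma = 0.1                      # learning rate
o = [1.0, 2.0]                   # outputs of the neurons i
w = [[0.5, -0.3], [0.1, 0.8]]    # w[i][j]
target = [1.0, 0.0]

out = forward(o, w)
error_before = error(out, target)

# For a linear neuron with squared error, delta[j] = out[j] - target[j].
delta = [y - t for y, t in zip(out, target)]

# One gradient descent step: move each weight against its gradient.
for i in range(len(o)):
    for j in range(len(delta)):
        w[i][j] -= gamma * o[i] * delta[j]

error_after = error(forward(o, w), target)
print("error before:", error_before)
print("error after: ", error_after)
```

Note the minus sign: the step goes against the gradient, so with a small enough gamma the error goes down.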

The same rule applies to bias weights, except that there is no input from a previous layer. Instead, the bias is conceptually the input from a neuron with a fixed activation of 1.

The update rule for bias weights is

bias[j] -= gamma_bias * 1 * delta[j]

where bias[j] is the weight of the bias on neuron j, the multiplication with 1 can obviously be omitted, and gamma_bias may be set to gamma or to a different value. If I recall correctly, lower values are preferred, though I'm not sure about the theoretical justification of that.
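Here is a small sketch of why the bias rule is just the ordinary weight rule in disguise. It updates a layer in two ways: (a) with a separate bias[j] update, and (b) by appending a constant input of 1 and treating the bias as one more row of weights. It assumes gamma_bias equals gamma. The delta value is a made-up number standing in for whatever backpropagation produced.

```python
gamma = 0.5
gamma_bias = gamma   # assumption: same learning rate for weights and biases

o = [0.2, 0.7]            # outputs of the neurons i
w = [[0.3], [-0.1]]       # w[i][j], one neuron j
bias = [0.05]             # bias[j]
delta = [0.4]             # stand-in value for delta[j]

# (a) Separate updates for the weights and the bias.
w_a = [[w[i][j] - gamma * o[i] * delta[j] for j in range(len(delta))]
       for i in range(len(o))]
bias_a = [bias[j] - gamma_bias * 1 * delta[j] for j in range(len(delta))]

# (b) Bias as an ordinary weight fed by an extra input fixed at 1.
o_aug = o + [1.0]
w_aug = [row[:] for row in w] + [bias[:]]   # last row holds the bias weights
w_b = [[w_aug[i][j] - gamma * o_aug[i] * delta[j] for j in range(len(delta))]
       for i in range(len(o_aug))]

print("separate bias update: ", w_a, bias_a)
print("augmented weight rows:", w_b)
```

The last row of w_b matches bias_a, and the remaining rows match w_a. If gamma_bias is set to a different value from gamma, the two variants no longer coincide, which is the only real difference between them.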

Hope this answer helps.
