
Could someone please explain to me how to update the bias throughout backpropagation?

I've read quite a few books, but can't find anything about updating the bias!

I understand that bias is an extra input of 1 with a weight attached to it (for each neuron). There must be a formula.

Thank you,

Most interesting, thank you. I think the two key points are:

1. "The 'universal approximation' property of multilayer perceptrons with most commonly used hidden-layer activation functions does not hold if you omit the bias terms. But Hornik (1993) shows that a sufficient condition for the universal approximation property without biases is that no derivative of the activation function vanishes at the origin, which implies that with the usual sigmoid activation functions, a fixed nonzero bias term can be used instead of a trainable bias."

2. "The bias terms can be learned just like other weights."

So I will either add in a 'constant weight' or train this weight like all the others using gradient descent.

Am I understanding right?

1 Answer


The backpropagation algorithm computes the partial derivative of the loss function E with respect to each weight in the network.

Equation:

∂E/∂w[i,j] = delta[j] * o[i]

In the above equation, w[i,j] is the weight of the connection between neurons i and j, where j is one layer higher in the network than i; o[i] is the output of neuron i; and delta[j] is the error term backpropagated to neuron j.
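To make delta[j] concrete: the answer above doesn't spell out how it is computed, but for a single sigmoid output neuron with squared-error loss it could look like this minimal Python sketch (the activation, loss, and numeric values are my assumptions for illustration):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Assumed setup: sigmoid output neuron j with squared-error loss
# E = 0.5 * (target - o_j)**2.
net_j = 0.4                      # weighted input to neuron j
o_j = sigmoid(net_j)             # output o[j]
target = 1.0
# delta[j] = dE/dnet_j = (o_j - target) * o_j * (1 - o_j) for this setup
delta_j = (o_j - target) * o_j * (1.0 - o_j)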

These values can then be used in weight updates.

For example:

# gradient-descent update rule

w[i,j] -= gamma * o[i] * delta[j]

where gamma is the learning rate.
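As a rough illustration, here is how that update could look for a whole layer in NumPy; the layer shapes, the random values, and the assumption that delta has already been backpropagated are mine, not part of the rule itself:

import numpy as np

rng = np.random.default_rng(0)
o_prev = rng.random(3)           # o[i], outputs of the previous layer
w = rng.random((3, 2))           # w[i, j], connections into the current layer
delta = rng.random(2)            # delta[j], assumed already backpropagated
gamma = 0.1                      # learning rate

# dE/dw[i, j] = delta[j] * o[i], computed for every pair (i, j) at once
grad_w = np.outer(o_prev, delta)
w -= gamma * grad_w              # gradient-descent step (note the minus sign)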

The same rule applies to bias weights, except that a bias receives no input from a previous layer; the bias is instead caused by input from a neuron with a fixed activation of 1.

The update rule for bias weights is

bias[j] -= gamma_bias * 1 * delta[j]

where bias[j] is the bias weight on neuron j. The multiplication by 1 can obviously be omitted, and gamma_bias may be set equal to gamma or to a different value. If I recall correctly, lower values are preferred, though I'm not sure about the theoretical justification for that.
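Here is a matching NumPy sketch of the bias update, again with assumed shapes and values and a hypothetically smaller gamma_bias:

import numpy as np

rng = np.random.default_rng(0)
delta = rng.random(2)            # delta[j], assumed already backpropagated
bias = rng.random(2)             # one bias weight per neuron j
gamma_bias = 0.05                # assumed smaller than gamma, per the note above

# dE/dbias[j] = delta[j] * 1, since the bias input is the constant 1
bias -= gamma_bias * delta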

Hope this answer helps.
