0 votes
2 views
in AI and Deep Learning by (50.2k points)

For a given neuron, I'm unclear on how to take the partial derivative of its error with respect to one of its weights.

Working from this web page, it's clear how the propagation works (although I'm dealing with Resilient Propagation). For a feedforward neural network, we have to: 1) moving forward through the net, fire the neurons; 2) from the output-layer neurons, calculate a total error; 3) moving backward, propagate that error through each weight of a neuron; and 4) coming forward again, update the weights in each neuron.

Precisely though, these are the things I don't understand.

A) For each neuron, how do you calculate the partial derivative of the error with respect to the weight? My confusion is that, in calculus, a partial derivative is computed in terms of an n-variable function, and I even understand the chain rule. But it doesn't gel when I think, precisely, about how to apply it to the outputs of i) the linear combiner and ii) the sigmoid activation function.

B) Using the Resilient Propagation approach, how would you change the bias in a given neuron? Or is there no bias or threshold in a NN trained with Resilient Propagation?

C) How do you propagate a total error if there are two or more output neurons? Does the total-error times neuron-weight step happen for each output neuron's value?

1 Answer

0 votes
by (108k points)

One possible choice for the loss function is the squared error, defined as

 loss(y, t) = sum_k (y_k - t_k)^2,

where k ranges over the output neurons of the network. In backpropagation, one has to compute the partial derivative of this overall optimization objective with respect to the network parameters, which are the synaptic weights and neuron biases.
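As a minimal sketch of question A (assuming a single sigmoid neuron and the squared-error loss above; all variable names and numbers are illustrative, not from the question), the chain rule splits the partial derivative into a loss term, an activation term, and a linear-combiner term:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass for one sigmoid neuron:
#   z = w . x + b          (linear combiner)
#   y = sigmoid(z)         (activation)
#   E = (y - t) ** 2       (squared error for this neuron)
x = np.array([0.5, -1.2, 0.3])   # inputs to the neuron (example values)
w = np.array([0.1, 0.4, -0.2])   # synaptic weights (example values)
b = 0.05                         # bias
t = 1.0                          # target

z = np.dot(w, x) + b
y = sigmoid(z)
E = (y - t) ** 2

# Chain rule: dE/dw_i = dE/dy * dy/dz * dz/dw_i
dE_dy = 2.0 * (y - t)            # derivative of the squared error w.r.t. the output
dy_dz = y * (1.0 - y)            # derivative of the sigmoid w.r.t. its input
dz_dw = x                        # derivative of the linear combiner w.r.t. each weight
dz_db = 1.0                      # derivative of the linear combiner w.r.t. the bias

grad_w = dE_dy * dy_dz * dz_dw   # partial derivatives for the weights
grad_b = dE_dy * dy_dz * dz_db   # partial derivative for the bias

So the "n-variable function" being differentiated is the loss viewed as a function of all weights and biases; for each individual weight, only the three factors above are needed.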

As for the question regarding bias: in Resilient Propagation the bias is treated like any other weight. It is updated based on the sign (direction) of its partial derivative, not on the magnitude. The size of the update step is increased if the sign stays the same over consecutive iterations; if the sign oscillates, the step size is decreased.
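A minimal sketch of that sign-based rule for a single parameter (weight or bias), following the common RPROP- variant; the function name, the per-parameter state, and the constants eta_plus/eta_minus and the step bounds are the usual defaults from the RPROP literature, not something stated in the answer:

import numpy as np

def rprop_update(param, grad, prev_grad, step,
                 eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=50.0):
    """One RPROP- step for a single weight or bias."""
    if grad * prev_grad > 0:           # same sign: accelerate
        step = min(step * eta_plus, step_max)
        param -= np.sign(grad) * step
        prev_grad = grad
    elif grad * prev_grad < 0:         # sign flipped: we overshot, slow down
        step = max(step * eta_minus, step_min)
        prev_grad = 0.0                # skip the update this iteration
    else:                              # previous gradient was zero
        param -= np.sign(grad) * step
        prev_grad = grad
    return param, prev_grad, step

Each parameter keeps its own (prev_grad, step) state between epochs; only the sign of grad ever enters the update, which is why RPROP is insensitive to the gradient's magnitude.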

The overall optimization objective is a scalar function of all network parameters, no matter how many output neurons there are, so the partial derivatives are computed the same way: each output neuron simply contributes its own term to the summed error.
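A short sketch of question C under the same assumptions (two sigmoid output neurons sharing a hidden activation h; all values are illustrative): the gradients of the scalar loss just stack per output neuron, with no special handling of the "total error":

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h = np.array([0.2, 0.7])        # hidden-layer activations feeding both outputs
W = np.array([[0.3, -0.1],      # weights into output neuron 1
              [0.5,  0.8]])     # weights into output neuron 2
t = np.array([1.0, 0.0])        # targets

z = W @ h
y = sigmoid(z)
E = np.sum((y - t) ** 2)        # single scalar objective over both outputs

# Each output neuron contributes its own delta; the weight gradient is one row per
# output neuron, because E is already a scalar sum over the outputs.
delta = 2.0 * (y - t) * y * (1.0 - y)   # shape (2,)
grad_W = np.outer(delta, h)             # dE/dW for both output neurons at once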

If you wish to learn more about Neural Networks, visit this Neural Network Tutorial.
