The reason is that we are trying to minimize the loss, and we do so with a gradient descent method. In essence, from our current point in parameter space (determined by the complete set of current weights), we want to move in a direction that decreases the loss function. Imagine standing on a hillside and walking downhill in the direction where the slope is steepest.

Mathematically, the direction of steepest descent from your current point in parameter space is the negative gradient. The gradient is simply the vector of partial derivatives of the loss function with respect to every single parameter.
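
To make this concrete, here is a minimal sketch of a gradient descent loop on a toy quadratic loss. The function names (`loss`, `grad`), the starting point, and the learning rate are illustrative assumptions, not taken from the text above; they simply show the update "move against the gradient" in action.

```python
import numpy as np

def loss(w):
    """Toy loss: sum of squared parameters, minimized at w = 0."""
    return np.sum(w ** 2)

def grad(w):
    """Gradient of the toy loss: the vector of partial derivatives
    with respect to each parameter."""
    return 2 * w

w = np.array([3.0, -2.0])   # current point in parameter space (the weights)
learning_rate = 0.1         # step size (an assumed value for illustration)

for step in range(50):
    # Step in the direction of the negative gradient, i.e. steepest descent.
    w = w - learning_rate * grad(w)

print(w, loss(w))  # w approaches [0, 0] and the loss approaches 0
```

Each iteration subtracts a small multiple of the gradient from the weights, so the loss shrinks at every step as long as the learning rate is not too large.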