What is the idea behind taking the derivative? To train a system, we have to adjust its weights. But why do we do this using the derivative of the transfer function? What is it about the derivative that helps us? I know the derivative is the slope of a continuous function at a given point, but what does that have to do with the problem?

The reason is that we are trying to minimize the loss. Specifically, we do this with gradient descent. This means that from our current point in parameter space (determined by the complete set of current weights), we want to move in a direction that decreases the loss function. Imagine standing on a hillside and walking downhill in the direction where the slope is steepest.

Mathematically, the direction of steepest descent from your current point in parameter space is the negative gradient. The gradient is simply the vector of partial derivatives of the loss function with respect to each parameter. By the chain rule, computing those partial derivatives involves the derivative of the transfer function, which is why it shows up in the weight update.
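
To make this concrete, here is a minimal sketch of gradient descent in plain Python. The quadratic loss, its minimum at (3, -1), the learning rate, and the function names are all assumptions chosen for illustration; a real network would have a far more complicated loss, but the update step has the same shape.

```python
# Hypothetical loss over two "weights", with its minimum at w = (3, -1).
def loss(w):
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2

# Gradient: the vector of partial derivatives of the loss
# with respect to each weight.
def gradient(w):
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

def gradient_descent(w, learning_rate=0.1, steps=100):
    for _ in range(steps):
        g = gradient(w)
        # Step against the gradient: the direction of steepest descent.
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return w

w_start = [0.0, 0.0]
w_final = gradient_descent(w_start)
print(loss(w_start), loss(w_final))
```

Each step moves the weights a little way downhill, so the loss after the loop should be much smaller than at the starting point. If the learning rate is too large, the steps can overshoot the minimum, which is why it is kept small here.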