According to one document, the weight adjustment formula is:

new weight = old weight + learning rate * delta * df(e)/de * input

The df(e)/de part is the derivative of the activation function, which is a sigmoid-like function such as tanh. What exactly is it, and why do we multiply by it? Why isn't learning rate * delta * input enough?
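A minimal sketch of that formula for a single neuron, assuming tanh as the activation and delta = target - output (the formula itself does not pin either of these down). The function name `update_weights` is hypothetical:

```python
import math

def update_weights(weights, inputs, target, learning_rate):
    """One update step for a single tanh neuron (illustrative sketch)."""
    # e: the neuron's net input, i.e. the weighted sum of its inputs
    e = sum(w * x for w, x in zip(weights, inputs))
    # f(e): the neuron's output, assuming a tanh activation
    output = math.tanh(e)
    # delta: error signal, assumed here to be (target - output)
    delta = target - output
    # df(e)/de for tanh is 1 - tanh(e)^2
    df_de = 1.0 - output ** 2
    # new weight = old weight + learning rate * delta * df(e)/de * input
    return [w + learning_rate * delta * df_de * x for w, x in zip(weights, inputs)]

# Usage: one step starting from zero weights
print(update_weights([0.0, 0.0], [1.0, 2.0], target=1.0, learning_rate=0.1))  # prints [0.1, 0.2]
```

When the unit is saturated (|e| large, so tanh(e) is close to ±1), df(e)/de is close to zero and the update is tiny even if delta is large.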

1 Answer

@malika, I hope this answer helps you understand it better.

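The derivative df(e)/de appears because the update is meant to follow the gradient of the loss function with respect to each weight, which is what the optimization technique uses to locate the minimum of the loss. A weight influences the error only through the neuron's output f(e), where e is the weighted sum of the inputs, so by the chain rule that gradient is (up to sign) delta * df(e)/de * input. Leaving out df(e)/de means the step is no longer the gradient. A large derivative results in a large adjustment to the corresponding weight. As a rough rule of thumb, a large derivative suggests you are still far from a minimum, while a small one suggests you are close to it; note, though, that a saturated tanh unit also has a near-zero derivative.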
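To see why df(e)/de belongs in the update, here is a numerical check, assuming a single tanh neuron and the squared-error loss E = 0.5 * (target - tanh(e))^2 (both are assumptions for illustration). It compares delta * df(e)/de * input and delta * input against a finite-difference estimate of -dE/dw. The helper names and example values are hypothetical:

```python
import math

def loss(weights, inputs, target):
    # Assumed loss: squared error E = 0.5 * (target - tanh(e))^2
    e = sum(w * x for w, x in zip(weights, inputs))
    return 0.5 * (target - math.tanh(e)) ** 2

def numerical_gradient(weights, inputs, target, h=1e-6):
    # Central differences: dE/dw_i ~ (E(w_i + h) - E(w_i - h)) / (2h)
    grad = []
    for i in range(len(weights)):
        plus = list(weights)
        minus = list(weights)
        plus[i] += h
        minus[i] -= h
        grad.append((loss(plus, inputs, target) - loss(minus, inputs, target)) / (2 * h))
    return grad

def candidate_steps(weights, inputs, target):
    # Per-weight update directions with and without the df(e)/de factor
    e = sum(w * x for w, x in zip(weights, inputs))
    y = math.tanh(e)
    delta = target - y       # assumed error term
    df_de = 1.0 - y ** 2     # derivative of tanh at e
    with_derivative = [delta * df_de * x for x in inputs]
    without_derivative = [delta * x for x in inputs]
    return with_derivative, without_derivative

# Arbitrary example values
weights, inputs, target = [0.8, -0.3], [1.5, 2.0], 0.2
with_d, without_d = candidate_steps(weights, inputs, target)
neg_grad = [-g for g in numerical_gradient(weights, inputs, target)]
print("delta * df/de * input:", with_d)
print("-dE/dw (numerical):   ", neg_grad)
print("delta * input:        ", without_d)
```

The first two lists agree; the third does not. For this single neuron, dropping df(e)/de happens to keep the direction (the tanh derivative is always positive and shared by both weights) but rescales the step, so it is no longer the gradient of E. In a multi-layer network, each hidden neuron's own df(e)/de enters its backpropagated error term, so leaving it out also changes the relative sizes of the updates across neurons.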

The first derivative gives the slope of the tangent line at each point on the curve; at a minimum the tangent line is horizontal, so its slope is zero. The sign of the derivative also tells you whether you are moving in the right direction to reach the function's minimum.
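A small sketch of reading the slope numerically, using a made-up parabola f(w) = (w - 3)^2 whose minimum is at w = 3. The names `f`, `slope`, and `which_way` are hypothetical:

```python
def f(w):
    # Illustrative objective (an assumption): a parabola with its minimum at w = 3
    return (w - 3.0) ** 2

def slope(func, w, h=1e-6):
    # Central-difference estimate of the derivative, i.e. the slope of the tangent at w
    return (func(w + h) - func(w - h)) / (2 * h)

def which_way(func, w, tol=1e-6):
    # The sign of the slope tells you which way is downhill
    s = slope(func, w)
    if abs(s) < tol:
        return "flat: tangent is horizontal (candidate minimum)"
    return "increase w" if s < 0 else "decrease w"

for w in (1.0, 3.0, 5.0):
    print(w, slope(f, w), which_way(f, w))
```

A negative slope means the function falls as w grows, so you should increase w; a positive slope means the opposite.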

For example, suppose you are walking on a 3D surface defined by the objective function. If you reach a point where the slope is zero in every direction, you are at a stationary point; if the ground rises all around you, that point is a minimum of the function.

So, to minimize a function, follow the derivative downhill: step in the direction opposite to the gradient.
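A minimal gradient-descent sketch on an assumed bowl-shaped surface f(x, y) = x^2 + 2y^2 (minimum at (0, 0)). The function names, starting point, and learning rate of 0.1 are illustrative choices:

```python
def surface(x, y):
    # Illustrative bowl-shaped objective (an assumption); its minimum is at (0, 0)
    return x ** 2 + 2 * y ** 2

def gradient(x, y):
    # Partial derivatives of surface(): (df/dx, df/dy)
    return 2 * x, 4 * y

def gradient_descent(x, y, learning_rate=0.1, steps=100):
    step_sizes = []
    for _ in range(steps):
        gx, gy = gradient(x, y)
        # Move against the gradient, i.e. downhill on the surface
        dx, dy = -learning_rate * gx, -learning_rate * gy
        x, y = x + dx, y + dy
        step_sizes.append((dx ** 2 + dy ** 2) ** 0.5)
    return x, y, step_sizes

x, y, sizes = gradient_descent(4.0, -3.0)
print("end point:", x, y, "value:", surface(x, y))
print("first step sizes:", sizes[:3], "last:", sizes[-1])
```

The recorded step sizes shrink as the slope flattens near the minimum, which is the "large derivative, large adjustment" behavior described above. With too large a learning rate (above 0.5 for this particular surface) the steps overshoot and diverge instead.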

Welcome to Intellipaat Community. Get your technical queries answered by top developers!