
I understand what Gradient Descent does. Basically, it tries to move towards the local optimal solution by slowly moving down the curve. I am trying to understand the actual difference between plain gradient descent and Newton's method.

From Wikipedia, I read this short line "Newton's method uses curvature information to take a more direct route." What does this intuitively mean?


In optimization algorithms, there are local minimum and global minimum. If the algorithm achieves a global minimum, then you can say that it is an optimal method to minimize the loss.

At a local minimum x, the derivative of the target function f vanishes: f'(x) = 0.

Gradient descent finds such a minimum x using only information from the first derivative of f: at each iteration it takes a step in the direction of steepest descent from the current point. This is like dropping a ball onto the graph of f and letting it roll downhill until it comes to rest.
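To make that concrete, here is a minimal sketch of gradient descent in one dimension. The function f(x) = (x - 3)**2 and the learning rate are my own illustrative choices, not from the question:

```python
def gradient_descent(df, x0, lr=0.1, steps=100):
    """Minimize a 1-D function given its derivative df."""
    x = x0
    for _ in range(steps):
        x -= lr * df(x)  # step against the gradient (steepest descent)
    return x

# Example: f(x) = (x - 3)**2, so f'(x) = 2*(x - 3); the minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Note the step size lr must be chosen by hand; too large and the iteration diverges, too small and convergence is slow. This is exactly the tuning that Newton's method avoids by using curvature.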

Newton's method finds a point x satisfying f'(x) = 0 by approximating f' around the current point with a linear function g (using the second derivative f'') and then solving g(x) = 0 explicitly; this is Newton's root-finding method applied to f'. The root of g is not necessarily a root of f', so the process is iterated. Because it needs second-derivative information, Newton's method places higher requirements on the smoothness of f, but in return it typically converges much faster.
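The linearize-and-solve step above can be sketched as follows. The test function f(x) = x**2 + exp(x) is a hypothetical example of mine, chosen because its derivative has no closed-form root:

```python
import math

def newton(df, d2f, x0, steps=20):
    """Find a root of df (a stationary point of f) via Newton's method."""
    x = x0
    for _ in range(steps):
        # Linearize f' at x: g(t) = f'(x) + f''(x)*(t - x),
        # then jump to the root of g.
        x -= df(x) / d2f(x)
    return x

# Example: f(x) = x**2 + exp(x), so f'(x) = 2x + e^x, f''(x) = 2 + e^x.
x_min = newton(lambda x: 2 * x + math.exp(x),
               lambda x: 2 + math.exp(x),
               x0=0.0)
```

Compare the update with gradient descent: instead of a fixed learning rate, the step is scaled by 1/f''(x), so the method automatically takes large steps where the curve is flat and small steps where it is sharply curved. That is the "curvature information" the Wikipedia line refers to.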