1 view

I understand what Gradient Descent does. Basically, it tries to move towards the local optimal solution by slowly moving down the curve. I am trying to understand what is the actual difference between the plan gradient descent and Newton's method?

From Wikipedia, I read this short line "Newton's method uses curvature information to take a more direct route." What does this intuitively mean?

by (33.1k points)

In optimization algorithms, there are local minimum and global minimum. If the algorithm achieves a global minimum, then you can say that it is an optimal method to minimize the loss.

At a local minimum x, the resultant derivative of the target function f vanishes f'(x) = 0

Gradient descent is used to find such a minimum x by using information from the first derivative of ‘f’. It follows the steepest descent from the current point of iteration. This is like dropping a ball down the graph of f until it comes to rest.

Newton's method is used to find a point x that satisfies f'(x) = 0 by approximating f' with a linear function g and then solving for the minima of that function explicitly (this is called Newton's root-finding method). The minima of g are not certainly the root of f'. This means it has higher requirements on the smoothness of f, but it also means that it often converges faster.

Hope this answer helps.