In optimization we distinguish between local minima and global minima. If an algorithm reaches the global minimum, it has minimized the loss as well as is possible; a local minimum is only the best value within some neighbourhood.
At a local minimum x of a differentiable objective function f, the derivative vanishes: f'(x) = 0. For example, for f(x) = (x - 3)^2 we have f'(x) = 2(x - 3), which is zero exactly at the minimizer x = 3.
Gradient descent searches for such a point x using only the first derivative of f: at each iteration it takes a step in the direction of steepest descent from the current iterate. Intuitively, this is like dropping a ball onto the graph of f and letting it roll downhill until it comes to rest.
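Here is a minimal sketch of gradient descent in one dimension. The objective f(x) = (x - 3)^2, the step size, and the stopping tolerance are illustrative assumptions, not something taken from the question itself.

```python
def f_prime(x):
    # Derivative of the illustrative objective f(x) = (x - 3)**2
    return 2.0 * (x - 3.0)

def gradient_descent(x0, step=0.1, tol=1e-8, max_iter=1000):
    x = x0
    for _ in range(max_iter):
        grad = f_prime(x)
        if abs(grad) < tol:   # f'(x) is (approximately) zero: stationary point reached
            break
        x -= step * grad      # move against the gradient, i.e. steepest descent
    return x

print(gradient_descent(x0=10.0))  # converges to roughly 3.0
```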
Newton's method also looks for a point x with f'(x) = 0, but it does so by approximating f' around the current iterate with a linear function g and solving g(x) = 0 explicitly (this is Newton's root-finding method applied to f'). The root of g is not necessarily a root of f', so the step is repeated until convergence. Because it needs the second derivative of f, it places higher smoothness requirements on f, but in return it typically converges faster (quadratically near the minimum).
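A minimal sketch of that iteration, using the same illustrative objective f(x) = (x - 3)^2 as above (again an assumption for the example). Since f is quadratic here, f'' is constant and a single Newton step lands exactly on the minimizer.

```python
def f_prime(x):
    return 2.0 * (x - 3.0)

def f_double_prime(x):
    return 2.0

def newton_minimize(x0, tol=1e-8, max_iter=100):
    x = x0
    for _ in range(max_iter):
        g = f_prime(x)
        if abs(g) < tol:
            break
        # Linearize f' at x: g(t) = f'(x) + f''(x) * (t - x), and solve g(t) = 0 for t
        x = x - g / f_double_prime(x)
    return x

print(newton_minimize(x0=10.0))  # reaches 3.0 in one step for this quadratic
```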
Hope this answer helps.