Can anyone explain, in an easy and less mathematical way, what a Hessian is and how it works in practice when optimizing the learning process for a neural network?
I hope the following paper by Bishop helps answer your question:
Exact Calculation of the Hessian Matrix for the Multilayer Perceptron
If the link doesn't work, the paper was published in the journal Neural Computation, Volume 4, pages 494–501.
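For a less mathematical feel for it: the gradient tells you which way the loss slopes, and the Hessian (the table of second derivatives) tells you how that slope changes, i.e. how sharply the loss curves in each direction. Below is a minimal sketch in plain Python, not the exact method from Bishop's paper. The two-weight `loss` function and the helper names `gradient`, `hessian`, and `newton_step` are made up for illustration, and the derivatives are approximated with finite differences rather than computed exactly.

```python
# Toy "loss" over two weights; a stand-in for a real network loss (assumption).
def loss(w):
    x, y = w
    return x**2 + 10.0 * y**2 + x * y

def gradient(f, w, h=1e-5):
    """Central-difference estimate of the first derivatives (the slope)."""
    g = []
    for i in range(len(w)):
        up = list(w); up[i] += h
        dn = list(w); dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g

def hessian(f, w, h=1e-3):
    """Central-difference estimate of the second derivatives (the curvature)."""
    n = len(w)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                p = list(w)
                p[i] += di
                p[j] += dj
                return f(p)
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H

def newton_step(w, g, H):
    """One Newton update for two weights: w - H^{-1} g, solved by hand."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    d0 = (H[1][1] * g[0] - H[0][1] * g[1]) / det
    d1 = (H[0][0] * g[1] - H[1][0] * g[0]) / det
    return [w[0] - d0, w[1] - d1]

w = [3.0, 2.0]
g = gradient(loss, w)
H = hessian(loss, w)
w_new = newton_step(w, g, H)

print("gradient:", g)
print("Hessian:", H)
print("weights after one Newton step:", w_new)
```

The diagonal of the Hessian shows that this loss curves about ten times more steeply along the second weight than the first, which is why a single learning rate struggles here. Dividing the gradient by the curvature (the Newton step) rescales each direction, and on this simple bowl-shaped loss it lands essentially on the minimum in one step. For a real network the full Hessian has one entry per pair of weights, so in practice it is usually approximated rather than built like this.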
You can also refer to the following link for An Intuitive Introduction to the Hessian for Deep Learning Practitioners:
Visit this Neural Network Tutorial for more insight into Hessians for deep learning and neural networks.