Both the L1 and the L2 norm (used as loss functions or as regularization penalties) are useful, each in different situations.
Advantages of L2 over L1 norm
- The L2 norm is easy to differentiate: the gradient of the squared L2 norm of w is simply 2w and exists everywhere, whereas the L1 norm has no derivative at 0. This makes gradient-based learning methods straightforward to use.
- A squared (L2) error is minimized by the mean, whereas an absolute (L1) error is minimized by the median. Mean squared error is also a common performance measure, so optimizing it directly is convenient. This works especially well if you know there are no outliers and you want to keep the overall error small.
- The solution is more likely to be unique. This ties in with the previous point: the mean is always a single value, whereas the median of an even number of points can be any value in the interval between the two middle points and is therefore not unique.
- L1 regularization can give you a sparse coefficient vector, but the non-sparse solutions of L2 can improve prediction performance, since the model leverages more features instead of simply ignoring them.
- The L2 norm is invariant under rotation: if you rotate a dataset of points, all Euclidean lengths and distances stay the same, so any result that depends only on them is unchanged. The L1 norm does not have this property.
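The mean/median and rotation points can be checked numerically. The sketch below uses plain Python and made-up sample values (the helper names are illustrative, not from any library). It brute-forces a grid of constants c to find which c minimizes the total squared error and which minimizes the total absolute error, then compares the L2 and L1 lengths of a vector before and after a 45-degree rotation.

```python
import math

def sq_loss(c, xs):
    # Total squared (L2) error of predicting the constant c for every x
    return sum((x - c) ** 2 for x in xs)

def abs_loss(c, xs):
    # Total absolute (L1) error of predicting the constant c for every x
    return sum(abs(x - c) for x in xs)

def argmins(loss, xs, grid):
    # Every grid point that attains the minimum loss (ties are kept to expose non-uniqueness)
    values = [loss(c, xs) for c in grid]
    best = min(values)
    return [c for c, v in zip(grid, values) if math.isclose(v, best)]

def rotate(v, theta):
    # Rotate a 2-D vector by theta radians
    x, y = v
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def l2(v):
    # Euclidean length
    return math.sqrt(sum(t * t for t in v))

def l1(v):
    # Manhattan length
    return sum(abs(t) for t in v)

xs = [1.0, 2.0, 4.0, 9.0]                # made-up data, even number of points
grid = [i / 10 for i in range(0, 101)]    # candidate constants 0.0, 0.1, ..., 10.0

print(argmins(sq_loss, xs, grid))         # [4.0]: a single minimizer, the mean
ties = argmins(abs_loss, xs, grid)
print(ties[0], ties[-1])                  # 2.0 4.0: the whole median interval is optimal

v = (3.0, 0.0)
r = rotate(v, math.pi / 4)
print(l2(v), l2(r))   # L2 length stays 3 (up to rounding)
print(l1(v), l1(r))   # L1 length changes from 3 to about 4.24
```

The grid search is deliberately naive so that ties stay visible; a real solver would return just one point from the median interval.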
Advantages of L1 over L2 norm
- The L1 norm prefers sparse coefficient vectors. In effect it performs feature selection: you can delete every feature whose coefficient is 0. Such a reduction of dimensionality is useful in many cases.
- An L1 (absolute-error) loss is minimized by the median, so it is much less sensitive to outliers than an L2 loss.
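To see where the sparsity comes from, it helps to look at the textbook special case of an orthonormal design (X^T X = I), where both penalized least-squares problems have closed-form solutions. This sketch assumes that setting, the objective 0.5*||y - Xw||^2 plus either (lam/2)*||w||_2^2 (ridge) or lam*||w||_1 (lasso), and made-up ordinary-least-squares coefficients z. Ridge scales every coefficient by 1/(1 + lam); lasso soft-thresholds them. The last lines contrast how one outlier affects the mean and the median.

```python
import statistics

def ridge_orthonormal(z, lam):
    # Ridge solution when X^T X = I and the penalty is (lam/2)*||w||_2^2:
    # every OLS coefficient is shrunk by the same factor, none becomes exactly 0
    return [zj / (1.0 + lam) for zj in z]

def lasso_orthonormal(z, lam):
    # Lasso solution when X^T X = I and the penalty is lam*||w||_1 (soft-thresholding)
    out = []
    for zj in z:
        if zj > lam:
            out.append(zj - lam)
        elif zj < -lam:
            out.append(zj + lam)
        else:
            out.append(0.0)   # |z_j| <= lam: the coefficient is set exactly to zero
    return out

z = [3.0, -0.4, 0.2, -2.5, 0.05]   # hypothetical OLS coefficients
lam = 0.5

print(ridge_orthonormal(z, lam))   # all five coefficients remain nonzero
print(lasso_orthonormal(z, lam))   # [2.5, 0.0, 0.0, -2.0, 0.0]: three features dropped

# One outlier drags the mean (L2 minimizer) but leaves the median (L1 minimizer) alone
data = [10.0, 11.0, 12.0, 13.0, 14.0]
with_outlier = data[:-1] + [1000.0]
print(statistics.mean(data), statistics.mean(with_outlier))      # 12.0 209.2
print(statistics.median(data), statistics.median(with_outlier))  # 12.0 12.0
```

With correlated features there is no closed form and a numerical solver (for example coordinate descent) is needed, but the thresholding behaviour that produces exact zeros is the same idea.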
Hope this answer helps you!