
Looking at an example 'solver.prototxt' posted in the BVLC/caffe git repository, I see a training meta parameter:

weight_decay: 0.04

What does this meta parameter mean? And what value should I assign to it?

1 Answer


The weight_decay meta parameter is used for the regularization of the neural net.

During the training of a neural network, a regularization term is added to the network's loss to compute the backprop gradient. The weight_decay value determines how dominant this regularization term will be in the gradient computation.
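To make the role of weight_decay concrete, here is a minimal Python sketch of how an L2 term could enter the gradient. It assumes the penalty has the form 0.5 * weight_decay * w^2 per weight, so its gradient is weight_decay * w. The function name and the sample numbers are hypothetical and are not taken from Caffe's source.

```python
def l2_regularized_gradient(weights, loss_grads, weight_decay):
    """Add the gradient of an assumed L2 penalty to the data-loss gradient.

    Assumes a per-weight penalty of 0.5 * weight_decay * w**2,
    whose gradient with respect to w is weight_decay * w.
    """
    return [g + weight_decay * w for w, g in zip(weights, loss_grads)]


# Hypothetical values, chosen only for illustration.
weights = [0.5, -0.2, 0.0]
loss_grads = [0.1, 0.1, 0.1]

small_decay = l2_regularized_gradient(weights, loss_grads, weight_decay=0.04)
large_decay = l2_regularized_gradient(weights, loss_grads, weight_decay=0.4)

# A larger weight_decay shifts each gradient further in the direction of the
# weight itself, so the update pulls that weight more strongly toward zero.
print(small_decay)
print(large_decay)
```

With a larger weight_decay, the regularization part of each gradient grows relative to the data-loss part, which is what "more dominant" means here.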

As a rule of thumb, the more training examples you have, the weaker this term should be. The more parameters you have (i.e., deeper net, larger filters, larger InnerProduct layers, etc.), the higher this term should be.

Caffe also allows you to choose between L2 regularization (the default) and L1 regularization by setting

regularization_type: "L1"

However, because weights are usually small numbers (i.e., -1<w<1), the L2 term, which sums squared weights, is significantly smaller than the L1 term, which sums absolute values. If you choose regularization_type: "L1", you might therefore need to tune weight_decay to a significantly smaller value.
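The size difference between the two penalties can be checked with a short Python sketch. The weight values are made up for illustration, and the penalties are written without any scaling constant, which Caffe's own implementation may apply differently.

```python
def l1_penalty(weights):
    """Sum of absolute values of the weights."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Sum of squared weights."""
    return sum(w * w for w in weights)


# Hypothetical weights, all with magnitude below 1.
weights = [0.3, -0.1, 0.05, -0.2]

l1 = l1_penalty(weights)
l2 = l2_penalty(weights)

# For |w| < 1, w*w < |w|, so the L2 penalty is the smaller of the two.
print("L1 penalty:", l1)
print("L2 penalty:", l2)
print("L1 / L2 ratio:", l1 / l2)
```

The ratio gives a rough idea of how much stronger the L1 term is for the same weight_decay, which is why a smaller value is usually the starting point when switching to L1.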

Note that while the learning rate may change during training, the regularization weight is fixed throughout.
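As an illustration of that contrast, the sketch below assumes a "step" learning rate policy, where the rate is base_lr * gamma^floor(iter / stepsize), and prints it next to a constant weight_decay. The numeric values are hypothetical and the helper function is not part of Caffe.

```python
def step_learning_rate(base_lr, gamma, stepsize, iteration):
    """Learning rate under an assumed 'step' policy."""
    return base_lr * gamma ** (iteration // stepsize)


# Hypothetical solver settings, for illustration only.
base_lr, gamma, stepsize = 0.01, 0.1, 1000
weight_decay = 0.04

schedule = [
    (it, step_learning_rate(base_lr, gamma, stepsize, it), weight_decay)
    for it in (0, 999, 1000, 2500)
]

# The learning rate drops at each step boundary; weight_decay never changes.
for iteration, lr, decay in schedule:
    print(iteration, lr, decay)
```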

Hope this answer helps
