
I can't understand why dropout works like this in TensorFlow. The CS231n notes say that "dropout is implemented by only keeping a neuron active with some probability p (a hyperparameter), or setting it to zero otherwise." The same site also illustrates this with a figure.


The TensorFlow documentation for tf.nn.dropout says: "With probability keep_prob, outputs the input element scaled up by 1 / keep_prob, otherwise outputs 0."
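Here is a minimal plain-Python sketch of the behavior quoted above. It only illustrates the documented rule and is not TensorFlow's actual source. The helper name inverted_dropout and the use of the standard random module are assumptions.

```python
import random
from typing import List, Optional


def inverted_dropout(xs: List[float], keep_prob: float,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Hypothetical helper: keep each element with probability keep_prob,
    scaling kept elements up by 1 / keep_prob; otherwise output 0."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    rng = rng or random.Random()
    out = []
    for x in xs:
        # rng.random() is uniform on [0, 1), so this is True with probability keep_prob
        if rng.random() < keep_prob:
            out.append(x / keep_prob)  # kept: scaled up by 1 / keep_prob
        else:
            out.append(0.0)            # dropped
    return out


# Seeded generator so the run is reproducible; each output is either 0 or 2 * x here
print(inverted_dropout([1.0, 2.0, 3.0, 4.0], keep_prob=0.5, rng=random.Random(0)))
```

With keep_prob = 1.0 every element is kept and divided by 1.0, so the input passes through unchanged.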


So why is the input element scaled up by 1/keep_prob? Why not keep the input element unchanged with probability keep_prob, without the 1/keep_prob scaling?

1 Answer


Scaling the kept activations by 1/keep_prob leaves their expected value unchanged. This lets the same model be used for both training and evaluation.
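Written out for a single activation x with keep probability p (keep_prob), where the training-time output x̃ is either the (possibly rescaled) x or 0:

```latex
\mathbb{E}[\tilde{x}] = p \cdot \frac{x}{p} + (1 - p) \cdot 0 = x
\qquad \text{(kept values scaled by } 1/p \text{, as in TensorFlow)}

\mathbb{E}[\tilde{x}] = p \cdot x + (1 - p) \cdot 0 = p\,x
\qquad \text{(kept values left unscaled)}
```

With the rescaling, the expected training-time activation already equals the activation at evaluation time, when nothing is dropped. Without it, the expectation is p·x, so something else has to compensate at test time.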

In the original dropout formulation, a single neural net without dropout is used at test time. Its weights are scaled-down versions of the trained weights: if a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p at test time.
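The original test-time scheme can be sketched for one linear layer in plain Python. The helpers linear, drop_units and scale_weights_for_test, and the example weights, are hypothetical. The sketch only shows that multiplying the outgoing weights by p at test time matches the average of the unscaled training-time passes.

```python
import random
from typing import List

Matrix = List[List[float]]  # weights[i][j]: weight from input unit i to output unit j


def linear(h: List[float], weights: Matrix) -> List[float]:
    """Linear layer without bias: y_j = sum_i h_i * weights[i][j]."""
    n_out = len(weights[0])
    return [sum(h[i] * weights[i][j] for i in range(len(h))) for j in range(n_out)]


def drop_units(h: List[float], p: float, rng: random.Random) -> List[float]:
    """Training time: retain each unit unchanged with probability p, else zero it."""
    return [x if rng.random() < p else 0.0 for x in h]


def scale_weights_for_test(weights: Matrix, p: float) -> Matrix:
    """Test time: no units are dropped; every outgoing weight is multiplied by p."""
    return [[p * w for w in row] for row in weights]


h = [1.0, 2.0, 3.0]
W = [[0.5, -1.0], [0.25, 0.0], [1.0, 2.0]]
p = 0.5
y_train = linear(drop_units(h, p, random.Random(0)), W)  # one noisy training pass
y_test = linear(h, scale_weights_for_test(W, p))          # deterministic test pass
```

Averaged over many training passes, y_train approaches y_test. That is the sense in which the scaled-down weights stand in for the whole ensemble of thinned networks.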

TensorFlow's implementation instead adds an op that scales the kept outputs up by 1/keep_prob at training time, rather than adding ops that scale the weights down by keep_prob at test time.
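Here is a quick Monte Carlo check of where each convention puts the expected value. It uses a hypothetical plain-Python helper, not TensorFlow code. With the 1/keep_prob scaling applied at training time, the average output already matches the raw input, so evaluation needs no extra op. Without it, the average is keep_prob times the input, which is what the test-time weight scaling has to correct.

```python
import random


def mean_training_output(x: float, keep_prob: float, inverted: bool,
                         trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[output] for one activation x under dropout.

    inverted=True : kept values are scaled by 1 / keep_prob (scaling at training time)
    inverted=False: kept values are left unchanged (scaling deferred to test time)
    """
    rng = random.Random(seed)
    scale = 1.0 / keep_prob if inverted else 1.0
    total = 0.0
    for _ in range(trials):
        if rng.random() < keep_prob:  # unit kept with probability keep_prob
            total += x * scale
    return total / trials


x, keep_prob = 3.0, 0.5
print(mean_training_output(x, keep_prob, inverted=True))   # close to x
print(mean_training_output(x, keep_prob, inverted=False))  # close to keep_prob * x
```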

The effect on performance is negligible, and the code is simpler. The same graph serves both purposes: keep_prob is defined as a tf.placeholder() and fed a different value depending on whether the network is being trained or evaluated (1.0 for evaluation, so nothing is dropped).
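Without TensorFlow, the placeholder pattern amounts to making keep_prob a runtime argument of the forward pass. Below is a minimal sketch, assuming a hypothetical single-output layer called forward and illustrative example values. The 0.5 used for training is only an example choice. Feeding 1.0 for evaluation keeps every unit and turns the 1/keep_prob factor into a no-op.

```python
import random
from typing import List, Optional


def forward(h: List[float], weights: List[float], keep_prob: float,
            rng: Optional[random.Random] = None) -> float:
    """Hypothetical one-output layer: inverted dropout on h, then a dot product.

    keep_prob plays the role of the fed placeholder: < 1.0 while training,
    1.0 while evaluating (every unit kept, scale factor 1 / 1.0 is a no-op).
    """
    rng = rng or random.Random()
    dropped = [x / keep_prob if rng.random() < keep_prob else 0.0 for x in h]
    return sum(x * w for x, w in zip(dropped, weights))


h, w = [1.0, 2.0, 3.0], [0.5, 0.25, 1.0]
train_out = forward(h, w, keep_prob=0.5, rng=random.Random(0))  # noisy, training mode
eval_out = forward(h, w, keep_prob=1.0)                          # deterministic, eval mode
```

The same function (the analogue of the single graph) is used in both modes. Only the value supplied for keep_prob changes.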
