0 votes
in Machine Learning by (19k points)

Most examples of neural networks for classification tasks I've seen use a softmax layer as the output activation function. Normally, the hidden units use a sigmoid, tanh, or ReLU activation function. Using the softmax function there would, as far as I know, work out mathematically too.

  • What are the theoretical justifications for not using the softmax function as hidden layer activation functions?
  • Are there any publications about this, something to quote?

1 Answer

0 votes
by (33.1k points)
edited by

The following points explain why using the softmax function in a hidden layer is generally not a good idea:

1. Variable independence: a lot of regularization and effort goes into keeping hidden variables independent, uncorrelated, and reasonably sparse. If you use softmax as a hidden layer activation, all of the layer's outputs are forced to sum to 1, so the nodes become linearly dependent on one another, which can cause many problems and poor generalization.
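This dependence is easy to see numerically: whatever the pre-activations are, the softmax outputs sum to 1, so any one activation is fully determined by the rest. A minimal NumPy sketch (the values here are arbitrary, chosen only for illustration):

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability
    e = np.exp(z - z.max())
    return e / e.sum()

# hypothetical hidden-layer pre-activations
z = np.array([2.0, -1.0, 0.5, 3.0])
a = softmax(z)

# the outputs always sum to 1, so the last activation equals
# 1 minus the sum of the others -- a linear dependence
print(a.sum())  # 1.0
```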

2. Training issues: whenever training needs to push some of the hidden layer's activations lower, the shared normalization automatically pushes the mean activation of the remaining units higher. This coupling can, in fact, increase the error and harm the training phase.
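The coupling can be demonstrated directly: lowering one unit's pre-activation raises every other unit's activation, even though their own pre-activations never changed. A small NumPy illustration (not from the original answer):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([1.0, 1.0, 1.0])
a_before = softmax(z)

# push one unit's pre-activation down...
z[0] -= 2.0
a_after = softmax(z)

# ...and the other units' activations rise, because the
# outputs must still sum to 1
print(a_before[1:], a_after[1:])
```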

3. Mathematical issues: placing such constraints on the activations of a hidden layer reduces the expressive power of your model without any compensating benefit.
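One concrete way to see the lost expressiveness: softmax is invariant to adding a constant to all of its inputs, so a hidden softmax layer discards the overall magnitude of its pre-activations entirely. A quick NumPy check (an illustrative sketch, not from the original answer):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([0.5, -1.0, 2.0])

# shifting every pre-activation by the same constant leaves the
# softmax output unchanged -- one whole degree of freedom is lost
print(np.allclose(softmax(z), softmax(z + 10.0)))  # True
```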

4. Batch normalization does it better: you might think that a normalized output from a hidden layer would help training. But a technique called Batch Normalization has already been shown to achieve this more effectively, whereas using softmax as a hidden layer activation has been reported to decrease both the accuracy and the speed of learning.
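For comparison, batch normalization standardizes each feature across the batch (then applies a learnable scale and shift) without coupling the units of a single example to each other. A minimal NumPy sketch of the inference-free forward pass, assuming fixed `gamma` and `beta` for illustration:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # normalize each feature over the batch dimension,
    # then apply scale (gamma) and shift (beta)
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(3.0, 5.0, size=(32, 4))  # batch of hidden activations
y = batch_norm(x)

print(y.mean(axis=0))  # approximately 0 per feature
print(y.std(axis=0))   # approximately 1 per feature
```

Unlike a hidden softmax, the per-example activations here remain free to vary independently; only the batch statistics are normalized.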


Hope this answer helps you!

