0 votes
in Machine Learning by (19k points)

Most examples of neural networks for classification tasks I've seen use a softmax layer as the output activation function. Normally, the hidden units use a sigmoid, tanh, or ReLU activation function. Using the softmax function there would, as far as I know, work out mathematically too.

  • What are the theoretical justifications for not using the softmax function as hidden layer activation functions?
  • Are there any publications about this, something to quote?

1 Answer

0 votes
by (33.1k points)
edited by

The following points explain why using the softmax function in a hidden layer is generally not a good idea:

1. Variable independence: a lot of regularization and effort goes into keeping hidden variables independent, uncorrelated, and reasonably sparse. If you use softmax as a hidden layer activation, all of the layer's outputs are forced to sum to 1, so the nodes become linearly dependent on one another, which can cause many problems and poor generalization.
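This dependence is easy to see numerically: whatever the pre-activations are, the softmax outputs sum to 1, so any one activation is fully determined by the rest. A minimal NumPy sketch (the values here are arbitrary, chosen only for illustration):

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability
    e = np.exp(z - z.max())
    return e / e.sum()

# hypothetical hidden-layer pre-activations
z = np.array([2.0, -1.0, 0.5, 3.0])
a = softmax(z)

# the outputs always sum to 1, so the last activation equals
# 1 minus the sum of the others -- a linear dependence
print(a.sum())  # 1.0
```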

2. Training issues: whenever training needs to push some of the hidden layer's activations lower, the shared normalization automatically pushes the mean activation of the remaining units higher. This coupling can, in fact, increase the error and harm the training phase.
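The coupling can be demonstrated directly: lowering one unit's pre-activation raises every other unit's activation, even though their own pre-activations never changed. A small NumPy illustration (not from the original answer):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([1.0, 1.0, 1.0])
a_before = softmax(z)

# push one unit's pre-activation down...
z[0] -= 2.0
a_after = softmax(z)

# ...and the other units' activations rise, because the
# outputs must still sum to 1
print(a_before[1:], a_after[1:])
```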

3. Mathematical issues: placing such constraints on the activations of a hidden layer reduces the expressive power of your model without any compensating benefit.
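One concrete way to see the lost expressiveness: softmax is invariant to adding a constant to all of its inputs, so a hidden softmax layer discards the overall magnitude of its pre-activations entirely. A quick NumPy check (an illustrative sketch, not from the original answer):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([0.5, -1.0, 2.0])

# shifting every pre-activation by the same constant leaves the
# softmax output unchanged -- one whole degree of freedom is lost
print(np.allclose(softmax(z), softmax(z + 10.0)))  # True
```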

4. Batch normalization does it better: you might think that a normalized output from a hidden layer would help training. But a technique called Batch Normalization has already been shown to achieve this more effectively, whereas using softmax as a hidden layer activation has been reported to decrease both the accuracy and the speed of learning.
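For comparison, batch normalization standardizes each feature across the batch (then applies a learnable scale and shift) without coupling the units of a single example to each other. A minimal NumPy sketch of the inference-free forward pass, assuming fixed `gamma` and `beta` for illustration:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # normalize each feature over the batch dimension,
    # then apply scale (gamma) and shift (beta)
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(3.0, 5.0, size=(32, 4))  # batch of hidden activations
y = batch_norm(x)

print(y.mean(axis=0))  # approximately 0 per feature
print(y.std(axis=0))   # approximately 1 per feature
```

Unlike a hidden softmax, the per-example activations here remain free to vary independently; only the batch statistics are normalized.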


Hope this answer helps you!

