+2 votes
in Machine Learning by (4.8k points)

I was wondering what the difference is between the Activation layer and the Dense layer in Keras.

Since the Activation layer seems to be a fully connected layer, and Dense has a parameter for passing an activation function, what is the best practice?

Let's imagine a fictional network like this: Input -> Dense -> Dropout -> Final Layer.

Should the final layer be Dense(activation=softmax) or Activation(softmax)? Which is cleaner, and why?

Thanks, everyone!

2 Answers

+2 votes
by (180 points)

The best practice is to avoid using the softmax function in the hidden layers of a neural net. The reason is that softmax turns a layer's outputs into probabilities for the labels: each value lies in the range (0,1) and the values sum to 1. That is why softmax activation is generally used only at the last layer of the neural net.
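To make that concrete, here is a minimal sketch of softmax in plain Python. The `softmax` helper and the `raw_scores` values are made up for illustration; this is not Keras's own implementation.

```python
import math


def softmax(scores):
    """Map a list of raw scores to probabilities that sum to 1."""
    # Subtracting the max keeps exp() from overflowing; it does not
    # change the result because the shift cancels in the ratio.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs of a final layer with three classes.
raw_scores = [2.0, 1.0, 0.1]
probs = softmax(raw_scores)
print(probs)
```

The largest raw score gets the largest probability, and the ordering of the classes is preserved.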

Moreover, if you use Dense(activation=softmax), Keras internally creates a dense layer first, applies softmax on top of it, and returns the result directly. You won't be able to read the raw outputs of the last layer from that model directly; instead you get their probabilities.

Hope this helps.

+1 vote
by (7.9k points)

Using Dense(activation=softmax) is computationally equivalent to first adding a Dense layer and then adding Activation(softmax). However, there is one advantage to the second approach: you can retrieve the outputs of the last layer (before activation) from a model defined that way. With the first approach that is not directly possible, because the layer only exposes its output after softmax.
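The equivalence described above can be sketched as follows. This is only an illustration, assuming the Keras bundled with TensorFlow (`from tensorflow import keras`). The layer sizes, the dropout rate, the layer name "logits", and the helper `build_model` are all made up for the example.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sizes, chosen only for illustration.
n_features, n_hidden, n_classes = 8, 16, 3


def build_model(fused):
    """Input -> Dense -> Dropout -> final layer, in one of two styles."""
    inputs = keras.Input(shape=(n_features,))
    h = layers.Dense(n_hidden, activation="relu")(inputs)
    h = layers.Dropout(0.5)(h)
    if fused:
        # Approach 1: softmax passed as the Dense activation.
        probs = layers.Dense(n_classes, activation="softmax")(h)
        return keras.Model(inputs, probs), None
    # Approach 2: linear Dense followed by a separate Activation layer.
    logits = layers.Dense(n_classes, name="logits")(h)
    probs = layers.Activation("softmax")(logits)
    # A second model sharing the same layers, stopping before softmax.
    return keras.Model(inputs, probs), keras.Model(inputs, logits)


fused_model, _ = build_model(fused=True)
split_model, logit_model = build_model(fused=False)

# Dropout and Activation hold no weights, so the weight lists line up.
split_model.set_weights(fused_model.get_weights())

x = np.random.rand(4, n_features).astype("float32")
probs_fused = fused_model.predict(x, verbose=0)
probs_split = split_model.predict(x, verbose=0)
logits = logit_model.predict(x, verbose=0)  # pre-activation outputs

print(np.allclose(probs_fused, probs_split, atol=1e-6))
```

With identical weights both models should give the same probabilities, while only the second definition gives a tensor to read the pre-softmax values from.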

...