**Question: **

There are many activation functions in machine learning, and I'm having trouble understanding the softmax function. I know it involves a sum of exponentials, but how can the softmax function be used as an activation function? And how do I implement the softmax function in Python?

The definition I'm working from is `S(y_i) = e^(y_i) / Σ_j e^(y_j)`, where S(y_i) is the softmax of y_i, e is the exponential, and j indexes the columns of the input vector Y.
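
If I understand that definition correctly, then for the scores `[3.0, 1.0, 0.2]` that I use below, the first entry should work out to `e^3.0 / (e^3.0 + e^1.0 + e^0.2) ≈ 20.09 / 24.03 ≈ 0.836`.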

I've tried the following:

```python
import numpy as np

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

scores = [3.0, 1.0, 0.2]
print(softmax(scores))
```

which returns:

```
[ 0.8360188 0.11314284 0.05083836]
```

But the suggested solution was:

```python
def softmax(x):
    """Compute softmax values for each set of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)
```

which produces the **same output as the first implementation**, even though the first implementation explicitly subtracts the max from each column before exponentiating and dividing by the sum.
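
As a quick numerical check (the `softmax_shifted` / `softmax_plain` names are just mine, so I can keep both versions side by side), the two do agree to within floating-point tolerance:

```python
import numpy as np

def softmax_shifted(x):
    # first implementation: shift by the max before exponentiating
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def softmax_plain(x):
    # suggested solution: exponentiate the raw scores directly
    return np.exp(x) / np.sum(np.exp(x), axis=0)

scores = [3.0, 1.0, 0.2]
print(np.allclose(softmax_shifted(scores), softmax_plain(scores)))  # True
```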

**Can someone show mathematically why? Is one correct and the other one wrong?**

**Are the two implementations similar in terms of code and time complexity? Which is more efficient?**
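
For the efficiency part, I assume the two could be compared with `timeit` along these lines (just a sketch; the input size and repetition count are arbitrary, and I haven't drawn any conclusions from the numbers it gives):

```python
import timeit

setup = """
import numpy as np
x = np.random.rand(1000)

def softmax_shifted(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def softmax_plain(x):
    return np.exp(x) / np.sum(np.exp(x), axis=0)
"""

# time both versions on the same 1000-element input
print(timeit.timeit("softmax_shifted(x)", setup=setup, number=10000))
print(timeit.timeit("softmax_plain(x)", setup=setup, number=10000))
```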