
Is it good practice to use sigmoid or tanh output layers in neural networks directly to estimate probabilities?

i.e., the probability of a given input occurring is the output of the sigmoid function in the NN.

I want to use a neural network to learn and predict the probability of a given input occurring. You may consider the input a State1-Action-State2 tuple. Hence the output of the NN is the probability that State2 happens when applying Action to State1.

I hope that clears things up.

When training the NN, I apply a random Action to State1 and observe the resulting State2; I then teach the NN that the input State1-Action-State2 should produce the output 1.0.


You should choose the correct loss function to minimize. Squared error does not lead to the maximum-likelihood hypothesis here, because squared error is derived from a model with additive Gaussian noise.

For example:

P(y|x,h) = k1 * e**-(k2 * (y - h(x))**2)
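To see why squared error corresponds to this Gaussian model, take the log of the formula above: maximizing the likelihood is then the same as minimizing `(y - h(x))**2`. A minimal sketch (the values of `k1` and `k2` are illustrative assumptions, not from the post):

```python
import math

# Log of P(y|x,h) = k1 * e**-(k2 * (y - h(x))**2).
# k1 contributes only a constant, so maximizing the log-likelihood
# is exactly minimizing the squared error (y - prediction)**2.
def gaussian_log_likelihood(y, prediction, k1=1.0, k2=0.5):
    return math.log(k1) - k2 * (y - prediction) ** 2

y = 1.0
# A prediction with smaller squared error has higher log-likelihood.
assert gaussian_log_likelihood(y, 0.9) > gaussian_log_likelihood(y, 0.2)
```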

You estimate the probabilities directly. Your model is:

P(Y=1|x,h) = h(x)

P(Y=0|x,h) = 1 - h(x)

P(Y=1|x,h) is the probability that event Y=1 will happen after seeing x.

The maximum likelihood hypothesis for your model is:

h_max_likelihood = argmax_h product(h(x)**y * (1-h(x))**(1-y) for x, y in examples)

Taking the negative log of that product leads to the "cross-entropy" loss function:

L(h) = -sum(y * log(h(x)) + (1-y) * log(1 - h(x)) for x, y in examples)

Minimizing it over your training examples yields the maximum-likelihood hypothesis, so cross-entropy is the right loss for your setup, with a sigmoid output giving h(x) in (0, 1).
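A small numeric check of this equivalence (the sigmoid hypothesis and the example data are illustrative assumptions):

```python
import math

# Cross-entropy loss over a set of (x, y) examples.
def cross_entropy(examples, h):
    return -sum(y * math.log(h(x)) + (1 - y) * math.log(1 - h(x))
                for x, y in examples)

# Negative log of the likelihood product from the derivation above.
def neg_log_likelihood(examples, h):
    prod = 1.0
    for x, y in examples:
        prod *= h(x) ** y * (1 - h(x)) ** (1 - y)
    return -math.log(prod)

h = lambda x: 1.0 / (1.0 + math.exp(-x))  # any sigmoid-valued hypothesis
examples = [(0.5, 1), (-1.0, 0), (2.0, 1)]

# The two quantities agree, so minimizing cross-entropy maximizes likelihood.
assert abs(cross_entropy(examples, h) - neg_log_likelihood(examples, h)) < 1e-9
```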