Multi Layer Perceptron
A simple neural network has an input layer, a hidden layer, and an output layer. In deep learning, there are multiple hidden layers. Multiple hidden layers matter because they let the network learn progressively more precise representations, for example identifying finer and finer features in an image. These computations are performed far more efficiently on a GPU than on a CPU. However, with many layers the gradients used by gradient descent become very small, leading to the vanishing gradient problem: as the error information is passed backwards through the network, the gradients shrink until they become negligible relative to the weights of the network. Common activation functions are –
- Sigmoid (0 to 1)
- Tanh (-1 to 1)
ReLU overcomes the vanishing gradient problem in multi-layer neural networks.
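As a minimal illustration (not part of the original tutorial), the NumPy sketch below defines these activations and compares their gradients; the sigmoid derivative never exceeds 0.25, which is why gradients shrink when multiplied backwards through many sigmoid layers, while ReLU's derivative stays at 1 for positive inputs.

```python
import numpy as np

# Common activation functions (illustrative sketch).
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes input to (0, 1)

def tanh(x):
    return np.tanh(x)                 # squashes input to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # passes positives, zeroes out negatives

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])

# The sigmoid derivative is at most 0.25, so multiplying it across many
# layers during backpropagation shrinks the gradient toward zero
# (the vanishing gradient problem). ReLU's derivative is 1 for positive
# inputs, so gradients do not shrink layer after layer.
sigmoid_grad = sigmoid(x) * (1.0 - sigmoid(x))   # max value 0.25 at x = 0
relu_grad = (x > 0).astype(float)                # 1 for positive inputs

print(sigmoid_grad)
print(relu_grad)
```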
Math behind Neural networks
The input layer is usually a vector, and the neural network learns patterns by learning the weights. The architecture, activation functions, layers, dropouts, and the weights of each epoch are saved in a pickle file, along with the biases.
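A hedged sketch of what saving such a checkpoint to a pickle file might look like; the dictionary keys, file name, and layer sizes below are illustrative assumptions, not a prescribed format.

```python
import pickle
import numpy as np

# Hypothetical checkpoint of a network's configuration and parameters
# after an epoch; keys and values are illustrative only.
weights = [np.random.randn(4, 8), np.random.randn(8, 3)]   # one matrix per layer
biases = [np.zeros(8), np.zeros(3)]

checkpoint = {
    "architecture": [4, 8, 3],               # neurons per layer
    "activations": ["sigmoid", "sigmoid"],
    "dropout": 0.2,
    "weights": weights,
    "biases": biases,
    "epoch": 10,
}

# Save the state of this epoch to a pickle file.
with open("model_checkpoint.pkl", "wb") as f:
    pickle.dump(checkpoint, f)

# Later, reload the saved state.
with open("model_checkpoint.pkl", "rb") as f:
    restored = pickle.load(f)
```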
In the first phase, we do a forward pass. The weights are initialized randomly at the beginning and get updated as learning proceeds; the number of weights depends on the number of connections between the input layer and the hidden layer. For each hidden-layer neuron we compute a net input, the weighted sum of its inputs plus a bias, and activate it with the activation function (the sigmoid, or logistic, function) to get that neuron's output. We carry out the same process for every neuron in the hidden layer.
Then we repeat this process for the output-layer neurons, using the outputs from the hidden-layer neurons as their inputs.
The network's outputs are compared against the actual target values. The cost function, or total error, is the squared-error function (target – output) summed over the output neurons: the total error is the sum of the individual output errors.
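The following NumPy sketch walks through one forward pass of a small 2–2–2 network with sigmoid activations and computes the total squared error; all weights, biases, inputs, and targets are illustrative example values, not taken from the text.

```python
import numpy as np

# Minimal forward-pass sketch for a 2-2-2 network with sigmoid activations.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.05, 0.10])            # input vector
targets = np.array([0.01, 0.99])      # desired (target) outputs

W1 = np.array([[0.15, 0.20],          # input -> hidden weights
               [0.25, 0.30]])
b1 = 0.35
W2 = np.array([[0.40, 0.45],          # hidden -> output weights
               [0.50, 0.55]])
b2 = 0.60

hidden_out = sigmoid(W1 @ x + b1)     # net input, then activation, per hidden neuron
output = sigmoid(W2 @ hidden_out + b2)

# Total error: sum of squared errors over the output neurons.
E_total = np.sum(0.5 * (targets - output) ** 2)
print(output, E_total)
```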
Backward Pass
The goal of backpropagation is to update the weights so that the actual output moves closer to the target output, minimizing the error. Neural networks learn in a supervised fashion, and the derivatives are computed by applying the chain rule. The error after an update should be smaller than the error before it; how quickly it decreases depends on the learning rate of the neural network. The learning rate is represented as alpha (𝛼) or epsilon (ϵ). Weights are updated in each layer using partial derivatives, and the backward pass continues by calculating new values for the weights of the earlier layers in the same way. Likewise, layer 'n+1' learns from the previous layer 'n'.
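Below is a small sketch of the chain rule for a single output-layer weight, assuming the sigmoid activation and squared-error cost described above; the numeric values are illustrative, not prescribed by the text.

```python
import numpy as np

# Chain rule for one output-layer weight, continuing the style of the
# forward-pass example above (illustrative values).
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

alpha = 0.5                              # learning rate
hidden_out = np.array([0.593, 0.597])    # outputs of the hidden neurons
w = 0.40                                 # weight from hidden neuron 1 to the output neuron
net_o = w * hidden_out[0] + 0.45 * hidden_out[1] + 0.60
out_o = sigmoid(net_o)
target = 0.01

# Chain rule: dE/dw = dE/dout * dout/dnet * dnet/dw
dE_dout = out_o - target                 # derivative of 0.5 * (target - out)^2
dout_dnet = out_o * (1.0 - out_o)        # derivative of the sigmoid
dnet_dw = hidden_out[0]                  # the net input is linear in the weight
dE_dw = dE_dout * dout_dnet * dnet_dw

w_new = w - alpha * dE_dw                # gradient-descent weight update
print(w_new)
```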
In the forward pass the weights start from random values, whereas in the backward pass they are updated in the direction given by the gradients, scaled by the learning rate. Gradient descent means gradually descending along the error surface so that the error becomes optimal, or very small. The learning rate and the momentum both contribute to how gradient descent proceeds.
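As a sketch of how the learning rate and momentum enter the update, here is a generic momentum formulation of gradient descent on a toy one-dimensional loss; this specific variant and its constants are assumptions for illustration, not spelled out in the tutorial.

```python
# Generic momentum update for gradient descent (a common formulation).
alpha = 0.1        # learning rate
momentum = 0.9
velocity = 0.0

def gradient(w):
    # Toy loss L(w) = (w - 3)^2, so the gradient is 2 * (w - 3).
    return 2.0 * (w - 3.0)

w = 0.0
for step in range(100):
    velocity = momentum * velocity - alpha * gradient(w)
    w = w + velocity     # the error gradually decreases toward the minimum at w = 3

print(w)
```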
Overfitting
Overfitting arises when a simple problem is modeled with an unnecessarily large network. The model fits the training data too closely: performance on the specific training set may be close to 90%, yet it is poor on other real examples, because the network has learned the training set's specifics rather than a general pattern. For simpler problems, use fewer hidden layers to avoid overfitting.
Dropouts
To avoid overfitting we use dropout. The motivation, borrowed from genetics, is that sexual reproduction combines distinct genes to produce offspring instead of strengthening co-adapted ones; dropout likewise randomly removes units during training so that neurons cannot co-adapt with specific other neurons.
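A minimal sketch of (inverted) dropout applied to a layer's activations during training; the keep probability and activation values are illustrative choices.

```python
import numpy as np

def dropout(activations, keep_prob=0.8):
    # Randomly zero out units with probability (1 - keep_prob).
    mask = (np.random.rand(*activations.shape) < keep_prob).astype(float)
    # Scale the surviving units so the expected activation stays the same;
    # at test time the layer is used as-is, with no mask.
    return activations * mask / keep_prob

hidden = np.array([0.2, 0.7, 0.5, 0.9])
print(dropout(hidden))
```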
Hyperparameters
The architecture for creating a neural network is defined by its hyperparameters – the number of layers, the number of perceptrons per layer, the activation functions, dropout, the optimizer, the loss function, the number of epochs, the learning rate, momentum, the metrics (such as accuracy), and the batch size.
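As an illustration of where each hyperparameter typically appears, here is a sketch using the Keras API; the framework choice and every value below are assumptions made for the example, not recommendations from the tutorial.

```python
import numpy as np
from tensorflow import keras

# Illustrative hyperparameter choices; none of these values come from the tutorial.
model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(64, activation="relu"),   # layers, perceptrons, activation function
    keras.layers.Dropout(0.2),                   # dropout
    keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),  # optimizer, learning rate, momentum
    loss="binary_crossentropy",                                        # loss function
    metrics=["accuracy"],                                              # metrics
)

# Dummy data just to make the example runnable.
X = np.random.rand(100, 10)
y = np.random.randint(0, 2, size=100)

model.fit(X, y, epochs=5, batch_size=16)         # number of epochs, batch size
```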
We hope this tutorial helps you gain a working knowledge of Artificial Intelligence. If you are looking to learn Artificial Intelligence online in a systematic manner with expert guidance and support, you can enroll in our AI Course.