**Math Behind Neural Networks**

The input layer is usually a vector, and the neural network learns patterns by adjusting its weights. The trained model, including its architecture, activation functions, layers, dropout settings, and the weights and biases from each epoch, can be saved to a pickle file.

In the first phase, we do a forward pass. Each hidden-layer neuron computes a weighted sum of its inputs plus a bias, and passes that sum through an activation function (such as the sigmoid, or logistic, function); the same process is carried out for every neuron in the layer. The weights are initialized randomly at the beginning and get updated as learning proceeds. The number of weights is determined by the number of connections between the input layer and the hidden layer. This gives us the output of each hidden neuron.
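The forward pass for one layer can be sketched as follows; the layer sizes and the example input are hypothetical, chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical network: 2 inputs, 2 hidden neurons.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2))   # weights are initialized randomly
b1 = np.zeros(2)               # biases, one per hidden neuron

x = np.array([0.05, 0.10])     # example input vector
h = sigmoid(x @ W1 + b1)       # hidden-layer activations
```

Each entry of `h` is the activated output of one hidden neuron, which then serves as an input to the next layer.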

Then we repeat this process for the output-layer neurons, using the outputs of the hidden-layer neurons as their inputs.

These are the predicted output values, and the same process is applied to every output neuron. The cost function, or total error, is the sum of squared errors over the output neurons, each term of the form ½(target − output)²; the total error is the sum of the individual output-neuron errors.
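The total error can be computed as below; the target and output values are hypothetical numbers used only to show the calculation.

```python
# Hypothetical targets and predictions for two output neurons.
targets = [0.01, 0.99]
outputs = [0.75, 0.77]

# Sum of squared errors: E_total = sum of 1/2 * (target - output)^2
e_total = sum(0.5 * (t - o) ** 2 for t, o in zip(targets, outputs))
```

The ½ factor is a common convention that cancels the 2 produced when differentiating the squared term during backpropagation.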


**Backward Pass**

The goal of backpropagation is to update the weights so that the actual outputs move closer to the target outputs, minimizing the error. Neural networks of this kind are trained with supervised learning, and the derivatives are computed using the chain rule. The error after each update should be smaller than before; how quickly it shrinks depends on the learning rate of the network, represented as alpha (𝛼) or epsilon (ϵ). Weights are updated in each layer using partial derivatives of the error with respect to each weight. The backward pass then continues by computing new values for the weights of the earlier layers in the same way: the gradients for layer 'n' are derived from those already computed for layer 'n+1'.
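A minimal chain-rule weight update for a single sigmoid output neuron might look like this; the numbers (hidden activation, weight, target, learning rate) are toy values chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy setup: one output neuron fed by one hidden activation.
h, w, target, alpha = 0.6, 0.4, 1.0, 0.5

net = h * w            # weighted input to the output neuron
out = sigmoid(net)     # predicted output

# Chain rule: dE/dw = dE/dout * dout/dnet * dnet/dw
dE_dout   = out - target        # derivative of 1/2 * (target - out)^2
dout_dnet = out * (1.0 - out)   # derivative of the sigmoid
dnet_dw   = h
grad = dE_dout * dout_dnet * dnet_dw

w_new = w - alpha * grad        # gradient-descent weight update
```

Because the target here exceeds the prediction, the gradient is negative and the update increases the weight, pulling the output toward the target.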

In the first forward pass, the weights start from their random initialization, whereas in the backward pass they are adjusted in proportion to the learning rate. Gradient descent means gradually descending along the error surface to reach an optimal, or very small, error. The learning rate and the momentum both shape how gradient descent behaves.
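The interplay of learning rate and momentum can be sketched on a simple one-dimensional error surface; the quadratic function and the coefficient values are assumptions for demonstration.

```python
# Minimize the toy error surface f(w) = (w - 3)^2 with momentum.
alpha, beta = 0.1, 0.9   # learning rate and momentum coefficient
w, v = 0.0, 0.0          # weight and velocity

for _ in range(200):
    grad = 2 * (w - 3)       # df/dw
    v = beta * v - alpha * grad   # momentum accumulates past gradients
    w = w + v                     # descend toward the minimum at w = 3
```

Momentum smooths the updates by accumulating a running velocity, which can speed convergence across flat regions of the error surface.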


**Multi-Layer Perceptron**

A simple neural network has an input layer, a hidden layer, and an output layer. In deep learning, there are multiple hidden layers. Multiple hidden layers matter because they allow the network to learn more precise, hierarchical features, for example identifying structures in an image with greater exactness. These computations run far more efficiently on a GPU than on a CPU. With many layers, however, the gradients can become very small: as the error signal is passed back, the gradients begin to vanish, shrinking relative to the weights of the network (the vanishing gradient problem). Saturating activation functions prone to this include:

- Sigmoid (0 to 1)
- Tanh (-1 to 1)

ReLU (Rectified Linear Unit) overcomes the vanishing gradient problem in multi-layer neural networks.
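The contrast can be shown numerically: the sigmoid's derivative never exceeds 0.25, so multiplying it across many layers shrinks the backpropagated signal, while ReLU's derivative is exactly 1 for positive inputs. The depth of 10 layers below is an arbitrary example.

```python
import numpy as np

def relu(z):
    """ReLU: passes positive inputs unchanged, zeroes out the rest."""
    return np.maximum(0.0, z)

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)   # maximum value is 0.25, at z = 0

# Best-case sigmoid gradient chained through 10 layers: nearly zero.
sigmoid_chain = 0.25 ** 10
# ReLU gradient (1 for positive inputs) chained through 10 layers: unchanged.
relu_chain = 1.0 ** 10
```

This is why deep networks trained with sigmoid or tanh activations struggle to propagate learning signals to their early layers, while ReLU-based networks do not.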


**Overfitting**

Overfitting arises when a complex, multi-layer network is applied to a simple problem. The model fits the training data too closely: performance on the training set may be close to 90%, yet it is poor on other real examples, because the network has learned the specifics of the training set rather than general patterns. For simpler problems, use fewer hidden layers to avoid overfitting.

**Dropouts**

To avoid overfitting, we use dropout: during training, units are randomly dropped so that they cannot co-adapt. The motivation is analogous to sexual reproduction in genetics, which combines distinct genes to produce offspring rather than strengthening co-adapted groups of genes.
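A common formulation is inverted dropout, sketched below; the layer size and drop rate are hypothetical.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, then
    scale the survivors so the expected activation stays unchanged."""
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

rng = np.random.default_rng(0)
h = np.ones(1000)                       # hypothetical hidden activations
h_dropped = dropout(h, rate=0.5, rng=rng)
```

At inference time no units are dropped; the scaling during training is what keeps the expected magnitude of the activations consistent between the two phases.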

**Hyperparameters**

The architecture of a neural network is specified by its hyperparameters: the number of layers, the number of perceptrons per layer, activation functions, dropout rate, optimizer, loss function, number of epochs, learning rate, momentum, evaluation metrics (such as accuracy), and batch size.
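These settings are often collected in a single configuration object before training; the specific values below are illustrative, not recommendations.

```python
# Hypothetical hyperparameter set for a small network (values illustrative).
hyperparams = {
    "layers": [64, 32],        # perceptrons in each hidden layer
    "activation": "relu",
    "dropout": 0.2,            # fraction of units dropped during training
    "optimizer": "sgd",
    "loss": "mse",             # mean squared error
    "epochs": 50,
    "learning_rate": 0.01,
    "momentum": 0.9,
    "metrics": ["accuracy"],
    "batch_size": 32,
}
```

Keeping the hyperparameters in one place makes it easy to log them alongside results and to sweep over them when tuning.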
