**Neuron:**

The human brain contains billions of neurons, which keep learning and updating all the time. Dendrites carry information from other neurons into the cell body; the resulting signal travels along the axon to the terminal bulbs, which pass it on to the dendrites of other neurons. This web of interconnected neurons is called a neural network.


If we apply the same idea to a neural network in AI, a neuron in the hidden layer plays the role of the cell body. It is connected to the neurons of the input layer, much as dendrites connect a cell to its neighbours, and its output either becomes the final output or serves as an input to other neurons. Weights such as w1, w2, w3 are applied to the inputs; the neuron combines the weights and inputs into a signal and passes it on to the output layer. This is how a basic neuron works. Because the signal only moves forward from input to output, this is called a **feed forward network.**

When the network has one or more hidden layers, each containing several neurons, it is called a **multi-layer perceptron.** The number of hidden layers and neurons is decided by us. The more neurons and layers there are, the more weights there are. Every input is connected to every neuron present in the hidden layer.
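To make the link between layers and weights concrete, here is a minimal sketch that counts the weights in a fully connected network. The helper name `count_weights` and the layer sizes are illustrative assumptions, and bias terms are deliberately ignored to keep the count simple.

```python
def count_weights(layer_sizes):
    """Count the connection weights in a fully connected network.

    layer_sizes lists the number of neurons per layer, input first.
    Every neuron is connected to every neuron in the next layer,
    so each pair of adjacent layers contributes (size_a * size_b) weights.
    Bias terms are not counted (an assumption of this sketch).
    """
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


# 3 inputs -> 4 hidden neurons -> 1 output
print(count_weights([3, 4, 1]))     # 3*4 + 4*1 = 16

# Adding a second hidden layer of 4 neurons doubles the number of weights
print(count_weights([3, 4, 4, 1]))  # 3*4 + 4*4 + 4*1 = 32
```

Adding one more hidden layer of the same size takes this small network from 16 to 32 weights, which is why deeper networks need more data and more computation to train.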

Our aim is to find the weights w1, w2, w3 that produce the output we want. On complex, non-linear data, neural networks can often fit more accurately than a simple regression. We have to find the optimal values of the weights so that the difference between the actual and predicted values is as small as possible. The squared-error cost function, which is the basis of the mean squared error, is calculated as:

$$c = \frac{1}{2}(y - \hat{y})^2 \quad \text{where} \quad \hat{y} = \sum_{i=1}^{m} w_i x_i$$

Ideally, each weight update brings the error down. Neural networks are flexible because they have many weights to train: the error is minimized, and weights that turn out to be redundant can be driven towards 0 during training. This gives them more flexibility than regression models, because the hidden layers can learn many intermediate features from the data, whereas a regression model only uses the features we give it. When the data is too complex, we add more hidden layers to create a deep network.

To find the weights that minimize the cost function, we take the derivative of the cost function with respect to each weight. The derivative gives the slope of the cost at the current weight, and we keep moving the weight in the downhill direction until the slope is close to zero. This is called **gradient descent.** Descending along the gradient is the most important algorithm in neural networks, and it is how the error is minimized. When the weights are updated after every single training example, one by one, it is called **Online or Stochastic gradient descent.** When all the examples are processed as one batch and the weights are updated once, it is **Batch gradient descent.**
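As a worked step, the slope for a single weight can be derived from the squared-error cost of one example, with the prediction taken as the weighted sum of the inputs. The symbol $\eta$ for the learning rate (step size) is an assumption of this sketch.

```latex
c = \frac{1}{2}(y - \hat{y})^2, \qquad \hat{y} = \sum_{i=1}^{m} w_i x_i

\frac{\partial c}{\partial w_i}
  = (y - \hat{y}) \cdot \frac{\partial (y - \hat{y})}{\partial w_i}
  = -(y - \hat{y})\, x_i

w_i \leftarrow w_i - \eta \frac{\partial c}{\partial w_i}
  = w_i + \eta\,(y - \hat{y})\, x_i
```

When the prediction is too low, $(y - \hat{y})$ is positive, so a weight on a positive input is increased; when the prediction is too high, the weight is decreased.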

When we have consumed all the data once, whether in online mode or as a batch, **Epoch 1** is complete. When this process is repeated, the epoch count increases to 2, 3 and so on. It only increases when the whole dataset has been processed.

**Online vs Batch :**

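| Online | Batch |
| --- | --- |
| Faster: the weights start improving after the first example | Slower than online: one update per pass over the data |
| Noisy updates can help it escape local minima, more flexibility | Can get stuck in local minima |
| Fewer calculations per update | More calculations per update, since the whole dataset is passed as a single batch |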
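The difference between the two modes can be sketched in a few lines. The toy dataset, the function names and the choice to average the gradient in batch mode are all assumptions made for illustration; the data is generated from y = 3*x1 + 8*x2.

```python
# Hypothetical toy data generated from y = 3*x1 + 8*x2
data = [((1.0, 0.0), 3.0), ((0.0, 1.0), 8.0), ((1.0, 1.0), 11.0), ((2.0, 1.0), 14.0)]


def gradient(w, x, y):
    """Gradient of 0.5 * (y - y_hat)**2 with respect to each weight."""
    y_hat = w[0] * x[0] + w[1] * x[1]
    return [-(y - y_hat) * x[0], -(y - y_hat) * x[1]]


def online_epoch(w, data, lr):
    """One pass over the data, updating the weights after every example."""
    w = list(w)
    updates = 0
    for x, y in data:
        g = gradient(w, x, y)
        w = [w[0] - lr * g[0], w[1] - lr * g[1]]
        updates += 1
    return w, updates


def batch_epoch(w, data, lr):
    """One pass over the data, with a single update from the averaged gradient."""
    g_sum = [0.0, 0.0]
    for x, y in data:
        g = gradient(w, x, y)
        g_sum = [g_sum[0] + g[0], g_sum[1] + g[1]]
    n = len(data)
    w = [w[0] - lr * g_sum[0] / n, w[1] - lr * g_sum[1] / n]
    return w, 1


w_online, n_online = online_epoch([1.0, 1.0], data, lr=0.1)
w_batch, n_batch = batch_epoch([1.0, 1.0], data, lr=0.1)
print("updates per epoch:", n_online, "(online) vs", n_batch, "(batch)")
```

In one epoch the online version changes the weights four times, once per example, while the batch version changes them once after seeing all four examples.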

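**Mini batch gradient descent:**

We shuffle the data and split it into small random batches. For example, we take 10 examples at random and process them as a mini batch instead of processing the whole dataset as one batch, updating the weights on completion of each mini batch. The epoch increases by 1 once all the mini batches, and therefore the whole dataset, have been processed.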
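A minimal sketch of how a dataset might be cut into random mini batches is shown here. The helper name `make_mini_batches`, the 25 stand-in examples and the batch size of 10 are assumptions; the sketch follows the common convention that an epoch is counted once every mini batch, and so the whole dataset, has been processed.

```python
import random


def make_mini_batches(data, batch_size, rng):
    """Shuffle a copy of the data and cut it into consecutive mini batches."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]


data = list(range(25))   # stand-in for 25 training examples
rng = random.Random(0)   # fixed seed so the sketch is repeatable

for epoch in range(1, 3):
    batches = make_mini_batches(data, batch_size=10, rng=rng)
    for batch in batches:
        pass  # the weights would be updated once per mini batch here
    print("epoch", epoch, "mini batch sizes:", [len(b) for b in batches])
```

With 25 examples and a batch size of 10, each epoch consists of three mini batches of 10, 10 and 5 examples, so the weights would be updated three times per epoch.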

**Creating a Neural Network for Simple Linear Equation :**

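    import numpy as np

**Case 1:** all the values are known.

    # y = a*x1 + b*x2
    x1 = 3
    x2 = 4
    a = 3
    b = 8
    y = a * x1 + b * x2

In case 1, the value of y is 41.

**Case 2:** we don't know a and b, but we know the value of y.

    x1 = 3
    x2 = 4
    y = 41
    x = np.array([x1, x2])  # x contains array([3, 4])

We start with both weights set to 1 and compute the prediction and its cost:

    w1 = 1
    w2 = 1
    y_hat = w1 * x1 + w2 * x2
    cost = 0.5 * (y - y_hat) ** 2

The cost value is 578.

Next we run gradient descent for 10 epochs, recomputing the prediction and the cost at the start of every epoch:

    w1 = 1
    w2 = 1
    for epoch in range(10):
        y_hat = w1 * x1 + w2 * x2
        cost = 0.5 * (y - y_hat) ** 2
        # derivative of the cost with respect to each weight
        dcostdw1 = 2 * 0.5 * (y - y_hat) * (-x1)
        dcostdw2 = 2 * 0.5 * (y - y_hat) * (-x2)
        w1 = w1 - dcostdw1
        w2 = w2 - dcostdw2
        print("epoch:{:d} w1:{:f} w2:{:f} cost:{:f}".format(epoch, w1, w2, cost))

In this case the update steps are too large, so the weights overshoot the minimum and the cost increases with every epoch instead of decreasing. In the same case, we now scale each update by a small learning rate, so that the cost decreases:

    w1 = 1
    w2 = 1
    lr = 0.01
    for epoch in range(10):
        y_hat = w1 * x1 + w2 * x2
        cost = 0.5 * (y - y_hat) ** 2
        dcostdw1 = 2 * 0.5 * (y - y_hat) * (-x1)
        dcostdw2 = 2 * 0.5 * (y - y_hat) * (-x2)
        # the learning rate keeps each step small
        w1 = w1 - lr * dcostdw1
        w2 = w2 - lr * dcostdw2
        print("epoch:{:d} w1:{:f} w2:{:f} cost:{:f}".format(epoch, w1, w2, cost))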

**Sample output:**

    epoch:0 w1: 2.0200 w2:2.3600 cost:578.0000
    epoch:1 w1: 2.7850 w2:3.3800 cost:325.1250
    . . .

Gradient descent keeps changing the weights (w1, w2) on every iteration. We may need to try different values for the learning rate (lr) and the number of epochs, and repeat the iterations until the error (cost) is close to zero.

By repeating the iterations, at epoch 49 we get:

    epoch:49 w1:5.079998 w2:6.439997 cost:0.000
    Input: 5.079998*x1 + 6.439997*x2
    Output: 40.999982

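Hence we found values for the two weights that satisfy the equation. Note that they are not the original a = 3 and b = 8: a single equation with two unknowns has many solutions, and gradient descent settles on one of them that gives y ≈ 41. This is a simple neural network model with one equation: using gradient descent, we calculated the cost and updated the weights over multiple epochs. Similarly, we can solve anything from two equations to any number of other problems using neural networks.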
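As a sketch of the two-equation case, the same gradient descent loop can be run over two equations at once. The second equation (x1 = 2, x2 = -1, y = -2) is a hypothetical addition generated from a = 3 and b = 8, and the learning rate and epoch count are values assumed to work for this particular pair; with two independent equations the weights are pinned down uniquely.

```python
# Two equations of the form y = a*x1 + b*x2, both generated from a = 3, b = 8.
# The first is the one used in the article; the second is a hypothetical addition.
equations = [((3.0, 4.0), 41.0), ((2.0, -1.0), -2.0)]

w1, w2 = 1.0, 1.0
lr = 0.05  # assumed learning rate, small enough for this data to converge

for epoch in range(200):
    grad_w1 = 0.0
    grad_w2 = 0.0
    # accumulate the gradient of 0.5 * (y - y_hat)**2 over both equations
    for (x1, x2), y in equations:
        y_hat = w1 * x1 + w2 * x2
        grad_w1 += -(y - y_hat) * x1
        grad_w2 += -(y - y_hat) * x2
    # one batch update per epoch
    w1 -= lr * grad_w1
    w2 -= lr * grad_w2

print(round(w1, 4), round(w2, 4))  # 3.0 8.0
```

Here the weights end up at the values of a and b, because no other pair satisfies both equations.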