Introduction to Neural Networks
Have you ever wondered how your brain recognizes numbers? No matter how a digit is written, your brain relates it to the closest pattern it knows and concludes the result. This raises the idea of building something that can recognize similar number patterns, and that is where Neural Networks come in.
Let’s discuss a situation
The digits in the above picture are written at the extremely low resolution of 28px by 28px, yet your brain effortlessly identifies them as 6. Similarly, the same number can appear in very different shapes and patterns, but your visual cortex resolves all of them as representing the same idea, while recognizing the other pictures as distinct ideas of their own.
Here you will also learn the basics of Neural Networks and their key features.
Artificial Neural Network
What is an Artificial Neural Network?
- A Neural Network is a system designed to operate like a human brain. Human information processing takes place through the interaction of many billions of neurons, each connected to other neurons and sending them signals.
- Similarly, an Artificial Neural Network is a network of artificial neurons, modeled on those found in human brains, built for solving artificial intelligence problems such as image identification. It may be a physical device or a mathematical construct.
- In other words, an Artificial Neural Network is a parallel computational system consisting of many simple processing elements connected together to perform a particular task.
Biological Motivation
In the above topic, you learned what a Neural Network is. A question you might now be wondering about is: what motivates the idea of a Neural Network?
The motivation behind neural networks is the human brain. The human brain is often called the best processor, even though it works more slowly than computers. Many researchers have therefore tried to build a machine that works along the lines of the human brain.
The human brain contains billions of neurons, each connected to many other neurons to form a network, so that when it sees an image it recognizes the image and produces an output.
- Dendrite receives signals from other neurons.
- Cell body sums the incoming signals to generate input.
- When the sum reaches a threshold value, neuron fires and the signal travels down the axon to the other neurons.
- The amount of signal transmitted depends upon the strength of the connections.
- Connections can be inhibitory (decreasing strength) or excitatory (increasing strength) in nature.
In a similar manner, the idea arose to build interconnected artificial neurons, modeled on biological neurons, that together make up an Artificial Neural Network (ANN). Like a biological neuron, each artificial neuron takes a number of inputs and produces an output.
Neurons in the human brain can make very complex decisions, which means they run many parallel processes for a particular task. One motivation for ANNs is to solve a task such as identification through many parallel processes.
Structure of Neural Network
Artificial Neuron
Artificial neurons are also called perceptrons. A perceptron consists of the following basic components:
- Input
- Weight
- Bias
- Activation Function
- Output
How does a perceptron work?
A. Each input X1, X2, X3, …, Xn is multiplied by its respective weight.
B. All the multiplied values are added together.
C. The sum is passed through the activation function, as shown in the sketch below.
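Here is a minimal sketch of these steps in Python, using made-up inputs and weights; the sigmoid is used as the activation function, and the bias from the component list above is included.

import numpy as np

x = np.array([0.5, -1.0, 2.0])   # inputs X1, X2, X3
w = np.array([0.8, 0.2, -0.5])   # weights W1, W2, W3
b = 0.1                          # bias

weighted_sum = np.dot(x, w) + b              # steps A and B: multiply and add
output = 1 / (1 + np.exp(-weighted_sum))     # step C: sigmoid activation
print(output)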
Weights and Bias
- Weights W1, W2, W3, …, Wn indicate the strength of the connection for each input.
- Bias allows you to shift the curve of the activation function.
Activation curve for different weights, without a bias:
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-8, 8, 0.1)     # range of input values
weights = [0.5, 1.0, 2.0]     # weights to compare

for w in weights:
    f = 1 / (1 + np.exp(-x * w))             # sigmoid activation
    plt.plot(x, f, label='w = ' + str(w))

plt.xlabel('x')
plt.ylabel('h_w(x)')
plt.legend(loc=2)
plt.show()
Here, by changing the weights, you can vary the relationship between input and output: different weights change the slope of the activation function. This is useful for modelling input-output relationships.
What if you want the output to change only when x > 1? This is where bias comes in.
Let's alter the above example by adding a bias term.
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-8, 8, 0.1)     # range of input values
w = 5.0
biases = [-8.0, 0.0, 8.0]     # biases to compare

for b in biases:
    f = 1 / (1 + np.exp(-(x * w + b)))       # sigmoid with bias
    plt.plot(x, f, label='b = ' + str(b))

plt.xlabel('x')
plt.ylabel('h_wb(x)')
plt.legend(loc=2)
plt.show()
As you can see, by varying the bias b, you can change where the node activates. Without a bias, you cannot shift this activation point.
Input layer, Hidden layer and Output layer
Input Layer
Input layer contains inputs and weights. Example: X1, W1, etc.
Hidden Layer
In a neural network, there can be more than one hidden layer. The hidden layer contains the summation and the activation function.
Output Layer
The output layer consists of the set of results generated by the previous layer. It also holds the desired values, i.e. target values that are already known for the outputs, which are compared with the values generated by the previous layer. This comparison can then be used to improve the end results.
Let’s understand with an example.
Suppose you want to go to a food shop. You will decide whether to go out or not based on three factors:
- Weather is good or not, i.e. X1. Say X1=1 for good weather and X1=0 for bad weather.
- You have vehicle available or not, i.e. X2. Say X2=1 for vehicle available and X2=0 for not having vehicle.
- You have money or not, i.e. X3. Say X3=1 for having money and X3=0 for not having money.
Based on these conditions, you assign a weight to each factor, say W3 = 6 for money, as money is the first important thing you must have, W2 = 2 for the vehicle and W1 = 2 for the weather, and you set the threshold to 5.
In this way, the perceptron forms a decision-making model by calculating X1W1 + X2W2 + X3W3 and comparing the sum to the threshold, as shown in the sketch below.
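A small sketch of this decision model in Python; the weights and threshold are the illustrative values from the example, and the helper function name is made up.

def go_to_food_shop(weather, vehicle, money):
    # inputs are 1 (yes) or 0 (no); the weights reflect how important each factor is
    total = 2 * weather + 2 * vehicle + 6 * money
    threshold = 5
    return 1 if total >= threshold else 0    # 1 = go out, 0 = stay in

print(go_to_food_shop(weather=0, vehicle=1, money=1))   # 1: vehicle + money reach the threshold
print(go_to_food_shop(weather=1, vehicle=1, money=0))   # 0: without money the sum stays below 5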
Activation Function
Activation functions provide non-linear, complex functional mappings between the inputs and the required output variable. They introduce non-linear properties into our network.
They convert the input of an artificial neuron into its output. That output signal is then used as input in the next layer.
Simply put, the activation function maps the input into a required range of values such as (0, 1) or (-1, 1).
Why Activation Function?
The activation function lets the network model complex non-linear relationships. Without an activation function, the output signal would just be a linear function of the input, and your neural network would not be able to learn complex data such as audio, images, speech, etc., as the sketch below illustrates.
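As a quick illustration, the snippet below (using random, purely illustrative matrices) shows that stacking two linear layers without an activation function collapses into a single linear layer.

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))     # first linear "layer"
W2 = rng.normal(size=(3, 3))     # second linear "layer"
x = rng.normal(size=3)

stacked = W2 @ (W1 @ x)          # two linear layers, no activation in between
single = (W2 @ W1) @ x           # one combined linear layer
print(np.allclose(stacked, single))   # True: the stack is still just a linear map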
Some commonly used activation functions are:
- Sigmoid or Logistic
- Tanh - Hyperbolic tangent
- ReLU - Rectified Linear Unit
Sigmoid Activation Function:
Sigmoid Activation Function can be represented as:
f(x) = 1 / (1 + exp(-x))
- Range of sigmoid function is between 0 and 1.
- It has some disadvantages, such as slow convergence and the vanishing gradient problem (it kills gradients). The output of the sigmoid is not zero centered, which makes its gradients go in different directions.
Tanh- Hyperbolic tangent
Tanh can be represented as:
f(x) = (1 - exp(-2x)) / (1 + exp(-2x))
It solves the zero-centering problem of the sigmoid function: the output of tanh is zero centered because its range is between -1 and 1.
Optimization is easier compared with the sigmoid function.
But it still suffers from the vanishing gradient problem.
ReLU - Rectified Linear Unit
It can be represented as:
R(x) = max(0,x)
if x < 0, R(x) = 0 and if x >= 0, R(x) = x
It avoids and rectifies the vanishing gradient problem, and it converges about six times faster than the tanh function.
It should be used within hidden layers of the neural network.
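For comparison, the three functions above can be written in a few lines of NumPy; this is just a sketch, not a library implementation.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))                          # range (0, 1)

def tanh(x):
    return (1 - np.exp(-2 * x)) / (1 + np.exp(-2 * x))   # range (-1, 1), zero centered

def relu(x):
    return np.maximum(0, x)                              # 0 for negative inputs, x otherwise

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x))
print(tanh(x))
print(relu(x))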
Gradient Descent
Gradient is the slope of the error curve.
The idea of using the gradient is to reduce or minimize the error between the desired output and the predicted output. To predict the output correctly for every input, the weights must be varied so as to minimize the error.
The problem is how to vary the weights given the output error. This is solved by gradient descent.
In the above graph, the blue plot shows the error, the red dot shows the 'w' value that minimizes the error, and the black cross or line shows the gradient.
At point 1, a random value of 'w' is selected and its gradient with respect to the error is checked.
If the gradient is positive with respect to an increase in w, then a step in that direction will increase the error; if it is negative with respect to an increase in 'w', then a step in that direction will decrease the error. In this way, the gradient shows the direction to move along the error curve.
The process of minimizing the error continues until the output value gets close to the desired output. This is the idea behind backpropagation.
W_new = W_old - α * ∇error
W_new = new 'w' position
W_old = current or old 'w' position
∇error = gradient of the error at W_old
α = learning rate, i.e. how quickly we converge to the minimum error
Example:
Find the minimum of the function f(x) = x^4 - 3x^3 + 2.
By hand, the minimum occurs at x = 2.25 (setting the derivative 4x^3 - 9x^2 to zero), so the program must follow the gradient and converge towards that value.
x_old = 0
x_new = 6                 # starting point
gamma = 0.01              # step size
precision = 0.00001

def df(x):
    # derivative of f(x) = x**4 - 3*x**3 + 2
    return 4 * x**3 - 9 * x**2

while abs(x_new - x_old) > precision:
    x_old = x_new
    x_new = x_old - gamma * df(x_old)   # gradient descent update

print("%f" % x_new)
Output:
2.249965
Feed-Forward Neural Network
- A feed-forward network means data flows in only one direction, i.e. from input to output.
- In the gradient descent topic, you studied how to minimize the error. The main goal here is also to minimize the error, and there are various methods for doing so.
- In a feed-forward neural network, when the input is given, the network guesses the output from the input value before moving to the next step. After guessing, it compares the guessed value with the desired output value. The difference between the guess and the desired output is the error.
Guess = input * weight
Error = Desired Output - Guess
You already know how to minimize the error.
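A tiny sketch of this guess-and-error step, with made-up numbers:

inputs = [1.0, 0.5]
weights = [0.4, 0.6]
desired = 1.0

guess = sum(x * w for x, w in zip(inputs, weights))   # guess = input * weight, summed over inputs
error = desired - guess                               # error = desired output - guess
print(guess, error)                                   # 0.7 0.3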
Single Layer Perceptron and Problem with Single Layer Perceptron
- A Single Layer Perceptron is a linear classifier; if the cases are not linearly separable, the learning process will never reach a point where all cases are classified properly.
- It is a type of feed-forward neural network and works like a regular Neural Network.
Example:
In the above picture, you can see that it is impossible to draw a single straight line that separates the XOR classes. So, a linear classifier such as the Single Layer Perceptron fails on XOR.
Multi-Layer Perceptron (MLP)
- It is a type of feed-forward network.
- It is trained using backpropagation; a multilayer perceptron with the backpropagation algorithm can successfully classify the XOR data.
- A multilayer perceptron (MLP) has the same structure as a single layer perceptron but with one or more hidden layers.
- It uses a non-linear activation function and utilizes backpropagation for training. It is used, for example, in speech recognition and machine translation.
This is how backpropagation works: it uses the gradient descent algorithm to update the weights.
Let's see a program that demonstrates a multi-layer perceptron learning XOR:
// assume all the necessary package classes are imported
public class XorMLP {
    public static void main(String[] args) {
        // XOR function training data
        DataSet trainingSet = new DataSet(2, 1);
        trainingSet.addRow(new DataSetRow(new double[]{0, 0}, new double[]{0}));
        trainingSet.addRow(new DataSetRow(new double[]{0, 1}, new double[]{1}));
        trainingSet.addRow(new DataSetRow(new double[]{1, 0}, new double[]{1}));
        trainingSet.addRow(new DataSetRow(new double[]{1, 1}, new double[]{0}));
        // MLP with 2 inputs, 3 hidden neurons and 1 output
        MultiLayerPerceptron ML = new MultiLayerPerceptron(TransferFunctionType.TANH, 2, 3, 1);
        // learn the training set
        ML.learn(trainingSet);
        // test perceptron
        System.out.println("Testing trained neural network");
        testNeuralNetwork(ML, trainingSet);
        // save NN
        ML.save("ML.nnet");
        // load NN
        NeuralNetwork loadML = NeuralNetwork.createFromFile("ML.nnet");
        // test NN
        System.out.println("Testing loaded neural network");
        testNeuralNetwork(loadML, trainingSet);
    }

    public static void testNeuralNetwork(NeuralNetwork nn, DataSet testSet) {
        for (DataSetRow dataRow : testSet.getRows()) {
            nn.setInput(dataRow.getInput());
            nn.calculate();
            double[] output = nn.getOutput();
            System.out.print("Input: " + Arrays.toString(dataRow.getInput()));
            System.out.println(" Output: " + Arrays.toString(output));
        }
    }
}
Run the above code and you will get outputs close to the desired values, for example about 0.862 where the desired output is 1. A small error like this is acceptable.
Types of Neural Network
Mainly used Neural networks are:
Convolutional Neural Network (CNN) / ConvNets
Images with many pixels cannot be handled well by an MLP or regular neural network. In CIFAR-10, images are of size 32*32*3, i.e. 3,072 weights per fully connected neuron. But for an image of size 200*200*3, that is 120,000 weights per neuron, and many more neurons would be required. So, full connectivity is not very useful in this situation.
In a CNN, the input consists of images, and the layers arrange their neurons in a three-dimensional structure: width, height and depth.
Example:
In CIFAR-10, the input volume has dimensions 32*32*3 (width, height, depth).
In a convolutional layer, neurons receive input from only a restricted subarea of the previous layer, i.e. each neuron is connected only to a small region of the layer before it, not in a fully connected manner.
So, the output layer for CIFAR-10 has dimensions 1*1*10, because by the end of the CNN the full image has been reduced to a small vector of class scores along the depth dimension.
A simple CNN for CIFAR-10 has the following sections:
- Input (32*32*3), i.e. width 32, height 32 and 3 colour channels: red, green and blue.
- The convolution layer computes the outputs of neurons connected to local regions of the input. This results in a volume of 32*32*8 if you decide to use 8 filters.
- The activation function (ReLU) is applied elementwise and leaves the volume size unchanged.
- The pooling layer performs down-sampling, which reduces the volume to 16*16*8.
- The fully connected layer computes the class scores, resulting in a 1*1*10 volume. Each neuron is connected to all the numbers in the previous layer, just like in a regular neural network.
In this way, CNN transforms the image layer by layer to the class score.
Note that the layers do not all contain the same parameters; some layers (such as ReLU and pooling) contain no parameters at all.
CNN Overview:
- Input volume(W1, H1, D1)
- Parameters: the number of filters K, the receptive field size F, the stride S and the zero padding P.
- S specifies how far the filter slides at each step. For example, S = 2 means slide the filter by 2 pixels. This helps to reduce the volume.
- P specifies how many zeros pad the input borders. It helps control the size of the output volume.
- The output volume has dimensions W2, H2, D2, where
W2= (W1-F+2P)/S+1
H2=(H1-F+2P)/S+1
D2=K
- Parameter sharing is used to control the number of parameters by reusing the same weights across positions. It introduces F*F*D1 weights per filter, for a total of (F*F*D1)*K weights and K biases (see the sketch below).
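The output-size and parameter-count formulas above can be checked with a short sketch. The 32*32*3 input and K = 8 filters match the CIFAR-10 example; the filter size F = 3, stride S = 1 and padding P = 1 are illustrative choices.

def conv_layer_size(W1, H1, D1, K, F, S, P):
    W2 = (W1 - F + 2 * P) // S + 1
    H2 = (H1 - F + 2 * P) // S + 1
    D2 = K
    weights = (F * F * D1) * K     # parameter sharing: F*F*D1 weights per filter
    biases = K                     # one bias per filter
    return (W2, H2, D2), weights, biases

print(conv_layer_size(W1=32, H1=32, D1=3, K=8, F=3, S=1, P=1))
# ((32, 32, 8), 216, 8)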
Recursive Neural Network (RNN)
You have learned how to represent a single word. But how could you represent phrases or sentences?
Also, can you model the relation between words and multi-word expressions?
Example: "consider" = "take into account". And can you extract representations of full sentences that preserve some of their semantic meaning?
Example: "word representations learned from Intellipaat" = "Intellipaat trained you on text data set representations"
To solve this problem, the recursive neural network was introduced.
It uses a binary tree and is trained to identify related phrases or sentences.
Example: A wise person suddenly enters the Intellipaat
The idea of a recursive neural network is to recursively merge pairs of representations of smaller segments to obtain representations of bigger segments.
Each merge in the tree structure produces the following two things:
- The semantic representation obtained when the two nodes are merged.
- A score of how plausible the new node is, i.e. how well the two merged words go together.
Let's say a parent node has two children.
In place of the two children, there can be two words from a sentence, as seen in the above picture. By checking the scores of each candidate pair, the network produces the output, as sketched below.
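A toy sketch of this merge step; the vector size and the random weight matrices are placeholders, purely for illustration.

import numpy as np

rng = np.random.default_rng(1)
d = 4
W = rng.normal(size=(d, 2 * d))    # merges the two concatenated child vectors
w_score = rng.normal(size=d)       # scores how plausible the merged node is

def merge(child1, child2):
    parent = np.tanh(W @ np.concatenate([child1, child2]))   # semantic representation of the pair
    score = w_score @ parent                                  # plausibility score of the merge
    return parent, score

c1 = rng.normal(size=d)            # stand-ins for two word vectors
c2 = rng.normal(size=d)
parent, score = merge(c1, c2)
print(parent, score)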
What is RNN?
- The recursive neural network architecture can operate on structured input without limiting itself to time-ordered input sequences. It works on parse-tree structural representations.
- Recursive NNs are deep neural networks and have been successful in natural language processing (mainly on phrases and sentences).
- They also help solve problems such as connected handwriting recognition and speech recognition.
Recurrent neural network (RNN)
- A Recurrent NN is a simplified version of a Recursive NN in which time is the main factor ordering the input elements.
In the above picture, at each time step, in addition to the user's input, the network also accepts the output of the previous hidden layer. A Recurrent NN operates on a linear chain.
So, you can say that a Recursive NN operates on a hierarchical structure whereas a Recurrent NN operates on a chain structure.
Let’s see what is Recurrent NN:
- In a Recurrent NN, the information cycles through a loop, i.e. for decision making it takes the current input together with what it has learned from the previous outputs, as shown in the above picture.
Let's say you have a normal neural network whose input is the word Intellipaat, processed character by character. By the time it reaches the character 'e', it has already forgotten about 'I', 'n' and 't', so it is impossible for a normal neural network to predict the next character.
A Recurrent NN remembers this because it has its own internal memory. It produces an output, copies that output and loops it back into the network, as in the sketch below.
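A bare-bones sketch of this loop; the shapes and random weights are illustrative only, and no training is shown.

import numpy as np

rng = np.random.default_rng(2)
W_xh = rng.normal(size=(4, 3))     # input-to-hidden weights
W_hh = rng.normal(size=(4, 4))     # hidden-to-hidden weights: the loop, i.e. the internal memory

def rnn_step(x, h_prev):
    return np.tanh(W_xh @ x + W_hh @ h_prev)   # new state depends on the input AND the previous state

h = np.zeros(4)
for x in [rng.normal(size=3) for _ in range(5)]:   # a toy input sequence
    h = rnn_step(x, h)
print(h)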
Let’s take a practical example to understand Recurrent NN:
Let's say you are the perfect person for Intellipaat, but why? Because every week, you attend regular courses from Intellipaat to develop your skills. Your schedule is:
Monday = Java, Tuesday = Data Science, Wednesday = Hadoop, Thursday= AWS
If you want a NN to tell you the next day's course, you have to enter today's course. So, if you enter Java as input, it should output Data Science; when the input is Data Science, the output should be Hadoop, and so on.
In this example, the output is fed back as the next input to decide the following output, as the toy snippet below illustrates.
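This loop can be mimicked with a trivial lookup table; this is not a trained network, only an illustration of the previous output becoming the next input.

next_course = {'Java': 'Data Science', 'Data Science': 'Hadoop', 'Hadoop': 'AWS'}

course = 'Java'
for _ in range(3):
    course = next_course[course]   # the previous output loops back in as the next input
    print(course)
# Data Science, Hadoop, AWS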
Long short-term memory (LSTM)
What will happen if there are 100 or more states in a Recurrent NN? What if there were 150 days in a week and you wanted the NN to tell you the next day's course?
In the above picture, you have 150 states, and each state contributes a gradient factor of 0.01. To update the weights of the first state, these factors are multiplied together, giving roughly 0.01^150, which is almost equal to zero. So the weight update becomes zero, the NN does not learn anything to improve, and it keeps producing the same error. This is called the vanishing gradient problem, and this is where LSTM (Long Short-Term Memory) comes into action.
What is LSTM?
Long Short-Term Memory (LSTM) networks are an improved version of Recurrent NNs that extend the Recurrent NN's memory. An LSTM's memory is like a computer's memory in that it can read, write and delete data.
In an LSTM there are three gates: the input gate, the forget gate and the output gate. The input gate decides whether to let new input pass, the forget gate deletes information that is no longer needed, and the output gate produces the output at the given time step, as sketched below.
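A bare-bones sketch of one LSTM step with these three gates; the weights are random placeholders and biases are omitted for brevity, so this only illustrates the gating idea.

import numpy as np

rng = np.random.default_rng(3)
d_in, d_h = 3, 4
W_i, W_f, W_o, W_c = (rng.normal(size=(d_h, d_in + d_h)) for _ in range(4))

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    i = sigmoid(W_i @ z)                     # input gate: let new information in
    f = sigmoid(W_f @ z)                     # forget gate: drop information no longer needed
    o = sigmoid(W_o @ z)                     # output gate: decide what to expose
    c = f * c_prev + i * np.tanh(W_c @ z)    # cell state: the network's memory
    h = o * np.tanh(c)                       # output / hidden state at this time step
    return h, c

h, c = np.zeros(d_h), np.zeros(d_h)
h, c = lstm_step(rng.normal(size=d_in), h, c)
print(h)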
By adding this gated cell mechanism to the Recurrent NN, the LSTM becomes more efficient than a plain Recurrent NN.
Applications of Neural Network
Now that you are aware of Neural Networks, how they work and their types, let's look at where they can be applied.
- Image Recognition/Compression
- Character Recognition
- Stock Market Prediction
- Human Face Recognition
- Signature Verification Application
- Speech Recognition
- Voice Recognition
There are many more applications of Neural Network which are helpful in our day-to-day life.
Conclusion
This brings us to the end of the Neural Network tutorial, in which we covered an overview of Neural Networks in detail.
We covered almost all the main topics of Neural Networks: the programming side, the types, the motivation, etc. If you want to learn more, try our Intellipaat "Neural Network" course, which gives in-depth coverage of the most important topics, such as how to train a model, how to program it and the different approaches to Neural Networks.
We hope this tutorial helps you gain knowledge of Machine Learning. If you are looking to learn Machine Learning in a systematic manner with expert guidance and support, you can enroll in our Online Machine Learning Course.