Multi Layer Perceptron

A multi-layer perceptron (MLP) is an artificial neural network used for a variety of tasks, including classification and regression. In this article, we explore how this neural network works.

A multi-layer perceptron gets the “multi-layer” part of its name from its structure: an input layer, one or more hidden layers, and an output layer. The hidden layers are what allow the model to capture complex distributions in the data. There are three major components of a multi-layer perceptron:

    1. Input Layer: This is the first layer of the MLP, where the data is fed into the network. The data is in the form of a vector.
    2. Hidden Layer: This layer processes the information and learns patterns in the data. A network can have multiple hidden layers.
    3. Output Layer: This layer produces the final output of the network. The number of neurons is determined by the number of outputs.

Math behind Neural networks

The input is usually a vector, and the neural network learns the pattern in the data by learning weights.
In the first phase of learning, we do a forward pass. We activate the first layer using an activation function (for example the sigmoid, also called the logistic function), then carry out the same process for the next layer, and so on.
The weights are initialized randomly at the beginning and are updated as learning proceeds.
The number of weights is determined by the number of connections between the input layer and the hidden layer. The output of each hidden neuron is obtained with a simple equation in which the inputs are multiplied by the weights and added to a bias (wx + b), after which the activation function is applied to the result.
We then repeat this process for the output layer neurons, using the outputs of the hidden layer neurons as inputs.
The values produced by the output layer are the actual outputs of the network. The cost function, or total error, is the sum of squared errors, that is, the sum of (target - output)² over the output neurons.
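
The forward pass described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the 2-2-1 network, its weight values, and the helper names `dense_forward` and `sum_squared_error` are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_forward(inputs, weights, biases):
    """One fully connected layer: each neuron computes sigmoid(w . x + b)."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(sigmoid(z))
    return outputs


def sum_squared_error(targets, outputs):
    """Total error: sum of (target - output)^2 over the output neurons."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))


# Hypothetical 2-2-1 network with hand-picked weights (assumption).
x = [0.5, -1.0]
hidden_w = [[0.1, 0.2], [-0.3, 0.4]]  # one row of weights per hidden neuron
hidden_b = [0.0, 0.1]
output_w = [[0.5, -0.5]]              # one output neuron
output_b = [0.2]

hidden = dense_forward(x, hidden_w, hidden_b)       # input -> hidden
output = dense_forward(hidden, output_w, output_b)  # hidden -> output
error = sum_squared_error([1.0], output)

print("hidden activations:", hidden)
print("network output:", output)
print("total error:", error)
```

The output of one layer is passed unchanged as the input of the next, which is all that "repeating the process" for the output layer amounts to.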

Backward Pass

The goal of backpropagation is to update the weights so that the actual output moves closer to the target output, minimizing the error. MLPs are trained with supervised learning, and the derivatives of the error with respect to each weight are obtained by applying the chain rule. After each update the error should be lower than it was before; how much it changes depends on the learning rate of the neural network. The learning rate is represented as alpha (𝛼) or epsilon (ϵ). Weights are updated in each layer using partial derivatives of the error. The backward pass then continues by calculating new values for the remaining weights in the same way. The layers depend on each other: the gradient for layer ‘n’ is computed from the gradient already obtained for layer ‘n+1’.
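
To make the chain rule concrete, here is a small sketch for a single sigmoid neuron with a squared-error loss. The input, weight, bias, target and learning rate values are arbitrary assumptions, and the finite-difference check at the end is only there to confirm that the hand-derived gradient is plausible.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def error(w, b, x, target):
    """Squared error of a single sigmoid neuron."""
    out = sigmoid(w * x + b)
    return (target - out) ** 2


def gradient_w(w, b, x, target):
    """Chain rule: dE/dw = dE/dout * dout/dz * dz/dw."""
    out = sigmoid(w * x + b)
    d_error_d_out = -2.0 * (target - out)  # derivative of (target - out)^2
    d_out_d_z = out * (1.0 - out)          # derivative of the sigmoid
    d_z_d_w = x                            # derivative of w*x + b w.r.t. w
    return d_error_d_out * d_out_d_z * d_z_d_w


# Arbitrary illustrative values (assumptions).
w, b, x, target = 0.4, 0.1, 1.5, 1.0
alpha = 0.5  # learning rate

grad = gradient_w(w, b, x, target)
new_w = w - alpha * grad  # one weight update

# Numerical check of the analytic gradient using central differences.
eps = 1e-6
numeric = (error(w + eps, b, x, target) - error(w - eps, b, x, target)) / (2 * eps)

print("analytic gradient:", grad)
print("numeric gradient:", numeric)
print("error before:", error(w, b, x, target))
print("error after:", error(new_w, b, x, target))
```

In a full network the same product of partial derivatives is extended backwards through every layer, which is why the gradients of the later layers are computed first.
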
The weights are assigned randomly only once, before the first forward pass; in the backward pass they are updated by a step whose size is set by the learning rate. Gradient descent means gradually decreasing the error in this way until it reaches a minimum or a very small value. The learning rate and momentum both influence how gradient descent behaves.
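
As a rough illustration of how the learning rate and momentum enter the update, the sketch below runs gradient descent on a one-dimensional error function E(w) = (w - 3)², which is an assumption made purely for clarity (a real network has one such update per weight). The function name `gradient_descent` and the chosen values are hypothetical.

```python
def gradient(w):
    """Derivative of the toy error function E(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


def gradient_descent(w, learning_rate, momentum, steps):
    """Gradient descent with classical momentum; returns every visited w."""
    velocity = 0.0
    history = [w]
    for _ in range(steps):
        # Momentum keeps a fraction of the previous step and adds the new one.
        velocity = momentum * velocity - learning_rate * gradient(w)
        w = w + velocity
        history.append(w)
    return history


plain = gradient_descent(w=0.0, learning_rate=0.1, momentum=0.0, steps=50)
with_momentum = gradient_descent(w=0.0, learning_rate=0.1, momentum=0.5, steps=50)

print("final w without momentum:", plain[-1])
print("final w with momentum:", with_momentum[-1])
```

Both runs should approach the minimum at w = 3; setting `momentum=0.0` reduces the update to plain gradient descent.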

Issues with the Multi-layer perceptron

Overfitting

Overfitting arises when a simple problem is modelled with a network that has more layers than it needs. The model fits the training data so closely that its performance on that data may be close to 90%, yet it performs poorly on other real examples, because what it has learned is specific to the training set rather than general. For simpler problems, use fewer hidden layers to avoid overfitting.

Non-Linear Relationship

When the data is not linearly separable, that is, when we cannot separate the classes with a straight line, the issue of a non-linear relationship comes into the picture. Here, a model that relies only on linear combinations of its inputs is not able to capture the underlying relationship in the data.
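
XOR is the classic example of data that a straight line cannot separate. The sketch below is illustrative only: it uses a step activation instead of sigmoid, searches a coarse, arbitrary grid of weights for a single linear unit, and then shows a hand-wired network with one hidden layer that does reproduce XOR. All weight values and function names are assumptions.

```python
from itertools import product

# XOR truth table: (inputs, expected output)
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def step(z):
    """Threshold activation: fires when the weighted sum is positive."""
    return 1 if z > 0 else 0


def single_unit(x, w1, w2, b):
    """One linear unit, i.e. a straight-line decision boundary."""
    return step(w1 * x[0] + w2 * x[1] + b)


# Try every combination on a coarse grid of weights from -4.0 to 4.0.
grid = [v / 2 for v in range(-8, 9)]
single_unit_solves_xor = any(
    all(single_unit(x, w1, w2, b) == y for x, y in XOR)
    for w1, w2, b in product(grid, repeat=3)
)


def two_layer(x):
    """Hand-wired network with one hidden layer of two units."""
    h_or = step(x[0] + x[1] - 0.5)   # fires if at least one input is 1
    h_and = step(x[0] + x[1] - 1.5)  # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # "OR but not AND" is XOR


hidden_layer_solves_xor = all(two_layer(x) == y for x, y in XOR)

print("single linear unit solves XOR:", single_unit_solves_xor)
print("network with a hidden layer solves XOR:", hidden_layer_solves_xor)
```

The grid search cannot succeed however fine the grid is made, because no single line separates the two XOR classes; the hidden layer is what makes the problem solvable.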

Unstable Training

Training an MLP can be unstable, because the gradients may keep fluctuating during training. This eventually hampers the performance of the model, as the metrics keep changing after every iteration.
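
One possible source of fluctuating gradients, offered here as an assumption rather than as the only cause, is a learning rate that is too large. The toy sketch below tracks the gradient of E(w) = w² over a few updates for two hypothetical learning rates.

```python
def gradient_trace(learning_rate, steps=8, w=1.0):
    """Record the gradient of E(w) = w^2 at each gradient descent step."""
    grads = []
    for _ in range(steps):
        g = 2.0 * w  # dE/dw
        grads.append(g)
        w = w - learning_rate * g
    return grads


stable = gradient_trace(learning_rate=0.1)
unstable = gradient_trace(learning_rate=1.1)

print("learning rate 0.1:", [round(g, 3) for g in stable])
print("learning rate 1.1:", [round(g, 3) for g in unstable])
```

With the smaller rate the gradient shrinks steadily, while with the larger rate it flips sign on every step and grows in magnitude, which is the kind of behaviour that makes the metrics jump between iterations.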

How to overcome the issues of MLP?

Dropouts

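To avoid overfitting we use dropout, which randomly switches off a fraction of the neurons at each training step. The motivation comes from an analogy with genetics: offspring are produced by combining distinct genes rather than by strengthening co-adapted ones, and in the same way dropout discourages neurons from co-adapting to one another. Moreover, this also helps in stabilising the model.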
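
A minimal sketch of dropout in plain Python is shown below. It uses the "inverted" variant, in which the surviving activations are scaled by 1 / (1 - rate) during training so that nothing needs to change at inference time. The function signature, the seed and the example values are assumptions made for illustration.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout: zero each activation with probability `rate`
    and scale the survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)  # dropout is disabled at inference time
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden = [1.0] * 1000   # hypothetical layer output

dropped = dropout(hidden, rate=0.2, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)

print("fraction of activations dropped:", zero_fraction)
print("inference output unchanged:", dropout(hidden, rate=0.2, training=False) == hidden)
```

Because a different random subset of neurons is removed on every step, no neuron can depend on a particular other neuron always being present.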

Activation Functions

Activation functions like tanh or sigmoid introduce non-linearity, which is what lets the network model data that is not linearly separable. Sigmoid produces outputs in the range 0 to 1, and tanh produces outputs in the range -1 to 1. This helps the model to generalize well.
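
The output ranges are easy to check numerically. The short sketch below evaluates both functions on a handful of arbitrarily chosen inputs using only the standard library.

```python
import math


def sigmoid(z):
    """Logistic function, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


inputs = [-5.0, -1.0, 0.0, 1.0, 5.0]  # arbitrary sample points
sigmoid_outputs = [sigmoid(z) for z in inputs]
tanh_outputs = [math.tanh(z) for z in inputs]  # output in (-1, 1)

for z, s, t in zip(inputs, sigmoid_outputs, tanh_outputs):
    print(f"z={z:+.1f}  sigmoid={s:.4f}  tanh={t:.4f}")
```

Large negative inputs push sigmoid towards 0 and tanh towards -1, while large positive inputs push both towards 1.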

Hyperparameters

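The architecture and training setup of a neural network are defined by a set of hyperparameters: the number of layers, the number of perceptrons per layer, the activation functions, dropout, the optimizer, the loss function, the number of epochs, the learning rate, momentum, the metrics (such as accuracy) and the batch size.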
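
One convenient way to keep these choices in one place is a plain dictionary. The sketch below collects the values used in the Keras example later in this article; the key names are hypothetical, and the learning rate and momentum are not listed because that example leaves them at the optimizer's defaults.

```python
# Values taken from the Keras example in this article; key names are
# illustrative only and not part of any library API.
hyperparameters = {
    "hidden_layers": [128, 64],        # units in each hidden layer
    "hidden_activation": "relu",
    "output_units": 10,                # one per digit class
    "output_activation": "softmax",
    "dropout_rate": 0.2,
    "optimizer": "adam",
    "loss": "categorical_crossentropy",
    "metrics": ["accuracy"],
    "epochs": 10,
    "batch_size": 128,
}

for name, value in hyperparameters.items():
    print(f"{name}: {value}")
```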

Implementation using TensorFlow & Keras

Here is how you can implement an MLP model on the MNIST dataset using TensorFlow & Keras:

First, import the necessary libraries:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

Load the dataset; here we are using the MNIST dataset:

(x_train, y_train), (x_test, y_test) = mnist.load_data()

Now, preprocess the data by scaling the pixel values to the range 0 to 1:

x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

Flatten each 28 × 28 image into a vector of 784 values:

x_train = x_train.reshape((x_train.shape[0], 28 * 28))
x_test = x_test.reshape((x_test.shape[0], 28 * 28))
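
To see what scaling and flattening do to a single image, here is a plain-Python sketch on a hypothetical 2 × 3 "image". It does not use NumPy or the real MNIST data; the pixel values are made up.

```python
# A hypothetical 2 x 3 grayscale image with pixel values in 0..255.
image = [
    [0, 128, 255],
    [64, 32, 16],
]

# Scale every pixel to the range 0..1, as done for MNIST above.
scaled = [[pixel / 255.0 for pixel in row] for row in image]

# Flatten row by row into a single vector, the form an MLP input layer expects.
flat = [pixel for row in scaled for pixel in row]

print("flattened length:", len(flat))
print("flattened vector:", flat)
```

For MNIST the same operation turns each 28 × 28 grid into a vector of 784 numbers.
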
One-hot encode the labels:

y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
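
One-hot encoding replaces each integer label with a vector that is 1 at the label's index and 0 elsewhere. The helper below is a hypothetical plain-Python stand-in written only to show the idea; it is not how `to_categorical` is implemented.

```python
def one_hot(label, num_classes):
    """Return a vector with 1.0 at position `label` and 0.0 elsewhere."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


labels = [3, 0, 9]  # made-up digit labels
encoded = [one_hot(label, 10) for label in labels]

for label, vector in zip(labels, encoded):
    print(label, "->", vector)
```

This format matches the 10-unit softmax output layer, which produces one probability per digit class.
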
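Build the MLP model:

model = models.Sequential([
    # First hidden layer: 784 inputs -> 128 units
    layers.Dense(128, activation='relu', input_shape=(28 * 28,)),
    # Randomly drop 20% of the activations during training
    layers.Dropout(0.2),
    # Second hidden layer
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.2),
    # Output layer: one probability per digit class
    layers.Dense(10, activation='softmax')
])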
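
The size of this model can be worked out by hand: a dense layer has one weight per input-unit pair plus one bias per unit, and dropout layers add no trainable parameters. The helper name `dense_params` below is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Input size followed by the units of each Dense layer in the model above.
layer_sizes = [28 * 28, 128, 64, 10]

per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print("parameters per Dense layer:", per_layer)  # [100480, 8256, 650]
print("total trainable parameters:", total)      # 109386
```

The total should agree with the trainable parameter count reported by `model.summary()`.
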
Compile the model:

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
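
To give a feel for what the categorical cross-entropy loss measures, the sketch below computes it by hand for a single example with three classes. The probabilities are made up, and the small `eps` clamp that avoids log(0) is an assumption of this sketch rather than a statement about how Keras computes the loss internally.

```python
import math


def categorical_crossentropy(one_hot_target, predicted_probs, eps=1e-7):
    """Cross-entropy for one example: -sum(target * log(prediction))."""
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(one_hot_target, predicted_probs))


target = [0.0, 1.0, 0.0]        # the true class is index 1
confident = [0.05, 0.90, 0.05]  # hypothetical good prediction
unsure = [0.40, 0.30, 0.30]     # hypothetical poor prediction

loss_confident = categorical_crossentropy(target, confident)
loss_unsure = categorical_crossentropy(target, unsure)

print(f"loss when confident and correct: {loss_confident:.4f}")
print(f"loss when unsure: {loss_unsure:.4f}")
```

The loss depends only on the probability assigned to the true class, so it falls towards 0 as that probability approaches 1.
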
Train the model:

model.fit(x_train, y_train,
          epochs=10,
          batch_size=128,
          validation_data=(x_test, y_test))
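
The `epochs` and `batch_size` settings determine how many weight updates take place. A quick calculation, assuming the standard MNIST training split of 60,000 images:

```python
import math

num_train_samples = 60000  # size of the MNIST training split
batch_size = 128
epochs = 10

# The last batch of each epoch is smaller, so round up.
steps_per_epoch = math.ceil(num_train_samples / batch_size)
total_updates = steps_per_epoch * epochs

print("weight updates per epoch:", steps_per_epoch)  # 469
print("weight updates in total:", total_updates)     # 4690
```
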
Evaluate the model on the test set:

test_loss, test_accuracy = model.evaluate(x_test, y_test)
print(f"Test Loss: {test_loss:.2f}")
print(f"Test Accuracy: {test_accuracy:.2f}")
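
The accuracy metric is based on the class with the highest predicted probability. The plain-Python sketch below shows that step on made-up softmax outputs; `predict_class` and `accuracy` are hypothetical helpers, not Keras functions.

```python
def predict_class(probabilities):
    """Index of the largest probability, i.e. the predicted digit."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return matches / len(true_labels)


# Made-up softmax outputs for two images (10 classes each).
outputs = [
    [0.01, 0.02, 0.05, 0.80, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02],
    [0.70, 0.05, 0.05, 0.05, 0.05, 0.02, 0.02, 0.02, 0.02, 0.02],
]
true_labels = [3, 7]

predicted = [predict_class(probs) for probs in outputs]

print("predicted digits:", predicted)
print("accuracy:", accuracy(true_labels, predicted))
```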

Conclusion

These neural networks can handle complex data and relationships efficiently, which is one of the prime reasons they are so widely used for classification and regression tasks on complex datasets.

 

About the Author

Principal Data Scientist

Meet Akash, a Principal Data Scientist with expertise in advanced analytics, machine learning, and AI-driven solutions. With a master’s degree from IIT Kanpur, Aakash combines technical knowledge with industry insights to deliver impactful, scalable models for complex business challenges.