If you’re here to learn about Long Short-Term Memory networks (LSTM), you’re in the right place. In this blog, we will explain what LSTM is and how it works. We will also compare it to traditional RNNs and explore its applications in AI.
What is LSTM?
LSTM, or Long Short-Term Memory, is a variant of Recurrent Neural Networks (RNNs) that is capable of learning long-term dependencies, especially in sequence prediction problems. It is used in Deep Learning for processing sequential data. Because LSTM has feedback connections, it can process entire sequences of data rather than single data points such as images, which makes it useful for speech recognition, machine translation, and similar tasks. LSTM is a special kind of RNN that shows outstanding performance on a large variety of problems.
1. Why LSTM?
Earlier, we used RNNs to handle sequential data. They maintained a memory that helped the model capture long-term dependencies, which made them a suitable architecture for context-based tasks such as time-series problems.
However, despite their design, traditional RNNs struggled with long-term dependencies because of vanishing and exploding gradients. These issues restricted the model’s ability to learn and retain context over long durations, leading to degraded performance on long-context tasks.
That is where LSTM comes into the picture. It uses a specialized gating mechanism that controls the flow of information through gates and a memory cell. Let’s see how LSTM works and how it resolves these issues.
LSTM Architecture
The strength of LSTM networks comes from their architecture, which is made up of a memory cell and three major gates that control the flow of information. Let’s have a look at them one by one.
1. Memory Cell
An LSTM network relies on a memory cell (Cₜ) to preserve information over time, which helps it store long-term dependencies efficiently.
Think of it as a pipe or channel that lets information flow across multiple timesteps without being lost or overwritten, unless it is modified by one of the surrounding gates.
LSTM networks have two important components for information storage (the short Keras sketch after this list shows how to access both):
- Cell State: Runs throughout the network to maintain long-term dependencies.
- Hidden State: Stores short-term dependencies and forwards information to the next timestep.
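To make the distinction concrete, here is a minimal Keras sketch (the batch size, sequence length, and layer width are arbitrary, chosen just for illustration) showing how an LSTM layer can expose both states:
import tensorflow as tf
# A toy batch: 4 sequences, 10 timesteps, 8 features per timestep
x = tf.random.normal((4, 10, 8))
# return_state=True makes the layer also return the final hidden state and cell state
lstm = tf.keras.layers.LSTM(16, return_state=True)
output, hidden_state, cell_state = lstm(x)
print(hidden_state.shape)  # (4, 16) - short-term summary passed to the next timestep/layer
print(cell_state.shape)    # (4, 16) - long-term memory carried along the sequence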
(Figure: an LSTM memory cell with the gates that manage the flow of data.)
2. Gates in LSTM
LSTM uses three specialized gates to regulate the flow of information into and out of the memory cell. These gates determine how much information to keep, what to discard, and what output to produce at each timestep, which ensures that long-term dependencies are handled effectively. Let’s have a look at them one by one.
1. Forget Gate (fₜ)
The Forget Gate decides what information from the past has to be removed from the memory cell. Here we take the previous hidden state (hₜ₋₁) and the current input (xₜ) and process them with a sigmoid function.
- If fₜ is close to 0, the memory cell forgets the previous information.
- If fₜ is close to 1, it retains the past information.
This helps in removing any outdated or irrelevant data from the Cell State (Long Term Memory).
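In the standard formulation, the forget gate is computed as
fₜ = σ(W_f · [hₜ₋₁, xₜ] + b_f)
where σ is the sigmoid function, W_f and b_f are the gate’s weights and bias, and [hₜ₋₁, xₜ] is the previous hidden state concatenated with the current input.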
2. Input Gate (iₜ)
The Input Gate controls how much new information is added to the memory cell. The addition of new data happens in a stepwise manner:
- The input gate first checks the previous hidden state and the current input to determine how important the new information is. It then assigns a score that decides how much of it to consider: a score close to 0 means the information is not important, while a score close to 1 means it is highly relevant.
- Before adding information to memory, the LSTM transforms the input into a new version called the “candidate update”. This ensures the information is structured in a way that fits the memory pattern.
- Finally, the memory cell combines the retained past information with the new candidate update, as the equations below show.
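In the standard formulation, these steps correspond to
iₜ = σ(W_i · [hₜ₋₁, xₜ] + b_i)
C̃ₜ = tanh(W_C · [hₜ₋₁, xₜ] + b_C)
Cₜ = fₜ * Cₜ₋₁ + iₜ * C̃ₜ
where * denotes element-wise multiplication, C̃ₜ is the candidate update, and Cₜ is the new cell state.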
3. Output Gate (oₜ)
The Output Gate determines how much of the updated information is sent to the next hidden state.
- If oₜ is close to 0, very little information is passed to the next hidden state.
- If oₜ is close to 1, a large portion of the information is used in the next hidden state.
This guarantees that only valuable information is delivered, which helps to keep the learning process stable.
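In the standard formulation, the output gate is oₜ = σ(W_o · [hₜ₋₁, xₜ] + b_o) and the new hidden state is hₜ = oₜ * tanh(Cₜ). Putting all three gates together, a single LSTM timestep can be sketched in plain NumPy. This is a minimal illustration of the standard equations, not an optimized implementation; the weight and bias shapes are assumed to be compatible.
import numpy as np
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    z = np.concatenate([h_prev, x_t])      # [h(t-1), x(t)]
    f_t = sigmoid(W_f @ z + b_f)           # forget gate: what to erase from the cell state
    i_t = sigmoid(W_i @ z + b_i)           # input gate: how much new information to let in
    c_hat = np.tanh(W_c @ z + b_c)         # candidate update
    c_t = f_t * c_prev + i_t * c_hat       # new cell state (long-term memory)
    o_t = sigmoid(W_o @ z + b_o)           # output gate: how much of the cell state to expose
    h_t = o_t * np.tanh(c_t)               # new hidden state (short-term memory)
    return h_t, c_t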
Variants of LSTM
Over time, several variants of LSTM have been developed to improve performance and optimize the efficiency of the model. Let’s have a look at them.
1. Bidirectional LSTM (BiLSTM)
Unlike the standard LSTM, which processes data in only one direction, a Bidirectional LSTM processes data in both the forward and backward directions. It has two LSTM layers: one processes the sequence in the forward direction and the other in the backward direction. This gives the network a better understanding of both the preceding and the following context, which is quite beneficial for tasks like Language Modelling.
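As a quick illustration, Keras provides a Bidirectional wrapper that runs the two LSTMs for you. This is a minimal sketch with arbitrary vocabulary and layer sizes, not a tuned model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense
# One LSTM reads the sequence left-to-right, the other right-to-left;
# their outputs are concatenated before the final classification layer.
model = Sequential([
    Embedding(input_dim=10000, output_dim=128),
    Bidirectional(LSTM(64)),
    Dense(1, activation='sigmoid')
])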
2. Gated Recurrent Units (GRU)
This is a simplified version of LSTM that combines the forget gate and input gate into a single update gate. This reduces computational complexity while maintaining performance. A GRU is faster than its parent LSTM and, because it has fewer parameters, it is also more memory efficient.
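For comparison, swapping the LSTM layer for a GRU in Keras is a one-line change (again a minimal sketch with arbitrary sizes):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense
# The GRU layer uses update and reset gates instead of the LSTM's three gates
model = Sequential([
    Embedding(input_dim=10000, output_dim=128),
    GRU(64),
    Dense(1, activation='sigmoid')
])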
Applications of LSTM
LSTM has applications across various domains and numerous tasks. Here are some of the most important and widely used ones.
1. Natural Language Processing
In Natural Language Processing, LSTMs are used in tasks like Machine Translation, Text Generation, and Sentiment Analysis. Because LSTMs are better at understanding long-term context, they are a very suitable choice for these tasks.
2. Speech Recognition
LSTMs excel at voice recognition because they efficiently model temporal connections in audio data, resulting in more accurate transcription and understanding of spoken language.
3. Time Series Forecasting
In time-series forecasting, LSTMs are used to estimate future values based on historical data, which is beneficial in finance, weather forecasting, and resource allocation.
LSTM Implementation using Python
Here is how you can build an LSTM-based sentiment analysis model using the IMDB movie review dataset.
1. Installing Relevant Dependencies
!pip install tensorflow numpy matplotlib
2. Importing the Libraries
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import imdb
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
3. Load and Preprocess Data
The IMDB dataset comes preloaded with TensorFlow, so we will load the data, limit the vocabulary size, and pad the sequences.
# Load IMDB dataset
vocab_size = 10000
max_length = 200
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=vocab_size)
# Pad sequences to make them the same length
X_train = pad_sequences(X_train, maxlen=max_length, padding='post', truncating='post')
X_test = pad_sequences(X_test, maxlen=max_length, padding='post', truncating='post')
print(f"Training samples: {X_train.shape[0]}, Test samples: {X_test.shape[0]}")
4. Build the Model
# Define LSTM-based sentiment analysis model
model = Sequential([
    tf.keras.Input(shape=(max_length,)),  # explicit input layer so the model is built and summary() shows parameter counts
    Embedding(input_dim=vocab_size, output_dim=128),  # note: the input_length argument is deprecated in newer Keras versions
    LSTM(64, return_sequences=True),  # First LSTM layer passes the full sequence to the next LSTM
    Dropout(0.2),
    LSTM(64),  # Second LSTM layer returns only its final hidden state
    Dropout(0.2),
    Dense(1, activation='sigmoid')  # Output layer for binary classification
])
# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Print model summary
model.summary()
5. Train the Model
# Train for 5 epochs with a batch size of 64, validating on the held-out test set
history = model.fit(X_train, y_train, epochs=5, batch_size=64, validation_data=(X_test, y_test), verbose=1)
6. Evaluate the Model
# Evaluate on test set
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {accuracy * 100:.2f}%")
# Plot training history
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.title("LSTM Sentiment Analysis Accuracy")
plt.show()
7. Test with a Custom Review
# Load word index mapping
word_index = imdb.get_word_index()
reverse_word_index = {value: key for key, value in word_index.items()}
# Function to decode a review (word indices are offset by 3 because indices 0-2 are reserved tokens)
def decode_review(encoded_review):
    return " ".join([reverse_word_index.get(i - 3, "?") for i in encoded_review])
# Sample test review
sample_review = X_test[0].reshape(1, -1) # Take the first test review
prediction = model.predict(sample_review)[0][0]
sentiment = "Positive" if prediction > 0.5 else "Negative"
print("Sample Review:")
print(decode_review(X_test[0]))
print(f"Predicted Sentiment: {sentiment} (Confidence: {prediction:.2f})")
Conclusion
LSTM networks are a genuine improvement over plain RNNs: they can handle the same sequential tasks while retaining context far more reliably. As intimidating as they can look at first, LSTMs deliver better results and are truly a big step forward in Deep Learning. With more such techniques emerging, you can expect more accurate predictions and a better basis for decision-making. To master advanced deep learning techniques, enroll in our cutting-edge Artificial Intelligence course today!