Recurrent Neural Network: Types and Applications

Introduction to Recurrent Neural Networks

In this article, we explore recurrent neural networks, one of the most widely used and interesting families of neural networks, which are applied to numerous tasks including forecasting and stock market prediction.

Table of Contents

What is a Recurrent Neural Network?
Types of Recurrent Neural Networks
How Do Recurrent Neural Networks Work?
RNN Architecture
Gradient Problem Solutions
Implementation of RNNs using Keras
Applications of Recurrent Neural Networks
Conclusion

What is a Recurrent Neural Network?

A recurrent neural network (RNN) is a special type of artificial neural network (ANN) designed for time-series or other sequential data. Feedforward neural networks work well when data points are independent of each other. In sequential data, however, each data point depends on the ones before it, so the network must be modified to capture those dependencies. RNNs have a concept of memory: they store state, or information about previous inputs, and use it to generate the next output in the sequence.

An RNN saves the output of a particular layer and feeds it back as input when computing that layer's next output. A normal feedforward neural network can be converted to an RNN by compressing the nodes in its different layers into a single recurrent layer. In this representation, A, B, and C are the parameters of the network.

Here, x represents the input layer, h denotes the hidden layer, and y is the output layer. A, B, and C are the network parameters that are adjusted during training to improve the output of the model. At any given timestep t, the hidden state is computed from the current input x(t) and the previous hidden state h(t-1). The error in the output is then fed back through the network by backpropagation to update the parameters.
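The recurrence described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the article names the parameters A, B, and C but does not say which weights they are, so the sketch assumes A maps the input to the hidden state, B maps the previous hidden state to the current one, and C maps the hidden state to the output. The tanh activation, the function name rnn_forward, and all sizes are also assumptions chosen for the example.

```python
import numpy as np

def rnn_forward(xs, A, B, C, h0):
    """Run a simple RNN over a sequence; return all outputs and hidden states."""
    h = h0
    hs, ys = [], []
    for x in xs:                      # process one timestep at a time
        h = np.tanh(A @ x + B @ h)    # new state mixes current input and previous state
        hs.append(h)
        ys.append(C @ h)              # output is read from the current state
    return np.array(ys), np.array(hs)

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, steps = 3, 4, 2, 5
A = rng.normal(scale=0.5, size=(hidden_dim, input_dim))    # assumed: input-to-hidden
B = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))   # assumed: hidden-to-hidden
C = rng.normal(scale=0.5, size=(output_dim, hidden_dim))   # assumed: hidden-to-output
xs = rng.normal(size=(steps, input_dim))

ys, hs = rnn_forward(xs, A, B, C, h0=np.zeros(hidden_dim))
print(ys.shape, hs.shape)  # (5, 2) (5, 4)
```

Note that the same A, B, and C are reused at every timestep; only the hidden state h changes as the sequence is read.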

Why Recurrent Neural Networks?

This is an important question that needs to be answered to better understand RNNs. Every invention, upgrade, or update offers effective solutions to existing problems. RNNs were created to solve several issues of feedforward neural networks such as:

Feedforward neural networks cannot handle sequential data.
Feedforward neural networks consider only the current input.
Feedforward neural networks cannot memorize previous inputs.

RNNs address these problems. They can handle sequential data, and they take both the current input and previously received inputs into account, because their internal memory retains information about earlier inputs.

Types of Recurrent Neural Networks

There are different types of RNNs with varying architectures. They are:

1. One-to-one

This type is also called a plain neural network. It maps a fixed-size input to a fixed-size output, and neither depends on previous information or outputs. A typical example is image classification.

2. One-to-many

It deals with a fixed size of information as input, which gives a sequence of data as output. A fitting example would be image captioning, which takes in the image as input and gives a sequence of words as output.

3. Many-to-one

It takes in a sequence of information as input and gives a fixed-size output. For example, it is used in sentiment analysis, where a sentence is classified as expressing positive or negative sentiment.

4. Many-to-many

This type of RNN takes in a sequence of information as input and produces a sequence of data as output, one step at a time. It is applied in machine translation, where the RNN reads a sentence in one language and outputs it in another.
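The difference between these types comes down to which inputs are fed in and which hidden states are kept. The sketch below is a hedged illustration rather than a standard API: the function names, the weights W_x and W_h, and the choice to feed zeros after the first step in the one-to-many case are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
input_dim, hidden_dim = 3, 4
W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))    # input-to-hidden weights
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))   # hidden-to-hidden weights

def step(x, h):
    """One recurrent update."""
    return np.tanh(W_x @ x + W_h @ h)

def many_to_many(xs):
    """Sequence in, one state per timestep out."""
    h = np.zeros(hidden_dim)
    states = []
    for x in xs:
        h = step(x, h)
        states.append(h)
    return np.array(states)

def many_to_one(xs):
    """Sequence in, only the final state out (e.g. for sentiment classification)."""
    return many_to_many(xs)[-1]

def one_to_many(x, n_steps):
    """Single input at the first step, then the state keeps evolving on zero inputs."""
    h = step(x, np.zeros(hidden_dim))
    states = [h]
    for _ in range(n_steps - 1):
        h = step(np.zeros(input_dim), h)
        states.append(h)
    return np.array(states)

sequence = rng.normal(size=(6, input_dim))
print(many_to_many(sequence).shape)        # (6, 4)
print(many_to_one(sequence).shape)         # (4,)
print(one_to_many(sequence[0], 3).shape)   # (3, 4)
```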

How Do Recurrent Neural Networks Work?

In RNNs, the information cycles through the loop to the middle hidden layer.

The input layer, x, takes in the input to the neural network, processes it, and passes it on to the middle layer. The middle layer, h, can consist of multiple hidden layers, each with its own activation functions, weights, and biases. If the parameters of these hidden layers are not affected by the previous layer, that is, if the network has no memory, you can use an RNN instead.

The RNN standardizes the activation functions, weights, and biases so that each hidden layer has the same parameters. Then, instead of creating multiple hidden layers, it creates one and loops over it as many times as required.
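One consequence of looping over a single layer is that the number of parameters no longer grows with the number of steps. The arithmetic below is a rough sketch that counts only the hidden-to-hidden weights and biases; the helper names and the hidden size of 32 are assumptions for illustration.

```python
def dense_layer_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def stacked_params(hidden_dim, n_layers):
    """Separate parameters for every hidden layer."""
    return n_layers * dense_layer_params(hidden_dim, hidden_dim)

def shared_params(hidden_dim):
    """One set of parameters reused at every step, as in an RNN."""
    return dense_layer_params(hidden_dim, hidden_dim)

hidden_dim = 32
for steps in (1, 10, 100):
    print(steps, stacked_params(hidden_dim, steps), shared_params(hidden_dim))
# 1 1056 1056
# 10 10560 1056
# 100 105600 1056
```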

RNN Architecture

1. Bidirectional Recurrent Neural Networks (BRNNs)

While unidirectional RNNs can only draw on previous inputs to make predictions about the current state, BRNNs also pull in future data to improve their accuracy. For example, predicting a missing word in a phrase becomes much easier when the words that follow it are known in addition to the words that precede it.
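A bidirectional pass can be sketched as two independent RNNs, one reading the sequence forward and one reading it backward, with their states concatenated at each position. This is a minimal NumPy illustration, not a particular library's implementation; the function names, weights, and sizes are assumptions.

```python
import numpy as np

def run_rnn(xs, W_x, W_h):
    """Return the hidden state at every timestep of a simple tanh RNN."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h)
        states.append(h)
    return np.array(states)

def bidirectional(xs, fwd_params, bwd_params):
    forward_states = run_rnn(xs, *fwd_params)
    # Read the sequence in reverse, then flip the states back to the original order.
    backward_states = run_rnn(xs[::-1], *bwd_params)[::-1]
    return np.concatenate([forward_states, backward_states], axis=1)

def make_params(rng, input_dim, hidden_dim):
    return (rng.normal(scale=0.5, size=(hidden_dim, input_dim)),
            rng.normal(scale=0.5, size=(hidden_dim, hidden_dim)))

rng = np.random.default_rng(2)
input_dim, hidden_dim, steps = 3, 4, 5
fwd_params = make_params(rng, input_dim, hidden_dim)
bwd_params = make_params(rng, input_dim, hidden_dim)
xs = rng.normal(size=(steps, input_dim))

out = bidirectional(xs, fwd_params, bwd_params)
print(out.shape)  # (5, 8)
```

In this sketch, the first half of each output row depends only on inputs up to that position, while the second half depends on that position and everything after it.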

2. Long short-term memory (LSTM)

LSTM is a popular recurrent neural network architecture used in deep learning. Unlike feedforward neural networks, LSTM has feedback connections, so it can process not just single data points but also entire data sequences. It is applied to tasks such as connected handwriting recognition, speech recognition, and network traffic anomaly detection.

A common LSTM unit is composed of a cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell. LSTM is well suited to processing, classifying, and making predictions based on time-series data, since there can be lags of unknown duration between important events in a time series.

The LSTM architecture is extensively used to mitigate the vanishing gradient problem that arises when training traditional RNNs, and it is a widely used choice for sequence-to-sequence and time-series problems. Its main disadvantage is training cost: even a simple model can take a lot of time and system resources to train. This is largely a hardware constraint, which becomes less limiting as hardware gets more efficient.
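The cell and its three gates can be sketched as a single update step. The code below follows the commonly used LSTM formulation; the stacked weight layout, the function name lstm_step, and the sizes are assumptions made for illustration and are not taken from this article or from any particular library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4*hidden, hidden+input), b has shape (4*hidden,)."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    i = sigmoid(z[:hidden])                  # input gate: how much new information to write
    f = sigmoid(z[hidden:2 * hidden])        # forget gate: how much old cell content to keep
    o = sigmoid(z[2 * hidden:3 * hidden])    # output gate: how much of the cell to expose
    g = np.tanh(z[3 * hidden:])              # candidate values to write into the cell
    c = f * c_prev + i * g                   # updated cell state
    h = o * np.tanh(c)                       # updated hidden state
    return h, c

rng = np.random.default_rng(3)
input_dim, hidden_dim = 3, 4
W = rng.normal(scale=0.5, size=(4 * hidden_dim, hidden_dim + input_dim))
b = np.zeros(4 * hidden_dim)

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x in rng.normal(size=(6, input_dim)):
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

When the forget gate is close to 1 and the input gate is close to 0, the cell state is carried forward almost unchanged, which is how the cell can remember values over long intervals.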

3. Gated Recurrent Units (GRUs)

This architecture is similar to LSTM, because GRUs also address the short-term memory problem of RNN models. GRUs use a hidden state instead of a separate cell state, and two gates instead of three: the reset gate and the update gate. As in LSTM, the reset and update gates control how much information to retain and which information to retain.
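A single GRU update can be sketched in the same style. Several conventions exist for how the update gate blends the old state with the candidate; the one used here, where a gate value near 0 keeps the old state, is an assumption, as are the parameter dictionary layout, the function name gru_step, and the sizes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, p):
    """One GRU step; p is a dict of weight matrices and bias vectors."""
    hx = np.concatenate([h_prev, x])
    z = sigmoid(p["W_z"] @ hx + p["b_z"])          # update gate
    r = sigmoid(p["W_r"] @ hx + p["b_r"])          # reset gate
    rhx = np.concatenate([r * h_prev, x])          # reset gate scales the old state
    h_cand = np.tanh(p["W_h"] @ rhx + p["b_h"])    # candidate hidden state
    return (1.0 - z) * h_prev + z * h_cand         # blend old state and candidate

rng = np.random.default_rng(5)
input_dim, hidden_dim = 3, 4
p = {}
for name in ("z", "r", "h"):
    p["W_" + name] = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim + input_dim))
    p["b_" + name] = np.zeros(hidden_dim)

xs = rng.normal(size=(6, input_dim))
h = np.zeros(hidden_dim)
for x in xs:
    h = gru_step(x, h, p)
print(h.shape)  # (4,)
```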

Gradient Problem Solutions

LSTMs are a very efficient way to deal with gradient problems. Let us first discuss the long-term dependencies. Suppose you want to predict the last word in the text, “The clouds are in the _____”. The most obvious answer to this will be “sky”. You do not require any further context to predict that last word in the mentioned example.

Now consider this example: “I have been staying in Germany for the last 10 years. I can speak fluent _____”. To predict this last word, you need the context of Germany, and the most suitable answer is “German”. The gap between the relevant information and the point where it is needed can become very large. LSTMs help you solve this problem.
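To see why a large gap is a problem for a simple RNN, consider how strongly the last hidden state depends on the first one. The sketch below uses a scalar RNN, h_t = tanh(w * h_(t-1) + u * x_t), so that the chain rule reduces to a product of one factor per timestep. The model, the weight values, and the sine-wave input are assumptions chosen only to make the effect visible.

```python
import math

def grad_last_wrt_first(w, u, xs, h0=0.0):
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_(t-1) + u * x_t)."""
    h, grad = h0, 1.0
    for x in xs:
        h = math.tanh(w * h + u * x)
        grad *= w * (1.0 - h * h)   # chain rule: one factor per timestep
    return grad

xs = [math.sin(t) for t in range(50)]
grads = {}
for steps in (1, 5, 10, 25, 50):
    grads[steps] = abs(grad_last_wrt_first(0.9, 1.0, xs[:steps]))
    print(steps, grads[steps])
```

Because every factor has magnitude at most 0.9 here, the gradient shrinks at least geometrically with the length of the gap, so early inputs have almost no influence on the weight updates. This is the vanishing gradient problem that gated architectures such as LSTM are meant to reduce.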

Backpropagation Through Time

Backpropagation through time (BPTT) is the backpropagation algorithm applied to an RNN that takes sequential data, such as a time series, as its input. In a typical RNN, one input is fed into the network at a time, and a single output is obtained. In BPTT, however, both the current and the previous inputs are used. Each position in the sequence is called a timestep, and the network is unrolled across many timesteps so that all the data points in the window are processed together. Once the network has processed the window and produced its outputs, the outputs are used to calculate and accumulate the errors. Finally, the network is rolled back up, and the weights are recalculated and updated to reduce those errors.
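The procedure can be sketched for a tiny scalar RNN, where every quantity is a single number. The model h_t = tanh(w * h_(t-1) + u * x_t), the squared-error loss, and the sample data are assumptions made for illustration; the finite-difference check at the end is included only to confirm that the accumulated gradients are consistent with the loss.

```python
import math

def forward(w, u, xs, h0=0.0):
    """Unroll the network; hs[t] is the state after t inputs."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def loss(w, u, xs, ys):
    hs = forward(w, u, xs)
    return 0.5 * sum((h - y) ** 2 for h, y in zip(hs[1:], ys))

def bptt(w, u, xs, ys):
    """Accumulate gradients of the loss by walking backward through the timesteps."""
    hs = forward(w, u, xs)
    dw = du = 0.0
    dh_next = 0.0                            # gradient arriving from later timesteps
    for t in reversed(range(len(xs))):
        h, h_prev = hs[t + 1], hs[t]
        dh = (h - ys[t]) + dh_next           # error at this step plus error from the future
        dz = dh * (1.0 - h * h)              # back through tanh
        dw += dz * h_prev                    # the same w is used at every step, so sum up
        du += dz * xs[t]
        dh_next = dz * w                     # pass the gradient to the previous timestep
    return dw, du

xs = [0.5, -0.1, 0.3, 0.8]
ys = [0.2, 0.1, -0.3, 0.4]
w, u = 0.7, -0.4

dw, du = bptt(w, u, xs, ys)
eps = 1e-6
num_dw = (loss(w + eps, u, xs, ys) - loss(w - eps, u, xs, ys)) / (2 * eps)
num_du = (loss(w, u + eps, xs, ys) - loss(w, u - eps, xs, ys)) / (2 * eps)
print(abs(dw - num_dw) < 1e-6, abs(du - num_du) < 1e-6)
```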

Implementation of RNNs using Keras

Here is how you can implement RNN using Keras on the IMDB dataset:

1. Import the libraries

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence

2. Defining the parameters

max_features = 10000 # Number of words to consider as features
maxlen = 500 # Maximum sequence length
batch_size = 32
embedding_dim = 32 # Dimension of word embeddings

3. Load the dataset

# max_features must already be defined when the dataset is loaded
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

4. Padding the data to ensure consistent sequence lengths

x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
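To make the effect of padding concrete, the pure-Python sketch below mimics what pad_sequences does with its default settings (padding="pre", truncating="pre", value=0): short sequences get zeros added at the front, and long sequences keep only their last maxlen items. The helper pad_pre is a hypothetical stand-in written for illustration; it is not part of Keras and assumes maxlen is positive.

```python
def pad_pre(seq, maxlen, value=0):
    """Hypothetical helper: front-truncate, then front-pad, a single sequence."""
    seq = list(seq)[-maxlen:]                     # keep only the last maxlen items
    return [value] * (maxlen - len(seq)) + seq    # fill the front with the pad value

print(pad_pre([1, 2, 3], 5))               # [0, 0, 1, 2, 3]
print(pad_pre([1, 2, 3, 4, 5, 6, 7], 5))   # [3, 4, 5, 6, 7]
```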

5. Creating a simple RNN model

def create_simple_rnn_model():
    model = keras.Sequential()
    model.add(layers.Embedding(max_features, embedding_dim, input_length=maxlen)) # Embedding layer
    model.add(layers.SimpleRNN(32)) # Recurrent layer with 32 units
    model.add(layers.Dense(1, activation="sigmoid")) # Output layer (binary classification)
    return model

model = create_simple_rnn_model()

6. Compiling the model

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

7. Check for the model’s architecture using a summary

model.summary()
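The parameter counts you should expect in the summary can be worked out by hand from the layer sizes defined earlier. The arithmetic below is a sketch that assumes each layer uses a bias term, which is the default for the SimpleRNN and Dense layers; the variable names are chosen for illustration.

```python
max_features, embedding_dim, rnn_units = 10000, 32, 32

embedding_params = max_features * embedding_dim               # one vector per word index
rnn_params = rnn_units * (embedding_dim + rnn_units + 1)      # input weights + recurrent weights + biases
dense_params = rnn_units * 1 + 1                              # weights + bias of the single output unit
total_params = embedding_params + rnn_params + dense_params

print(embedding_params, rnn_params, dense_params, total_params)  # 320000 2080 33 322113
```

Most of the parameters sit in the embedding layer; the recurrent layer itself is small because its weights are shared across all 500 timesteps.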

8. Training the model

history = model.fit(x_train, y_train, epochs=5, batch_size=batch_size, validation_split=0.2)

9. Evaluating the model

loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")

10. Visualizing the results

import matplotlib.pyplot as plt

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

Applications of Recurrent Neural Networks

RNNs have a wide range of applications such as:

Solving time-series problems such as stock market prediction.
Solving text mining and sentiment analysis problems.
Developing NLP technology, including machine translation, speech recognition, and language modeling.
Image captioning, video tagging, text summarization, image recognition, facial recognition, and OCR applications.

Conclusion

Traditional feedforward networks cannot solve time-series and data sequence problems, while RNNs can solve such problems efficiently. This tutorial covered what RNNs are, why they are needed, their types and architectures, how LSTMs are used to address gradient problems, and where RNNs are applied.