Hadoop YARN - Architecture, Components and Working

What is Hadoop YARN?

Apache Hadoop YARN (Yet Another Resource Negotiator) is a resource management layer introduced in Hadoop 2.x and enhanced in Hadoop 3.x. It allows various data processing engines, such as interactive processing, graph processing, batch processing, and stream processing, to run and process data stored in HDFS (Hadoop Distributed File System). With Hadoop 3.x, YARN introduced several improvements like opportunistic containers for better resource utilisation, timeline service v2 for enhanced scalability, and support for container orchestration in cloud environments.

YARN was introduced to make better use of the data stored in HDFS, and it also handles job scheduling.

With YARN in place, the architecture of Hadoop 2.x and later provides a data processing platform that is not limited to MapReduce. It lets Hadoop run other purpose-built data processing systems as well, i.e., other frameworks can run on the same hardware on which Hadoop is installed.

Now that you have learnt what YARN is, let’s see why we need Hadoop YARN.

Why is Hadoop YARN Used?

Despite being thoroughly proficient at data processing and computations, Hadoop 1.x had some shortcomings like delays in batch processing, scalability issues, and limited support for varied workloads, as it relied solely on MapReduce for processing big datasets. With the introduction of YARN, Hadoop evolved to support a variety of processing approaches—enabling stream processing, interactive querying, and batch jobs to run side by side. The YARN framework can also run non-MapReduce applications, thus overcoming the limitations of Hadoop 1.x.

In recent years, YARN has become even more critical in modern data platforms. As of 2025, it plays a key role in cloud-native environments by integrating with container orchestration tools like Kubernetes. It enables scalable and flexible deployment of big data and machine learning workloads on platforms such as AWS EMR, Google Dataproc, and Azure HDInsight.

Next, let’s discuss the Hadoop YARN architecture.

Hadoop YARN Architecture

The Apache YARN framework consists of a Resource Manager (master daemon), a Node Manager (slave daemon) on each node, an Application Master for each application, and the containers in which tasks run. YARN’s architecture in Hadoop 3.x includes advanced features like opportunistic containers, which allow tasks to run in spare capacity, and integration with cloud-native environments, enhancing scalability and flexibility.

Let’s now discuss each component of Apache Hadoop YARN one by one in detail.

Resource Manager

The Resource Manager is the master daemon of YARN. It manages the applications running in the cluster and the global assignment of resources such as CPU and memory among them. It is also used for job scheduling. The Resource Manager has two components:

Scheduler: The Scheduler’s task is to distribute resources to the running applications. It deals only with scheduling, so it performs no tracking or monitoring of applications.
Application Manager: The Application Manager manages the applications running in the cluster. It handles tasks such as starting the Application Master and monitoring it.
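
The Scheduler's allocate-only role can be illustrated with a small toy model. This is a minimal sketch, not YARN's actual API: the class names, the fields, and the simple first-fit policy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """A resource request from one application (hypothetical model)."""
    app_id: str
    memory_mb: int
    vcores: int


@dataclass
class ToyScheduler:
    """Grants a request only if it fits in the remaining capacity.

    Like the Scheduler described above, it only allocates resources;
    it does not track or monitor what applications do with them.
    """
    free_memory_mb: int
    free_vcores: int
    granted: list = field(default_factory=list)

    def allocate(self, req: Request) -> bool:
        fits = (req.memory_mb <= self.free_memory_mb
                and req.vcores <= self.free_vcores)
        if fits:
            self.free_memory_mb -= req.memory_mb
            self.free_vcores -= req.vcores
            self.granted.append(req.app_id)
        return fits


scheduler = ToyScheduler(free_memory_mb=8192, free_vcores=4)
results = [
    scheduler.allocate(Request("app_1", 4096, 2)),
    scheduler.allocate(Request("app_2", 6144, 1)),  # needs more memory than is left
    scheduler.allocate(Request("app_3", 2048, 2)),
]
print(results)            # [True, False, True]
print(scheduler.granted)  # ['app_1', 'app_3']
```

A real cluster would use one of YARN's pluggable schedulers (FIFO, Capacity, or Fair) with queues and priorities; the sketch only shows the idea of granting requests against a shared pool.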

Let’s move on with the second component of Apache Hadoop YARN.

Node Manager

Node Manager is the slave daemon of YARN. It has the following responsibilities:

The Node Manager monitors each container’s resource usage and reports it to the Resource Manager.
It tracks the health of the node on which it runs.
It manages the containers and user jobs running on its node as part of the overall workflow.
It keeps the Resource Manager up to date by sending regular status updates.
It can also kill a container when the Resource Manager orders it to do so.
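
The responsibilities above can be pictured with a toy Node Manager that reports container usage and kills a container on request. This is a hedged sketch: the class, the method names, and the shape of the report are hypothetical and do not correspond to YARN's real interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNodeManager:
    """Hypothetical per-node agent; not the real YARN NodeManager."""
    node_id: str
    healthy: bool = True
    # Maps container id -> memory in MB currently used by that container.
    containers: dict = field(default_factory=dict)

    def launch(self, container_id: str, memory_mb: int) -> None:
        self.containers[container_id] = memory_mb

    def heartbeat(self) -> dict:
        """Build the status report that would go to the Resource Manager."""
        return {
            "node_id": self.node_id,
            "healthy": self.healthy,
            "containers": dict(self.containers),
            "used_memory_mb": sum(self.containers.values()),
        }

    def kill(self, container_id: str) -> bool:
        """Stop a container on an order from the Resource Manager."""
        return self.containers.pop(container_id, None) is not None


nm = ToyNodeManager("node-1")
nm.launch("c1", 1024)
nm.launch("c2", 2048)
before = nm.heartbeat()
killed = nm.kill("c1")
after = nm.heartbeat()
print(before["used_memory_mb"], killed, after["used_memory_mb"])  # 3072 True 2048
```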

The third component of Apache Hadoop YARN is the Application Master.

Application Master

Every job submitted to the framework is an application, and every application has a specific Application Master associated with it. Application Master performs the following tasks:

It coordinates the execution of the application in the cluster, along with managing the faults.
It negotiates resources from the Resource Manager.
It works with the Node Manager to execute and monitor the application’s component tasks.
It sends heartbeats to the Resource Manager at regular intervals to confirm that it is alive and to update the record of its resource demands.

Now, we will step forward with the fourth component of Apache Hadoop YARN.

Container

A container is a set of physical resources (CPU cores, RAM, disks, etc.) on a single node. The tasks of a container are listed below:

It grants the right to an application to use a specific amount of resources (memory, CPU, etc.) on a specific host.
YARN containers are managed through a Container Launch Context (CLC). This record contains a map of environment variables, dependencies stored in remotely accessible storage, security tokens, the payload for Node Manager services, and the command necessary to create the process.
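
To make the contents of a CLC concrete, the sketch below models the fields listed above as a plain Python record. It is an illustration only: the class name, field names, paths, and the `command_line` helper are hypothetical, and the real record in YARN is a Java class with its own API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLaunchContext:
    """Hypothetical model of the information a CLC carries."""
    environment: dict = field(default_factory=dict)      # environment variables
    local_resources: dict = field(default_factory=dict)  # name -> remote location
    tokens: bytes = b""                                  # security tokens
    service_data: dict = field(default_factory=dict)     # payload for Node Manager services
    commands: list = field(default_factory=list)         # commands that start the process

    def command_line(self) -> str:
        """Render the environment and commands as one shell-like line."""
        env = " ".join(f"{k}={v}" for k, v in sorted(self.environment.items()))
        cmd = " && ".join(self.commands)
        return f"{env} {cmd}".strip()


# All values below are made up for the example.
clc = ToyLaunchContext(
    environment={"JAVA_HOME": "/opt/java", "APP_MODE": "demo"},
    local_resources={"app.jar": "hdfs:///apps/demo/app.jar"},
    commands=["java -jar app.jar"],
)
print(clc.command_line())  # APP_MODE=demo JAVA_HOME=/opt/java java -jar app.jar
```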

How does Apache Hadoop YARN work?

YARN separates resource management from data processing, so MapReduce is no longer the only way to work with data stored in HDFS. This makes the Hadoop environment more suitable for applications that can’t wait for batch processing jobs to finish. The architecture lets multiple processing engines, such as real-time streaming, interactive SQL, and batch processing, work on data stored in a single platform. It can be considered the basis of the next generation of the Hadoop ecosystem, helping forward-thinking organizations build a modern data architecture.

How is an application submitted in Hadoop YARN?

1. Submit the job
2. Get an application ID
3. Retrieve the application submission context
    Start Container Launch
    Launch Application Master
4. Allocate Resources
    Container
    Launching
5. Execute

Workflow of an Application in Apache Hadoop YARN

Submission of the application by the client
Container allocation for starting the Application Master
Registering the Application Master with the Resource Manager
Application Master asks for containers from the Resource Manager
Application Master notifies the Node Manager to launch containers
Application code gets executed in the container
Client contacts the Resource Manager/Application Master to monitor the status of the application
Application Master unregisters from the Resource Manager
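
The workflow above can be traced with a small simulation in which an Application Master registers, obtains containers, runs tasks, and unregisters. This is a sketch under stated assumptions: both classes and all method names are hypothetical, containers are counted as identical units, and the client-side status monitoring step is left out.

```python
class ToyResourceManager:
    """Hypothetical stand-in for the Resource Manager."""

    def __init__(self, free_containers):
        self.free_containers = free_containers
        self.registered = set()
        self.log = []

    def submit(self, app_id):
        # The client submits; one container is set aside for the
        # Application Master.
        self.log.append("client submits application")
        self.free_containers -= 1
        self.log.append("container allocated for Application Master")
        return ToyApplicationMaster(app_id, self)


class ToyApplicationMaster:
    """Hypothetical per-application coordinator."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm

    def run(self, tasks):
        rm = self.rm
        # Register with the Resource Manager.
        rm.registered.add(self.app_id)
        rm.log.append("Application Master registers")
        # Ask for containers; at most the free ones are granted.
        granted = min(len(tasks), rm.free_containers)
        rm.free_containers -= granted
        rm.log.append(f"{granted} containers granted")
        # Launch a container per task and execute the application code.
        results = []
        for task in tasks[:granted]:
            rm.log.append("Node Manager launches container")
            results.append(task())
        # Unregister and release every container, including its own.
        rm.registered.discard(self.app_id)
        rm.free_containers += granted + 1
        rm.log.append("Application Master unregisters")
        return results


rm = ToyResourceManager(free_containers=4)
am = rm.submit("app_1")
results = am.run([lambda: 1 + 1, lambda: 2 * 3])
print(results)             # [2, 6]
print(rm.free_containers)  # 4
```

Printing `rm.log` afterwards shows the steps in the order in which they occurred.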

Features of Hadoop YARN

High-degree compatibility: Applications written for the MapReduce framework can run on YARN with little or no change.
Better cluster utilization: YARN allocates cluster resources dynamically, which leads to better utilization than in Hadoop 1.x.
Utmost scalability: As the number of nodes in the Hadoop cluster grows, the YARN Resource Manager continues to schedule resources across them to meet user requirements.
Multi-tenancy: Multiple engines can access data on the same Hadoop cluster and work side by side, because YARN shares the cluster’s resources among them.

YARN vs MapReduce

In Hadoop 1.x, the batch processing framework MapReduce was tightly coupled with HDFS. Adding YARN to these two components produced Hadoop 2.x and changed a lot about how Hadoop works. Let’s go through these differences.

Criteria	YARN	MapReduce
Type of Processing	Real-time, batch, and interactive processing using multiple engines	Batch processing using a single engine
Cluster Resource Optimization	High — central resource management improves efficiency	Limited — relies on fixed Map and Reduce task slots
Suitable For	Both MapReduce and other processing frameworks (e.g., Spark, Tez)	Only MapReduce applications
Cluster Resource Manager	YARN ResourceManager and NodeManager	JobTracker and TaskTracker
Namespace Support	Supports multiple namespaces (via Federation in HDFS)	Supports only a single HDFS namespace
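
The "Cluster Resource Optimization" row can be illustrated with simple arithmetic. The sketch below compares how many tasks can run at once with fixed Map and Reduce slots versus a shared pool of containers. The function names and all numbers are made up for the example, and it assumes every task needs the same amount of resources.

```python
def slot_based_running(pending_map, pending_reduce, map_slots, reduce_slots):
    """Tasks that can run at once when slots are fixed per task type."""
    return min(pending_map, map_slots) + min(pending_reduce, reduce_slots)


def container_based_running(pending_map, pending_reduce, containers):
    """Tasks that can run at once when any task may use any container."""
    return min(pending_map + pending_reduce, containers)


# A node with capacity for 8 tasks, during a phase with only map work.
slots = slot_based_running(10, 0, map_slots=6, reduce_slots=2)
pooled = container_based_running(10, 0, containers=8)
print(slots, pooled)  # 6 8
```

With fixed slots, the two reduce slots sit idle while map tasks wait; with a shared pool, the same capacity is fully used.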

In this section of the Hadoop tutorial, we learned about YARN in depth. In the next section of this tutorial, we shall be talking about Streaming in Hadoop.