What is Hadoop YARN?
Apache Hadoop YARN stands for Yet Another Resource Negotiator. It is the resource management layer of a Hadoop cluster. YARN was introduced in Hadoop 2 and is developed under the aegis of the Apache Software Foundation.
YARN changed how Hadoop processes data and now sits at the centre of the Hadoop architecture. With it, a single platform can run real-time streaming, interactive SQL and batch processing side by side, using multiple processing engines on the same data.
How does YARN work?
YARN lets Hadoop provide enterprise-level solutions and helps organizations achieve better resource management. It gives the entire Hadoop cluster a platform for consistent operations, a high level of security and data governance.
The various technologies that reside within the data center can also benefit from YARN. It makes it possible to process data and scale storage linearly in a very cost-effective way. Using YARN it is possible to come up with applications that access data and run in the Hadoop ecosystem on a consistent framework.
Some of the features of YARN
High degree of compatibility: applications created using the MapReduce framework can run on YARN in a seamless manner.
Better cluster utilization: YARN allocates the cluster resources in an efficient and dynamic manner, so utilization is much better than in previous versions of Hadoop.
Utmost scalability: as the number of nodes in the Hadoop cluster grows, the YARN ResourceManager ensures that the requirements are met and that the processing power of the data center does not face any hurdles.
Multi-tenancy: the various engines that access data on the Hadoop cluster can work side by side, because YARN is a highly versatile technology.
Key components of YARN
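YARN came into existence because there was a need to separate the two distinct responsibilities that the JobTracker carried in earlier versions of Hadoop: resource management and job scheduling/monitoring. In YARN these are handled by separate entities, which replace the old JobTracker and TaskTracker. So here are the key components of the YARN technology.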
- A global ResourceManager
- An ApplicationMaster per application
- A NodeManager per worker (slave) node
- One or more Containers per application, each running on a NodeManager
Thus the NodeManager and the ResourceManager form the basis on which distributed applications run. The ResourceManager allocates the cluster's resources among the applications in the system. The ApplicationMaster is framework-specific: it negotiates resources from the ResourceManager and works with the NodeManagers to run and monitor the individual tasks.
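The relationship between these components can be sketched in a few lines of Python. This is only a toy model, not the Hadoop API: the class names mirror the YARN terms, but the `allocate`, `launch` and `free_memory_mb` methods, the node names and the first-fit placement rule are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A slice of one node's memory granted to one application."""
    app_id: str
    node_id: str
    memory_mb: int


@dataclass
class NodeManager:
    """Per-node agent; tracks the containers running on its node."""
    node_id: str
    total_memory_mb: int
    containers: list = field(default_factory=list)

    def free_memory_mb(self) -> int:
        return self.total_memory_mb - sum(c.memory_mb for c in self.containers)

    def launch(self, container: Container) -> None:
        self.containers.append(container)


class ResourceManager:
    """Global arbiter; places each request on the first node that can fit it."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def allocate(self, app_id: str, memory_mb: int):
        for node in self.nodes:
            if node.free_memory_mb() >= memory_mb:
                container = Container(app_id, node.node_id, memory_mb)
                node.launch(container)
                return container
        return None  # no node can satisfy the request right now


rm = ResourceManager([NodeManager("node-1", 4096), NodeManager("node-2", 8192)])
first = rm.allocate("app-1", 3072)   # fits on node-1
second = rm.allocate("app-1", 2048)  # node-1 has 1024 MB left, so node-2 is used
third = rm.allocate("app-2", 16384)  # larger than any node, so nothing is granted
```

Note that one application ("app-1") ends up with containers on two different nodes, which is why the model keeps containers on the NodeManager rather than on the application.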
A scheduler inside the ResourceManager allocates resources while ensuring that constraints such as user limits and queue capacities are adhered to at all times. It grants resources according to the needs of each application.
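To make the two constraints concrete, here is a minimal sketch of a scheduler that checks a queue capacity and a per-user limit before granting memory. It is a simplification and not how the real YARN schedulers are implemented; `ToyScheduler`, its parameters, the queue names and the numbers are all hypothetical.

```python
class ToyScheduler:
    """Grants memory only while queue capacity and the per-user limit hold."""

    def __init__(self, cluster_memory_mb, queue_percent, user_limit_mb):
        # queue_percent maps a queue name to its share of the cluster, in percent
        self.queue_cap_mb = {
            queue: cluster_memory_mb * pct // 100
            for queue, pct in queue_percent.items()
        }
        self.user_limit_mb = user_limit_mb
        self.queue_used_mb = {queue: 0 for queue in queue_percent}
        self.user_used_mb = {}

    def request(self, queue, user, memory_mb):
        queue_after = self.queue_used_mb[queue] + memory_mb
        user_after = self.user_used_mb.get(user, 0) + memory_mb
        if queue_after > self.queue_cap_mb[queue]:
            return False  # would exceed the queue's capacity
        if user_after > self.user_limit_mb:
            return False  # would exceed the user's limit
        self.queue_used_mb[queue] = queue_after
        self.user_used_mb[user] = user_after
        return True


scheduler = ToyScheduler(
    cluster_memory_mb=10000,
    queue_percent={"prod": 70, "dev": 30},
    user_limit_mb=4000,
)
results = [
    scheduler.request("dev", "alice", 2000),   # within both limits
    scheduler.request("dev", "bob", 2000),     # dev queue is capped at 3000 MB
    scheduler.request("prod", "alice", 3000),  # alice would reach 5000 MB
    scheduler.request("prod", "bob", 3000),    # within both limits
]
```

Denied requests leave the counters untouched, so a later, smaller request from the same user or queue can still succeed.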
The ApplicationMaster works in coordination with the scheduler to obtain the right resource containers, keep an eye on their status and track the progress of the application.
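The request-and-track cycle described here can be illustrated with a small sketch. It assumes one container per task and uses a stub in place of the scheduler; `ToyApplicationMaster`, `make_granter` and the method names are invented for this example and are not part of the YARN client libraries.

```python
class ToyApplicationMaster:
    """Requests one container per task and tracks which tasks have finished."""

    def __init__(self, app_id, num_tasks, grant_container):
        self.app_id = app_id
        self.pending = list(range(num_tasks))   # tasks still waiting for a container
        self.running = {}                       # task index -> container id
        self.completed = set()
        self.grant_container = grant_container  # hypothetical stand-in for the scheduler

    def heartbeat(self):
        """Ask for a container for each pending task; keep ungranted tasks queued."""
        still_pending = []
        for task in self.pending:
            container_id = self.grant_container(self.app_id)
            if container_id is None:
                still_pending.append(task)
            else:
                self.running[task] = container_id
        self.pending = still_pending

    def on_task_finished(self, task):
        self.running.pop(task)
        self.completed.add(task)

    def progress(self):
        total = len(self.pending) + len(self.running) + len(self.completed)
        return len(self.completed) / total


def make_granter(slots):
    """Hypothetical scheduler stub that hands out at most `slots` containers."""
    issued = []

    def grant(app_id):
        if len(issued) >= slots:
            return None
        issued.append(f"{app_id}-container-{len(issued) + 1}")
        return issued[-1]

    return grant


am = ToyApplicationMaster("app-1", num_tasks=4, grant_container=make_granter(slots=2))
am.heartbeat()          # only two containers are available, so two tasks stay pending
am.on_task_finished(0)
am.on_task_finished(1)
```

After these calls two of the four tasks are complete and two are still waiting for containers, so `am.progress()` reports half of the work done.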
The NodeManager takes care of the application containers and launches them when the need arises. It closely tracks the use of resources such as memory, processor, network and disk, and gives a detailed report to the ResourceManager.
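As a rough illustration of this launch, track and report loop, the sketch below models a node that refuses containers it cannot fit and builds a status report from what is running. It tracks only memory and virtual cores for brevity, and `ToyNodeManager`, `status_report` and the report layout are assumptions, not the real NodeManager protocol.

```python
class ToyNodeManager:
    """Launches containers on one node and reports resource usage."""

    def __init__(self, node_id, memory_mb, vcores):
        self.node_id = node_id
        self.capacity = {"memory_mb": memory_mb, "vcores": vcores}
        self.containers = {}  # container id -> resources it occupies

    def used(self):
        return {
            key: sum(c[key] for c in self.containers.values())
            for key in self.capacity
        }

    def launch(self, container_id, memory_mb, vcores):
        used = self.used()
        if (used["memory_mb"] + memory_mb > self.capacity["memory_mb"]
                or used["vcores"] + vcores > self.capacity["vcores"]):
            raise ValueError(f"{self.node_id} cannot fit {container_id}")
        self.containers[container_id] = {"memory_mb": memory_mb, "vcores": vcores}

    def status_report(self):
        """Summary that a node could send to the ResourceManager."""
        used = self.used()
        return {
            "node_id": self.node_id,
            "containers": sorted(self.containers),
            "used": used,
            "available": {k: self.capacity[k] - used[k] for k in self.capacity},
        }


nm = ToyNodeManager("node-1", memory_mb=8192, vcores=4)
nm.launch("c1", memory_mb=2048, vcores=1)
nm.launch("c2", memory_mb=4096, vcores=2)
report = nm.status_report()
```

The `available` figures in the report are what a ResourceManager would need in order to decide whether another container can be placed on this node.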