
What is a container in YARN? Is it the same as the child JVM in which tasks run on the NodeManager, or is it something different?

1 Answer


A container represents an allocation of resources (such as memory) on a single node of a cluster.

In YARN, containers play a role similar to slots in the original MapReduce. To be precise, each container executes a single unit of work. For a MapReduce application, that unit of work is a map task or a reduce task.

In Hadoop 1.x, the JobTracker allocates a slot to run each MapReduce task. The TaskTracker then spawns a separate JVM for each task (unless JVM reuse is enabled).

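In Hadoop 2.x, a container is the place where a unit of work is executed. For instance, each MapReduce task (not the entire job) runs in its own container. The task's JVM is launched inside that container, so the container is the resource allocation and the child JVM is the process that runs within it.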

An application or job runs in one or more containers.

A set of system resources is allocated to each container; currently, CPU cores and RAM are supported. Each node in a Hadoop cluster can run several containers.
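To get a rough feel for how many containers a node can run, here is a minimal sketch in Java. It is only back-of-the-envelope arithmetic, not YARN's actual scheduling logic: the class name, method name, and the node and container sizes are all hypothetical. Real schedulers also apply minimum and maximum allocation limits and rounding, and depending on the scheduler configuration only memory may be taken into account.

```java
// Simplified sketch (not YARN code): estimates how many containers of one
// size fit on a single node, limited by whichever resource runs out first.
public class ContainerCapacity {

    /** Upper bound on same-sized containers that fit on one node. */
    static int maxContainers(int nodeMemoryMb, int nodeVcores,
                             int containerMemoryMb, int containerVcores) {
        if (containerMemoryMb <= 0 || containerVcores <= 0) {
            throw new IllegalArgumentException("container size must be positive");
        }
        int byMemory = nodeMemoryMb / containerMemoryMb; // limit from RAM
        int byCpu = nodeVcores / containerVcores;        // limit from CPU cores
        return Math.min(byMemory, byCpu);
    }

    public static void main(String[] args) {
        // Hypothetical node offering 8192 MB and 8 vcores to YARN.
        System.out.println(maxContainers(8192, 8, 1024, 1)); // prints 8
        System.out.println(maxContainers(8192, 8, 2048, 1)); // prints 4
    }
}
```

With these assumed numbers, doubling the memory per container halves the number of containers the node can hold, even though CPU cores are still free.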

...