What is the relation between 'mapreduce.map.memory.mb' and 'mapred.map.child.java.opts' in Apache Hadoop YARN?

Question

1 Answer

Amit Rawat · Answer 1 · 2019-06-26T05:59:44+0000

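mapreduce.map.memory.mb is the physical memory limit of the YARN container that runs your map process, while mapred.map.child.java.opts (the older name for mapreduce.map.java.opts, which is the name used in the rest of this answer) sets the JVM heap size for that map process.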

One of the most common errors you see when running a MapReduce job looks like this:

Application application_1409135750325_48141 failed 2 times due to AM Container for
appattempt_1409135750325_48141_000002 exited with exitCode: 143 due to: Container
[pid=4733,containerID=container_1409135750325_48141_02_000001] is running beyond physical memory limits.
Current usage: 2.0 GB of 2 GB physical memory used; 6.0 GB of 4.2 GB virtual memory used. Killing container.

YARN monitors the memory of your running containers. In MapReduce, a container is either a map or a reduce process.

Whenever the memory used by a mapper process exceeds its configured memory limit, Hadoop kills the mapper and reports an error such as:

Container[pid=container_1406552545451_0009_01_000002,containerID=container_234132_0001_01_000001] is running beyond physical memory limits. Current usage: 569.1 MB of 512 MB physical memory used; 970.1 MB of 1.0 GB virtual memory used. Killing container.
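
To make the message above easier to read, here is a minimal sketch that pulls the physical memory usage and limit out of the "Current usage" part of such a message. The function name physical_usage_mb and the regular expression are my own illustration, not part of Hadoop, and the pattern assumes the message format shown in the two samples in this answer.

```python
import re

# Matches the "Current usage" part of the YARN kill message quoted above.
# Assumption: sizes are reported in MB or GB, as in the two sample messages.
USAGE_RE = re.compile(
    r"Current usage: (?P<used>[\d.]+) (?P<used_unit>MB|GB) of "
    r"(?P<limit>[\d.]+) (?P<limit_unit>MB|GB) physical memory used"
)

UNIT_TO_MB = {"MB": 1.0, "GB": 1024.0}


def physical_usage_mb(message):
    """Return (used_mb, limit_mb) from a kill message, or None if not found."""
    match = USAGE_RE.search(message)
    if match is None:
        return None
    used = float(match["used"]) * UNIT_TO_MB[match["used_unit"]]
    limit = float(match["limit"]) * UNIT_TO_MB[match["limit_unit"]]
    return used, limit


message = (
    "is running beyond physical memory limits. Current usage: 569.1 MB of "
    "512 MB physical memory used; 970.1 MB of 1.0 GB virtual memory used. "
    "Killing container."
)
used_mb, limit_mb = physical_usage_mb(message)
print(f"over limit by {used_mb - limit_mb:.1f} MB")
```

The gap between the two numbers gives a rough idea of how much extra container memory the task needed at the moment it was killed.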

To overcome these problems, you increase the memory available to your MapReduce job. There are two memory settings that need to be configured at the same time:

The physical memory for your YARN map and reduce processes (mapreduce.map.memory.mb and mapreduce.reduce.memory.mb)
The JVM heap size for your map and reduce processes (mapreduce.map.java.opts and mapreduce.reduce.java.opts)

You set the YARN container physical memory limits for your map and reduce processes by configuring mapreduce.map.memory.mb and mapreduce.reduce.memory.mb, respectively. For example, to limit map processes to 2 GB and reduce processes to 4 GB, and to make these the default limits in your cluster, set the following in mapred-site.xml:

<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>4096</value>
</property>

The physical memory configured for your job must fall within the minimum and maximum memory allowed for containers in your cluster.
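
As a rough illustration of that constraint, the sketch below checks a requested container size against a minimum and maximum. In YARN these bounds come from yarn.scheduler.minimum-allocation-mb and yarn.scheduler.maximum-allocation-mb; the function name and the bound values used here (1024 and 8192) are assumptions for the example, so check the actual values in your cluster's yarn-site.xml.

```python
def fits_container_limits(requested_mb, min_allocation_mb, max_allocation_mb):
    """Return True if the requested container size is within the allowed range."""
    return min_allocation_mb <= requested_mb <= max_allocation_mb


# Illustrative bounds only; read the real ones from your cluster configuration.
MIN_ALLOCATION_MB = 1024
MAX_ALLOCATION_MB = 8192

for name, requested in [("mapreduce.map.memory.mb", 2048),
                        ("mapreduce.reduce.memory.mb", 4096)]:
    ok = fits_container_limits(requested, MIN_ALLOCATION_MB, MAX_ALLOCATION_MB)
    print(f"{name}={requested}: {'ok' if ok else 'outside allowed range'}")
```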

Once the physical memory for your map and reduce processes is configured, you need to configure the JVM heap size for them. The heap sizes need to be less than the physical memory you configured in the previous section, because the JVM also uses memory outside the heap. As a general rule, they should be about 80% of the YARN physical memory settings.

To set the map and reduce heap sizes, you need to configure mapreduce.map.java.opts and mapreduce.reduce.java.opts, respectively. Continuing the example from the previous section, we arrive at the Java heap sizes by taking the 2 GB and 4 GB physical memory limits and multiplying them by 0.8. The changes go in mapred-site.xml as shown below (assuming you want these to be the defaults for your cluster):

<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1638m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx3276m</value>
</property>
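
The 80% rule is simple arithmetic, and a small sketch may help if you need to derive heap sizes for other container sizes. The helper name heap_opts and the choice to round down to a whole megabyte are my own assumptions; the 80% figure is the rule of thumb stated above, not a value enforced by Hadoop.

```python
def heap_opts(container_mb, heap_percent=80):
    """Build a -Xmx option sized at heap_percent of the container memory.

    Integer arithmetic rounds down, so the heap never exceeds the target share.
    """
    heap_mb = container_mb * heap_percent // 100
    return f"-Xmx{heap_mb}m"


# 2048 MB * 0.8 = 1638.4 MB and 4096 MB * 0.8 = 3276.8 MB, both rounded down.
print(heap_opts(2048))  # -Xmx1638m
print(heap_opts(4096))  # -Xmx3276m
```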
