
I am confused about dealing with executor memory and driver memory in Spark.

My environment settings are as below:

  • Memory 128 GB, 16 CPUs, 9 VMs
  • CentOS
  • Hadoop 2.5.0-cdh5.2.0
  • Spark 1.1.0

Input data information:

  • 3.5 GB data file from HDFS

For simple development, I executed my Python code in standalone cluster mode (8 workers, 20 cores, 45.3 GB memory) with spark-submit. Now I would like to set executor memory or driver memory for performance tuning.

From the Spark documentation, the definition for executor memory is

Amount of memory to use per executor process, in the same format as JVM memory strings (e.g. 512m, 2g).

How about driver memory?

1 Answer


Executors are worker-node processes in charge of running the individual tasks of a given Spark job. The Spark driver is the program that declares the transformations and actions on RDDs of data and submits those requests to the master.

As for driver memory, the amount of memory a driver requires depends on the job to be executed.

In Spark, the --executor-memory flag (the spark.executor.memory property; the same setting applies in YARN mode) controls the executor heap size, with a default of 512 MB per executor. The --driver-memory flag (spark.driver.memory) controls the amount of memory allocated to the driver, which is 1 GB by default and should be increased if you call a collect() or take(N) action on a large RDD in your application, since those results are pulled back into the driver process.
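As a sketch, both flags can be passed to spark-submit on the command line. The master URL, application file, and the 2g/4g sizes below are illustrative placeholders, not recommendations:

```shell
# Hypothetical spark-submit invocation for a standalone cluster;
# adjust master URL, sizes, and core count to your environment.
spark-submit \
  --master spark://master:7077 \
  --driver-memory 4g \
  --executor-memory 2g \
  --total-executor-cores 20 \
  my_app.py
```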


Spark shell required memory = (Driver Memory + 384 MB) + (Number of executors * (Executor memory + 384 MB))

Here 384 MB is the default per-JVM memory overhead that Spark reserves on top of the heap when executing jobs.
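To make the formula concrete, here is a small sketch that computes the total memory for a hypothetical configuration (the 1 GB driver / 8 executors / 2 GB figures are illustrative, not taken from the question):

```python
OVERHEAD_MB = 384  # per-JVM overhead assumed by the formula above

def required_memory_mb(driver_mb, num_executors, executor_mb, overhead_mb=OVERHEAD_MB):
    """Total memory the formula predicts, in MB:
    (driver + overhead) + num_executors * (executor + overhead)."""
    return (driver_mb + overhead_mb) + num_executors * (executor_mb + overhead_mb)

# Example: 1 GB driver plus 8 executors of 2 GB each
total = required_memory_mb(1024, 8, 2048)
print(total)  # → 20864 (about 20.4 GB)
```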

