Given your configuration, here is how to increase executor memory and driver memory for your Spark job on a standalone cluster:
A. Executor vs. Driver Memory
Executor memory: This is the memory allocated to each executor, i.e. the worker process that actually performs the computation on the data. Executors need enough memory to hold their portion of the data, complete their tasks, and cache data when necessary.
Driver memory: This is the memory allocated to the driver process, which coordinates task execution across the cluster. Driver memory matters because it is used for running the Spark control flow, aggregating results (especially when using collect()), and managing job metadata, so it needs enough headroom to handle communication with all the executors and any large result sets pulled back to the driver.
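To make the distinction concrete, the short PySpark sketch below (app name and sizes are illustrative, not taken from your job) shows which process each kind of operation puts pressure on:

# Minimal PySpark sketch (illustrative only) contrasting work that stays on the
# executors with work that lands in driver memory.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("memory-roles-demo").getOrCreate()
df = spark.range(0, 10_000_000)   # partitions are processed in executor memory

# Executor-side: the counting happens on the executors; only a single number
# travels back to the driver.
print("rows:", df.count())

# Driver-side: collect() would materialise every row in driver memory, which is
# the usual reason spark.driver.memory has to be raised.
# rows = df.collect()   # commented out: 10M rows would strain a small driver

spark.stop()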
B. Suggested Starting Values
Executor Memory: You are running a standalone cluster with 8 workers and 45.3 GB of total memory. Treating the driver as a ninth consumer of that pool gives roughly 5 GB per process (45.3 GB ÷ 9 ≈ 5 GB), so you can start by setting executor memory to about 5 GB:
--conf spark.executor.memory=5g
Driver Memory: The driver generally needs less memory than the executors. Since it is mainly a coordinator process, around 2 GB of driver memory is a reasonable starting point:
--conf spark.driver.memory=2g
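If you want to confirm from inside the job which values the application actually picked up, a small check like the one below can go near the top of your script (note that spark.driver.memory normally has to be supplied via spark-submit or spark-defaults.conf, since the driver JVM is already running by the time your Python code executes):

# Hypothetical snippet for the top of your_script.py: print the memory settings
# the application actually received from spark-submit.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("memory-check").getOrCreate()
conf = spark.sparkContext.getConf()
print("executor memory:", conf.get("spark.executor.memory", "not set"))
print("driver memory:  ", conf.get("spark.driver.memory", "not set"))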
C. Tuning and Monitoring
Driver and Executor Memory Overhead: Overhead memory is extra memory reserved beyond the heap; allocating some additional overhead helps avoid out-of-memory (OOM) errors. It can be set with the following configuration:
--conf spark.executor.memoryOverhead=512m
--conf spark.driver.memoryOverhead=512m
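Because each executor's real footprint is its heap plus its overhead, it is worth checking that everything still fits into the memory your workers have. The plain-Python arithmetic sketch below uses the starting values suggested above (it assumes the driver's 2 GB + 512 MB lives wherever the driver runs):

# Rough budget check using the suggested starting values.
total_worker_memory_gb = 45.3
num_executors = 8
executor_memory_gb = 5.0        # spark.executor.memory=5g
executor_overhead_gb = 0.5      # spark.executor.memoryOverhead=512m

per_executor_gb = executor_memory_gb + executor_overhead_gb
total_executor_gb = num_executors * per_executor_gb

print(f"per executor:  {per_executor_gb:.1f} GB")
print(f"all executors: {total_executor_gb:.1f} GB of {total_worker_memory_gb} GB available")
# The driver additionally needs its own 2 GB + 512 MB on whichever node it runs.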
Monitor the Spark UI: The Spark UI (http://<driver-node>:4040) shows memory usage, executor utilization, and where the bottlenecks are. If you see heavy garbage collection or executor failures, increase executor memory and/or the executor memory overhead.
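The same numbers the UI shows are also exposed by Spark's monitoring REST API under /api/v1 on the driver, which is convenient for scripted checks. A rough sketch (driver-node is a placeholder, and the field names follow the documented ExecutorSummary schema, which may differ slightly between Spark versions):

# Poll executor memory and GC statistics from the Spark monitoring REST API.
import requests

base = "http://driver-node:4040/api/v1"   # placeholder host
apps = requests.get(f"{base}/applications", timeout=10).json()
app_id = apps[0]["id"]                    # first (usually only) running application

for ex in requests.get(f"{base}/applications/{app_id}/executors", timeout=10).json():
    used_mb = ex["memoryUsed"] / (1024 * 1024)
    max_mb = ex["maxMemory"] / (1024 * 1024)
    gc_sec = ex["totalGCTime"] / 1000.0
    print(f"executor {ex['id']}: {used_mb:.0f}/{max_mb:.0f} MB used, GC {gc_sec:.1f}s")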
D. Sample spark-submit command
With this configuration, your command would look something like this:
spark-submit \
--master spark://<your-master-url>:7077 \
--conf spark.executor.memory=5g \
--conf spark.driver.memory=2g \
--conf spark.executor.memoryOverhead=512m \
--conf spark.driver.memoryOverhead=512m \
your_script.py
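For completeness, your_script.py only needs to build a SparkSession; the memory settings above are injected by spark-submit. A minimal skeleton (the job body is just a placeholder) could look like:

# Minimal skeleton for your_script.py; replace the body with your real workload.
from pyspark.sql import SparkSession

def main():
    spark = SparkSession.builder.appName("my-job").getOrCreate()
    count = spark.range(0, 1_000_000).count()   # placeholder work for the executors
    print("row count:", count)
    spark.stop()

if __name__ == "__main__":
    main()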
E. Testing and Calibration by Job Behavior
If the jobs fail: increase the memory settings in 512 MB increments until you find a stable configuration.
If the jobs run fine but could be faster: increase executor memory and see whether that improves performance.
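If you want to automate the first case, one option is to rerun the job with executor memory growing in 512 MB steps until spark-submit exits cleanly. The sketch below is only an illustration: it reuses the command shown above, starts from the suggested 5 GB, and stops at an arbitrary 8 GB cap:

# Hypothetical calibration loop: retry with more executor memory until success.
import subprocess

MASTER = "spark://<your-master-url>:7077"   # placeholder master URL
STEP_MB = 512
executor_mb = 5 * 1024                      # start at the suggested 5g

while executor_mb <= 8 * 1024:              # arbitrary upper bound for the sweep
    cmd = [
        "spark-submit",
        "--master", MASTER,
        "--conf", f"spark.executor.memory={executor_mb}m",
        "--conf", "spark.driver.memory=2g",
        "--conf", "spark.executor.memoryOverhead=512m",
        "--conf", "spark.driver.memoryOverhead=512m",
        "your_script.py",
    ]
    print("trying executor memory:", executor_mb, "MB")
    if subprocess.run(cmd).returncode == 0:
        print("stable configuration found at", executor_mb, "MB")
        break
    executor_mb += STEP_MB
else:
    print("no stable configuration found within the sweep")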