I have read the Cluster Mode Overview and I still can't understand the different processes in a Spark Standalone cluster and how the parallelism works.
Is a worker a JVM process or not? I ran bin/start-slave.sh and found that it spawned a worker, which is in fact a JVM process.
As per the link above, an executor is a process launched for an application on a worker node, and it runs tasks. An executor is also a JVM.
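For what it's worth, this is how I checked which JVMs are running; the process names below are just what I observe on my setup, so take them as an illustration rather than a definitive list:

# run jps on a worker node while an application is running
jps
# I see one Worker JVM for the node itself, plus one
# CoarseGrainedExecutorBackend JVM per executor launched on that node.
# On the machine where I ran spark-submit (client mode), jps also shows
# a SparkSubmit JVM, which I believe is the driver.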
These are my questions:
1. Executors are per application. Then what is the role of a worker? Does it coordinate with the executor and communicate the result back to the driver, or does the driver talk to the executor directly? If so, what is the worker's purpose?
2. How do I control the number of executors for an application? (See the sketch right after this list for what I am currently doing.)
3. Can tasks be made to run in parallel inside an executor? If so, how do I configure the number of threads for an executor?
4. What is the relation between workers, executors, and executor cores (--total-executor-cores)?
5. What does it mean to have more workers per node?
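To make questions 2 and 3 concrete, this is roughly how I submit the application today; the master URL, class name, jar, and numbers are placeholders I made up, and I'm not sure these are even the right flags:

# submit to the standalone master (URL, class, and jar are placeholders)
spark-submit \
  --master spark://master-host:7077 \
  --executor-memory 4G \
  --executor-cores 4 \
  --total-executor-cores 20 \
  --class com.example.MyApp \
  my-app.jar
# My current understanding (please correct me) is that the number of
# cores per executor is what limits how many tasks (threads) can run
# concurrently inside that executor.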
Updated
Let's take some examples to understand this better.
Example 1: A standalone cluster with 5 worker nodes (each node having 8 cores), and I start an application with the default settings.
Example 2: Same cluster config as Example 1, but I run an application with the following settings: --executor-cores 10 --total-executor-cores 10.
Example 3: Same cluster config as Example 1, but I run an application with the following settings: --executor-cores 10 --total-executor-cores 50.
Example 4: Same cluster config as Example 1, but I run an application with the following settings: --executor-cores 50 --total-executor-cores 50.
Example 5: Same cluster config as Example 1, but I run an application with the following settings: --executor-cores 50 --total-executor-cores 10.

In each of these examples, how many executors are launched, and how many tasks can run in parallel?
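For reference, I believe the same settings can also be given as configuration properties instead of spark-submit flags; as far as I understand, --executor-cores corresponds to spark.executor.cores and --total-executor-cores to spark.cores.max. Shown here with Example 3's numbers (master URL, class, and jar are again placeholders):

# equivalent --conf form, using Example 3's numbers
spark-submit \
  --master spark://master-host:7077 \
  --conf spark.executor.cores=10 \
  --conf spark.cores.max=50 \
  --class com.example.MyApp \
  my-app.jar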