Spark's deploy mode determines where the driver program runs. There are two options: the driver runs either on a worker node inside the cluster (cluster mode) or on an external client machine (client mode).
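In practice, the mode is usually chosen with the `--deploy-mode` option of `spark-submit`, which accepts `client` or `cluster` and defaults to `client`. The following is a minimal sketch that only builds and prints the command line rather than running it. The helper name `build_spark_submit`, the application file `my_job.py`, and the choice of `yarn` as the master are assumptions made for illustration.

```python
import shlex

VALID_DEPLOY_MODES = ("client", "cluster")


def build_spark_submit(app_path, deploy_mode="client", master="yarn"):
    """Return the argv list for a spark-submit call (hypothetical helper)."""
    if deploy_mode not in VALID_DEPLOY_MODES:
        raise ValueError(f"deploy_mode must be one of {VALID_DEPLOY_MODES}")
    return [
        "spark-submit",
        "--master", master,            # assumed cluster manager for this sketch
        "--deploy-mode", deploy_mode,  # decides where the driver runs
        app_path,                      # placeholder application file
    ]


# Client mode: the driver runs on the machine that submits the job.
print(shlex.join(build_spark_submit("my_job.py", "client")))

# Cluster mode: the driver is launched on a node inside the cluster.
print(shlex.join(build_spark_submit("my_job.py", "cluster")))
```

The argument list can be passed to `subprocess.run` on a machine where `spark-submit` is installed. Everything else about the job stays the same between the two commands; only the driver's location changes.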
In client mode, the driver component of the Spark job runs on the local machine from which the job is submitted. That is why it is called “client mode”.
If the submitting machine is inside or close to the Spark infrastructure, so that moving data between the cluster and the driver to produce the final result does not suffer from high network latency, this mode works well.
If the submitting machine is far from the Spark infrastructure and network latency is high, this mode performs poorly.
In cluster mode, the driver component of the Spark job does not run on the local machine from which the job is submitted. Instead, Spark launches the driver inside the cluster.
When the submitting machine is remote from the Spark infrastructure, cluster mode works well: because the driver runs inside the Spark infrastructure, data movement between the submitting machine and the cluster is reduced.
In cluster mode, the chance of a network disconnection between the driver and the Spark infrastructure is also reduced, which in turn lowers the chance of the job failing.
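The trade-offs above can be condensed into a rough rule of thumb. The sketch below is only an illustration of that reasoning, not a Spark API: the function name `choose_deploy_mode`, its parameters, and the 50 ms default threshold are all assumptions, and a sensible threshold depends on the workload.

```python
def choose_deploy_mode(round_trip_ms, stays_connected, max_client_latency_ms=50.0):
    """Suggest a deploy mode from the trade-offs described above.

    round_trip_ms: measured latency between the submitting machine and the cluster.
    stays_connected: True if the submitting machine stays online for the whole job.
    max_client_latency_ms: hypothetical cut-off, not a value defined by Spark.
    """
    # Client mode suits a submitting machine that is near the cluster and
    # remains connected, because the driver lives on that machine.
    if stays_connected and round_trip_ms <= max_client_latency_ms:
        return "client"
    # Otherwise keep the driver inside the cluster, close to the executors.
    return "cluster"


# A gateway node in the same data center as the cluster.
print(choose_deploy_mode(round_trip_ms=2.0, stays_connected=True))

# A laptop on a distant network that may drop its connection.
print(choose_deploy_mode(round_trip_ms=180.0, stays_connected=False))
```

The returned string matches the values accepted by the `--deploy-mode` option of `spark-submit`, so it can be passed straight through when the job is submitted.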