Basically, we have three cluster manager options for Spark: Standalone, YARN, and Mesos.
Spark Standalone cluster (Spark deploy cluster) is Spark's own built-in cluster environment. Since Spark Standalone ships with the default distribution of Apache Spark, it is in many cases the easiest way to run your Spark applications in a clustered environment. It is also the simplest to set up, and it provides many of the same features as the other cluster managers.
Standalone mode works with two kinds of nodes: a master node that coordinates the cluster, and worker nodes that host the executors.
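As a minimal sketch of how those two node roles are started (assuming `SPARK_HOME` points at an unpacked Spark 3.x distribution; `master-host` and `app.py` are placeholders):

```shell
# On the master node: start the standalone master
# (it listens on port 7077 by default).
$SPARK_HOME/sbin/start-master.sh

# On each worker node: start a worker and register it with the master.
$SPARK_HOME/sbin/start-worker.sh spark://master-host:7077

# From any machine: submit an application to the standalone cluster.
$SPARK_HOME/bin/spark-submit --master spark://master-host:7077 app.py
```

Note that on Spark versions before 3.1 the worker script is named `start-slave.sh` rather than `start-worker.sh`.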
YARN has quite good support for HDFS data locality.
Most Hadoop distributions already install YARN and HDFS together.
On YARN, each Spark executor runs inside a single YARN container. In order to deploy applications to YARN clusters, you need a Spark build with YARN support.
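A hedged sketch of a YARN submission, showing how the executor settings translate into container requests (assuming `HADOOP_CONF_DIR` points at the cluster's Hadoop configuration; the jar path and class name are placeholders):

```shell
# Each of the 4 executors below becomes one YARN container,
# sized by --executor-memory and --executor-cores.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 2g \
  --executor-cores 2 \
  --class com.example.App \
  app.jar
```

In `cluster` deploy mode the driver itself also runs inside a YARN container, so the client machine can disconnect after submission.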
Advantages of YARN over Mesos and Standalone:
YARN allows you to dynamically share and centrally configure the same pool of cluster resources among all frameworks that run on YARN.
YARN provides security features: authentication, service-level authorization, authentication for web consoles, and data confidentiality.
Mesos handles the workload in a distributed environment through dynamic resource sharing and isolation. The Mesos cluster manager is often the recommended choice for managing large-scale clusters.
It is open-source software that sits between the application layer and the operating system, making it easier to deploy and manage applications in large-scale clustered environments.
The main idea behind Mesos is to pool a large collection of heterogeneous resources. Mesos introduces a mechanism called resource offers, i.e. distributed two-level scheduling: Mesos decides how many resources to offer each framework, while each framework decides which offered resources to accept and which computations to run on them.
One advantage Mesos has over YARN and Standalone is its unique thin resource-sharing layer, which gives frameworks a common interface for accessing cluster resources and thereby enables fine-grained sharing across diverse cluster computing frameworks. The purpose is to increase resource utilization by deploying multiple distributed systems on a shared pool of nodes.
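A minimal sketch of submitting to a Mesos-managed cluster (assuming a Mesos master reachable at `mesos-master:5050`, which is a placeholder, and a Spark build with Mesos support):

```shell
# Point Spark at the Mesos master; Spark registers as a Mesos
# framework and receives resource offers from the master.
spark-submit \
  --master mesos://mesos-master:5050 \
  app.py
```

By default recent Spark versions use coarse-grained mode on Mesos (one long-lived Mesos task per executor); the older fine-grained mode, which matched the per-task sharing described above most closely, was deprecated in Spark 2.0.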