Which cluster type should I choose for Spark?

Question

1 Answer

Amit Rawat · Answer 1 · 2019-06-25T06:41:57+0000

Spark can run on three main cluster types:

Standalone
Apache Mesos
Hadoop YARN
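Whichever cluster type you pick, you select it through the master URL you give to Spark. The sketch below shows the URL form for each of the three. The `master_url` helper and the host names are hypothetical and used only for illustration; the ports are the usual defaults (7077 for a Standalone master, 5050 for a Mesos master).

```python
def master_url(cluster_type, host=None, port=None):
    """Return a Spark master URL for the given cluster type (hypothetical helper)."""
    kind = cluster_type.lower()
    if kind == "yarn":
        # YARN takes no host: Spark reads the ResourceManager address
        # from the Hadoop configuration directory.
        return "yarn"

    # URL scheme and usual default port for the host-based cluster managers.
    defaults = {"standalone": ("spark", 7077), "mesos": ("mesos", 5050)}
    if kind not in defaults:
        raise ValueError(f"unknown cluster type: {cluster_type!r}")
    if host is None:
        raise ValueError(f"{cluster_type} needs a master host")

    scheme, default_port = defaults[kind]
    return f"{scheme}://{host}:{port or default_port}"
```

The returned string is what you would pass as `--master` to spark-submit or set as `spark.master`.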

Spark Standalone cluster (Spark deploy cluster) is Spark’s own built-in cluster environment. Because it ships with the default distribution of Apache Spark, it is in many cases the easiest way to set up a cluster and run your Spark applications on it.

It also provides features broadly similar to those of the other cluster managers.

A Standalone cluster has two kinds of nodes:

Standalone Master - The resource manager for the Spark Standalone cluster.

Standalone Worker (standalone slave) - A worker in the Spark Standalone cluster. It launches the executors on its node; the application's driver then schedules tasks onto those executors.
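To run an application on a Standalone cluster, you point spark-submit at the Standalone Master. A minimal sketch that only assembles the command line: the function name, host, and application path are placeholders, while `--master`, `--total-executor-cores`, and `--executor-memory` are standard spark-submit options.

```python
import shlex


def standalone_submit_args(master_host, app_path, total_cores=4,
                           executor_memory="2g", port=7077):
    """Build a spark-submit argument list for a Standalone cluster (illustrative)."""
    return [
        "spark-submit",
        "--master", f"spark://{master_host}:{port}",
        # Upper bound on cores the application may take across all workers.
        "--total-executor-cores", str(total_cores),
        # Memory for each executor that a worker launches.
        "--executor-memory", executor_memory,
        app_path,
    ]


# Placeholder host and application; print the command instead of running it.
args = standalone_submit_args("master.example.com", "my_app.py")
print(shlex.join(args))
```

The list can be handed to `subprocess.run` on a machine where spark-submit is on the PATH.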

YARN offers good support for HDFS data locality.

Most Hadoop distributions already install YARN and HDFS together.

On YARN, each Spark executor maps to a single YARN container. To deploy applications to a YARN cluster, you need a Spark build with YARN support.
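Because each executor runs in one container, the container has to hold the executor's memory plus some off-heap overhead. Here is a rough sketch of that arithmetic, assuming Spark's documented default overhead of 10% of executor memory with a 384 MiB minimum (`spark.executor.memoryOverhead`). The function name is hypothetical, and YARN's rounding up to its own allocation increment is not modelled.

```python
def yarn_container_mb(executor_memory_mb, overhead_factor=0.10,
                      min_overhead_mb=384):
    """Estimate the memory (MiB) requested from YARN for one executor container."""
    # Overhead is a fraction of executor memory, but never below the minimum.
    overhead = max(min_overhead_mb, int(executor_memory_mb * overhead_factor))
    return executor_memory_mb + overhead


# Example: a 4 GiB executor and a 1 GiB executor.
print(yarn_container_mb(4096))
print(yarn_container_mb(1024))
```

For small executors the fixed minimum dominates, so the container is noticeably larger than the executor memory you asked for.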

Advantages of YARN over Mesos and Standalone:

YARN lets you dynamically share and centrally configure the same pool of cluster resources among all frameworks that run on YARN.
YARN provides security features: authentication, service-level authorization, authentication for web consoles, and data confidentiality.

Mesos handles workloads in a distributed environment through dynamic resource sharing and isolation. The Mesos cluster manager is a common choice for managing large-scale clusters.

It is open-source software that sits between the application layer and the operating system and makes it easier to deploy and manage applications in large-scale clustered environments.

The main idea behind Mesos is to pool a large collection of heterogeneous resources. Mesos introduces a mechanism called resource offers, a form of distributed two-level scheduling. Mesos decides how many resources to offer each framework, while each framework decides which of the offered resources to accept and which computations to run on them.
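The two levels can be illustrated with a toy model. This is not the Mesos API: the `Framework` class and `run_offer_round` function are invented for the sketch, only CPUs are tracked, and frameworks are offered resources in simple list order rather than by a fairness policy.

```python
from dataclasses import dataclass


@dataclass
class Framework:
    name: str
    cpus_needed: int

    def consider(self, offered_cpus):
        """Second level: the framework decides how much of an offer to accept."""
        accepted = min(self.cpus_needed, offered_cpus)
        self.cpus_needed -= accepted
        return accepted


def run_offer_round(free_cpus, frameworks):
    """First level: the master offers its free CPUs to each framework in turn."""
    allocation = {}
    for fw in frameworks:
        if free_cpus == 0:
            break  # nothing left to offer
        accepted = fw.consider(free_cpus)
        free_cpus -= accepted
        allocation[fw.name] = accepted
    return allocation, free_cpus


# Two hypothetical frameworks compete for 10 free CPUs.
frameworks = [Framework("spark", 6), Framework("other", 8)]
allocation, left = run_offer_round(10, frameworks)
print(allocation, left)
```

The master never needs to know what each framework will run; it only tracks what is free and what was accepted.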

One advantage of Mesos over YARN and Standalone is its thin resource-sharing layer, which gives frameworks a common interface for accessing cluster resources and so enables fine-grained sharing across diverse cluster computing frameworks. The goal is to increase resource utilization by deploying multiple distributed systems on a shared pool of nodes.
