
in Big Data Hadoop & Spark by (11.4k points)

I already have a cluster of 3 machines (ubuntu1, ubuntu2, ubuntu3; VirtualBox VMs) running Hadoop 1.0.0, and I installed Spark on each of them. ubuntu1 is my master node and the other nodes work as slaves. My questions: what exactly is a Spark driver? Should we set an IP and port for it via spark.driver.host? And where will it be executed and located (on the master or on a slave)?

1 Answer

by (32.3k points)

A Spark driver (the application's driver process) is the JVM process that hosts the SparkContext for a Spark application. Within that application it plays the role of the master; in client deploy mode it runs on the machine where the application is launched, which need not be the cluster's master node.

It is where jobs are split into stages and tasks are scheduled for execution (using the DAGScheduler and the TaskScheduler). It also hosts the web UI for the application.
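For instance (a minimal sketch; the object name DriverExample, the app name, and the master URL spark://ubuntu1:7077 are assumptions based on the question's hostnames), the JVM that runs the following main method is the driver:

    import org.apache.spark.{SparkConf, SparkContext}

    object DriverExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("driver-example")
          .setMaster("spark://ubuntu1:7077") // standalone master from the question
        // Creating the SparkContext is what makes this process the driver:
        // it now hosts the DAGScheduler, the TaskScheduler, and the web UI
        // (by default on port 4040 of the driver machine).
        val sc = new SparkContext(conf)
        // If spark.driver.host is not set, Spark fills it in automatically:
        println("driver host: " + sc.getConf.get("spark.driver.host", "unset"))
        sc.stop()
      }
    }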

It is used for splitting a Spark application into tasks and scheduling them to run on executors.

The driver coordinates the workers and the overall execution of tasks.
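On the spark.driver.host part of the question: spark.driver.host and spark.driver.port are standard Spark properties, but you usually do not need to set them, since Spark infers the driver's address itself. They matter mainly when the driver machine has several network interfaces (common with VirtualBox NAT plus host-only adapters) and executors would otherwise resolve the wrong one. A minimal sketch, with hypothetical values and the hostname from the question:

    // Explicitly pinning the driver's address and port (values are examples):
    val conf = new SparkConf()
      .setAppName("driver-host-example")
      .setMaster("spark://ubuntu1:7077")
      .set("spark.driver.host", "ubuntu1") // address executors use to reach the driver
      .set("spark.driver.port", "51000")   // default is a random ephemeral port

The same properties can also be passed on the command line, e.g. spark-submit --conf spark.driver.host=ubuntu1.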

More detail on the role of the driver:

  • The driver prepares the context and declares the operations on the data using RDD transformations and actions.

  • The driver submits the serialized RDD graph to the master. The master creates tasks from it and submits them to the workers for execution, coordinating the different job stages (see the sketch after this list).
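A hedged illustration of those two bullets (the dataset and variable names are made up; sc is a SparkContext as in the first sketch):

    // Transformations only record lineage in the driver's RDD graph;
    // nothing executes on the cluster until the action at the end.
    val data    = sc.parallelize(1 to 1000000)  // handle to a distributed dataset
    val evens   = data.filter(_ % 2 == 0)       // transformation: lazily recorded
    val squares = evens.map(n => n.toLong * n)  // transformation: lazily recorded
    val total   = squares.count()               // action: the driver builds stages and
                                                // tasks here and ships them to executors
    println(s"number of even squares = $total")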

