in Big Data Hadoop & Spark by (11.4k points)

Is my understanding right?

  1. Application: one spark submit.

  2. job: once a lazy evaluation happens, there is a job.

  3. stage: It is related to the shuffle and the transformation type. It is hard for me to understand the boundary of the stage.

  4. task: It is a unit of operation. One transformation per task, and one task per transformation.

Help wanted to improve this understanding.

1 Answer

by (32.3k points)

Yes, you are on the right track. Just keep a few things in mind.

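  • An application corresponds to the driver program, that is, the main function that creates the SparkContext. One spark-submit launches one application, and a single application can run many jobs.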

  • Transformations are lazy and only record what should be computed. Whenever you apply an action (such as count or collect) on an RDD, a "job" is created. Jobs are the work submitted to Spark.

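  • Jobs are divided into "stages" at shuffle boundaries. Narrow transformations such as map and filter need no data movement between partitions, so they are pipelined together in one stage. A wide transformation such as reduceByKey or groupByKey needs a shuffle, so it starts a new stage.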

  • Each stage is then divided into tasks, one per partition of the RDD. A task runs all of the stage's pipelined transformations on a single partition, so it is not one task per transformation. Tasks are the smallest units of work in Spark.
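
To make the stage and task counting concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's actual scheduler: the function name plan_job and the WIDE set are made up for illustration, and it assumes that every listed wide operation really shuffles and that the partition count stays the same across the shuffle (in real Spark neither is guaranteed).

```python
# Toy model only: this is NOT Spark's scheduler, just the counting rule.
# Hypothetical set of operations treated as wide (shuffling) in this sketch.
WIDE = {"reduceByKey", "groupByKey", "join", "repartition"}

def plan_job(transformations, num_partitions):
    """Split a chain of transformations into stages at each shuffle."""
    stages = [[]]
    for op in transformations:
        if op in WIDE:
            # A wide transformation reads shuffled data, so it opens a new stage.
            stages.append([])
        stages[-1].append(op)
    # One task per partition per stage (assumes the partition count never changes).
    tasks_per_stage = [num_partitions] * len(stages)
    return stages, tasks_per_stage

# One action (for example collect) on this lineage would trigger one job.
stages, tasks = plan_job(["map", "filter", "reduceByKey", "map"], num_partitions=4)
print(stages)  # [['map', 'filter'], ['reduceByKey', 'map']]
print(tasks)   # [4, 4]
```

In this model the job has two stages and eight tasks in total. Each task in the first stage runs both map and filter on its own partition, which is why the number of tasks follows the number of partitions rather than the number of transformations.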
