Explore Courses Blog Tutorials Interview Questions
0 votes
in Big Data Hadoop & Spark by (11.4k points)

From my Spark UI. What does it mean by skipped?

enter image description here

1 Answer

0 votes
by (32.3k points)
edited by

Stage Skipped means that data has been fetched from cache and re-execution of the given stage is not required. Basically the stage has been evaluated before, and the result is available without re-execution. It is consistent with your DAG which shows that the next stage requires shuffling (reduceByKey). Whenever there is shuffling involved Spark automatically caches generated data, click here to check.

If you want to know more about Spark, then do check out this awesome video tutorial:

Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files are preserved until the corresponding RDDs are no longer used and are garbage collected. This is done so the shuffle files don’t need to be re-created if the lineage is re-computed.

Related questions

Browse Categories