
An RDD has a meaningful order (as opposed to some arbitrary order imposed by the storage model) if it was processed by sortBy().

Now, which operations preserve that order?

1 Answer


Almost all operations preserve the order. The exceptions are operations that explicitly reorder or redistribute the data, such as sortBy, partitionBy, and join. Ordering is always "meaningful", not only after sortBy.

For example, if you read a file with sc.textFile, the lines of the RDD will be in the same order as they were in the file.

map, filter, flatMap, and coalesce (with shuffle=false) preserve the order. Like most RDD operations, they work on Iterators inside the partitions, so they have no opportunity to disturb the order.
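To make the iterator argument concrete, here is a minimal sketch in plain Python. It is not Spark code: the list-of-lists `partitions` and the helpers `map_partitions`, `collect`, and `modulo_repartition` are hypothetical stand-ins that only model how per-partition iterator processing keeps elements in sequence, and how redistributing elements across partitions does not.

```python
from itertools import chain

# Toy stand-in for an RDD: an ordered list of partitions, each an ordered list.
# This is NOT Spark; it only models per-partition iterator processing.
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]


def map_partitions(parts, fn):
    """Apply fn to each partition's iterator; partitions stay in place."""
    return [list(fn(iter(p))) for p in parts]


def collect(parts):
    """Concatenate partitions in partition order."""
    return list(chain.from_iterable(parts))


def modulo_repartition(parts, n):
    """Model of a shuffle: send each element to partition (x % n)."""
    out = [[] for _ in range(n)]
    for p in parts:
        for x in p:
            out[x % n].append(x)
    return out


# map, filter and flatMap written as iterator transformations.
mapped = map_partitions(partitions, lambda it: (x * 10 for x in it))
filtered = map_partitions(mapped, lambda it: (x for x in it if x % 20 != 0))
flat_mapped = map_partitions(
    filtered, lambda it: (y for x in it for y in (x, x + 1))
)

result = collect(flat_mapped)
shuffled = collect(modulo_repartition(partitions, 3))

print(result)    # elements appear in their original relative order
print(shuffled)  # same elements as the input, but in a different order
```

Each transformation consumes an iterator and yields results as it goes, so there is no step at which elements inside a partition could be reordered. The modulo-based redistribution, by contrast, moves elements between partitions, which is why shuffle-style operations give no ordering guarantee.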
