An RDD has a meaningful order (as opposed to some arbitrary order imposed by the storage model) if it was produced by sortBy().

Now, which operations preserve that order?

1 Answer

Almost all operations preserve the order. The exceptions are the operations that explicitly reorder or redistribute the data, such as sortBy, partitionBy, and join. Ordering is always "meaningful", not only after a sortBy.

For example, if you read a file with sc.textFile, the lines of the RDD will be in the same order as they were in the file.

map, filter, flatMap, and coalesce (with shuffle=false) do preserve the order. Like most RDD operations, they work on Iterators inside the partitions, so they simply have no way to mess up the order.
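
To make the iterator argument concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark code: an "RDD" is assumed to be just an ordered list of partitions, and every helper name below (map_partitions, rdd_map, rdd_filter, rdd_flat_map, coalesce_no_shuffle, collect) is hypothetical. In particular, coalesce is simplified to merging adjacent partitions, which is only an approximation of what Spark does.

```python
from typing import Callable, Iterable, Iterator, List

# Toy stand-in for an RDD: an ordered list of partitions,
# each partition being an ordered list of elements.
Partitions = List[List[int]]


def map_partitions(
    parts: Partitions,
    fn: Callable[[Iterator[int]], Iterable[int]],
) -> Partitions:
    """Run an iterator-to-iterator function on each partition independently."""
    return [list(fn(iter(p))) for p in parts]


def rdd_map(parts: Partitions, f: Callable[[int], int]) -> Partitions:
    # Elements are transformed one by one, in the order the iterator yields them.
    return map_partitions(parts, lambda it: (f(x) for x in it))


def rdd_filter(parts: Partitions, pred: Callable[[int], bool]) -> Partitions:
    # Elements are dropped, but the survivors keep their relative order.
    return map_partitions(parts, lambda it: (x for x in it if pred(x)))


def rdd_flat_map(parts: Partitions, f: Callable[[int], Iterable[int]]) -> Partitions:
    # Each element expands in place into zero or more elements.
    return map_partitions(parts, lambda it: (y for x in it for y in f(x)))


def coalesce_no_shuffle(parts: Partitions, n: int) -> Partitions:
    """Simplified model: merge adjacent partitions into at most n partitions."""
    size = -(-len(parts) // n)  # ceiling division
    return [
        [x for p in parts[i:i + size] for x in p]
        for i in range(0, len(parts), size)
    ]


def collect(parts: Partitions) -> List[int]:
    """Concatenate the partitions in partition order."""
    return [x for p in parts for x in p]


parts = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # already in sorted order

mapped = rdd_map(parts, lambda x: x * 10)
filtered = rdd_filter(mapped, lambda x: x % 20 != 0)
expanded = rdd_flat_map(filtered, lambda x: [x, x + 1])
merged = coalesce_no_shuffle(expanded, 2)

result = collect(merged)
print(result)  # [10, 11, 30, 31, 50, 51, 70, 71, 90, 91]
```

None of these helpers ever sees more than one partition's iterator at a time (or, for the coalesce model, a run of neighbouring partitions), so the only thing they can do is pass elements along in the order they arrive. An operation like sortBy or partitionBy is different because it has to move elements between partitions.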
