What are the Spark transformations that cause a shuffle?

Question

asked Jul 10, 2019 in Big Data Hadoop & Spark by Aarav (11.4k points)

I am having trouble finding out, from the Spark documentation, which operations cause a shuffle and which do not. In this list, which ones cause a shuffle and which ones do not?

map and filter do not. However, I am not sure about the others.

map(func)
filter(func)
flatMap(func)
mapPartitions(func)
mapPartitionsWithIndex(func)
sample(withReplacement, fraction, seed)
union(otherDataset)
intersection(otherDataset)
distinct([numTasks])
groupByKey([numTasks])
reduceByKey(func, [numTasks])
aggregateByKey(zeroValue)(seqOp, combOp, [numTasks])
sortByKey([ascending], [numTasks])
join(otherDataset, [numTasks])
cogroup(otherDataset, [numTasks])
cartesian(otherDataset)
pipe(command, [envVars])
coalesce(numPartitions)

1 Answer

Amit Rawat · Answer 1 · 2019-07-10T12:13:23+0000

It is actually easy to find this out without the documentation: for any of these functions, create an RDD, apply the transformation, and call toDebugString on the result. Here is one example; you can do the rest on your own.

Here, distinct creates a shuffle. It is important to check this way rather than relying on the docs, because whether a function needs a shuffle can depend on the situation. For example, join usually requires a shuffle, but if you join two RDDs that branch from the same RDD (so that both are already partitioned with the same partitioner), Spark can sometimes omit the shuffle.
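Why a join can skip the shuffle can be sketched with a toy pure-Python model. This is not Spark code; hash_partition, local_join, and copartitioned_join are hypothetical helpers. If both sides were split with the same hash function and the same number of partitions, a key can only live in partition i on both sides, so partition i of the left side only ever needs partition i of the right side and no records have to move. In Spark this situation arises, for example, when both RDDs derive from one partitioned RDD through partitioner-preserving operations such as mapValues.

```python
from collections import defaultdict

NUM_PARTITIONS = 2

def hash_partition(records, num_partitions=NUM_PARTITIONS):
    """Toy hash partitioner: record (k, v) goes to partition hash(k) % num_partitions."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in records:
        parts[hash(key) % num_partitions].append((key, value))
    return parts

def local_join(left_part, right_part):
    """Inner join of two in-memory partitions; nothing leaves this partition."""
    index = defaultdict(list)
    for key, value in right_part:
        index[key].append(value)
    return [(key, (lv, rv)) for key, lv in left_part for rv in index[key]]

def copartitioned_join(left_parts, right_parts):
    """Same partitioner on both sides: join partition i with partition i only."""
    assert len(left_parts) == len(right_parts)
    return [local_join(lp, rp) for lp, rp in zip(left_parts, right_parts)]

base = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
left = hash_partition(base)
# Values change but keys stay put (like mapValues), so the partitioning still holds.
right = [[(k, v.upper()) for k, v in part] for part in left]

joined = copartitioned_join(left, right)
print(sorted(pair for part in joined for pair in part))
# [(1, ('a', 'A')), (2, ('b', 'B')), (3, ('c', 'C')), (4, ('d', 'D'))]
```

If right had instead been built with a transformation that can change keys (like map), the model could no longer assume key k sits in the same partition index on both sides, and the data would have to be re-partitioned first, which is exactly the shuffle.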

Generally, the operations given below might cause a shuffle:

cogroup
groupWith
join: hash partition
leftOuterJoin: hash partition
rightOuterJoin: hash partition
groupByKey: hash partition
reduceByKey: hash partition
combineByKey: hash partition
sortByKey: range partition
distinct
intersection: hash partition
repartition
coalesce: only with shuffle = true (by default it merges existing partitions without a shuffle)

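The "hash partition" and "range partition" labels above describe how records are routed during the shuffle. Here is a rough pure-Python sketch of both, plus a reduceByKey-style pipeline. It is a toy model with hypothetical function names, not Spark's implementation. Spark's HashPartitioner does assign a key to the non-negative hashCode of the key modulo the number of partitions, and RangePartitioner picks its bounds by sampling the data; here the bounds are simply given. The reduce_by_key helper mirrors the map-side combine that reduceByKey performs before shuffling.

```python
from bisect import bisect_left
from collections import defaultdict

def hash_partition(records, num_partitions):
    """Toy hash partitioner: all records sharing a key land in hash(key) % num_partitions."""
    out = defaultdict(list)
    for key, value in records:
        out[hash(key) % num_partitions].append((key, value))
    return dict(out)

def range_partition(records, upper_bounds):
    """Toy range partitioner: partition i gets keys <= upper_bounds[i] (and above the
    previous bound); keys larger than the last bound go to the final partition."""
    out = defaultdict(list)
    for key, value in records:
        out[bisect_left(upper_bounds, key)].append((key, value))
    return dict(out)

def reduce_by_key(input_partitions, func, num_partitions):
    """Toy reduceByKey: combine locally, shuffle by key hash, then reduce."""
    # 1. Map side: pre-combine inside each input partition (no data moves yet).
    partials = []
    for part in input_partitions:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        partials.extend(local.items())
    # 2. Shuffle: each partial result is routed by the hash of its key.
    shuffled = hash_partition(partials, num_partitions)
    # 3. Reduce side: every output partition finishes the keys it owns.
    output = {}
    for pid, part in shuffled.items():
        reduced = {}
        for key, value in part:
            reduced[key] = func(reduced[key], value) if key in reduced else value
        output[pid] = reduced
    return output

# Two input partitions, as if created with parallelize(..., 2).
input_partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("b", 1), ("c", 1), ("a", 1)],
]
result = reduce_by_key(input_partitions, lambda x, y: x + y, 3)
counts = {k: v for part in result.values() for k, v in part.items()}
print(sorted(counts.items()))  # [('a', 3), ('b', 2), ('c', 1)]

by_range = range_partition([(5, "x"), (15, "y"), (25, "z"), (10, "w")], [10, 20])
print(by_range)  # {0: [(5, 'x'), (10, 'w')], 1: [(15, 'y')], 2: [(25, 'z')]}
```

Step 2 is the only place data crosses partitions, which is why any operation that needs all values for a key in one place (groupByKey, reduceByKey, join, distinct, and so on) needs a shuffle unless the data is already partitioned that way. Range partitioning is what sortByKey needs, because it keeps neighbouring keys together so each output partition can be sorted on its own.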