reduceByKey vs groupByKey in Apache Spark.

Question

1 Answer

vinita · Answer 1 · 2019-08-01T07:34:52+0000

There are two different ways to compute counts:

val words1 = Array("one", "two", "two", "three", "three", "three")
val wordPairsRDD = sc.parallelize(words1).map(word => (word, 1))
val wordCountsWithReduce = wordPairsRDD .reduceByKey(_ + _) .collect()
val wordCountsWithGroup = wordPairsRDD .groupByKey() .map(t => (t._1, t._2.sum)) .collect()

reduceByKey will aggregate y key before rearranging, and on the opposite hand, groupByKey can rearrange all the value key pairs because the diagrams show.

On large size data, the difference is obvious.

reduceByKey vs groupByKey in Apache Spark.

1 Answer

Related questions

Browse Categories

Browse By Domains

Popular Courses

Popular Tutorials

Popular Resources