0 votes
in Big Data Hadoop & Spark by (11.4k points)

I am writing a Spark application and want to combine a set of Key-Value pairs (K, V1), (K, V2), ..., (K, Vn) into one Key-Multivalue pair (K, [V1, V2, ..., Vn]). I feel like I should be able to do this using the reduceByKey function with something of the flavor:

My_KMV = My_KV.reduce(lambda a, b: a.append([b]))

The error that I get when this occurs is:

'NoneType' object has no attribute 'append'.

My keys are integers and values V1,...,Vn are tuples. My goal is to create a single pair with the key and a list of the values (tuples).

1 Answer

0 votes
by (32.3k points)

Here is my approach to resolve your problem:
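The error comes from list.append, which modifies the list in place and returns None, so the next reduction step receives None as its first argument. A minimal sketch of one fix, assuming My_KV is a PySpark RDD of (key, value) pairs: wrap each value in a one-element list, then concatenate the lists per key with reduceByKey. The helper names wrap_value, merge_lists, and combine_by_key are illustrative and not part of Spark.

```python
def wrap_value(pair):
    """Turn (k, v) into (k, [v]) so every value is already a list."""
    key, value = pair
    return (key, [value])


def merge_lists(a, b):
    """Concatenate two lists and return a new list (never None)."""
    return a + b


def combine_by_key(kv_rdd):
    """Assumes kv_rdd is a PySpark RDD of (key, value) pairs."""
    # map wraps each value; reduceByKey merges the per-key lists pairwise.
    return kv_rdd.map(wrap_value).reduceByKey(merge_lists)
```

With the RDD from the question this would be called as My_KMV = combine_by_key(My_KV). The order of the values inside each list should not be relied on, since it depends on how the partitions are merged.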


You can use either groupByKey or reduceByKey to solve this problem.

Here, I preferred reduceByKey because groupByKey can lead to excessive shuffling.
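For comparison, the groupByKey alternative might look like the sketch below, again assuming a PySpark RDD of (key, value) pairs. The function name group_values is hypothetical; groupByKey and mapValues are standard RDD methods.

```python
def group_values(kv_rdd):
    """Assumes kv_rdd is a PySpark RDD of (key, value) pairs."""
    # groupByKey yields (key, iterable of values);
    # mapValues(list) turns each iterable into a plain Python list.
    return kv_rdd.groupByKey().mapValues(list)
```

Both versions produce (K, [V1, V2, ..., Vn]) pairs, so the choice comes down to readability and how the job behaves on your data.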
