in Big Data Hadoop & Spark by (11.4k points)

Can someone explain to me the difference between map and flatMap and what is a good use case for each?

1 Answer

by (32.3k points)

map() and flatMap() are transformation operations in Apache Spark.

In a map transformation, a new RDD is produced by applying the given function to each element of the existing RDD. The function passed to map returns exactly one item for each input element.

A flatMap transformation is similar to map: it also produces a new RDD by applying a given function to each element of the RDD, but here the output is flattened.

The map function takes one element as input, processes it according to the custom code, and returns exactly one element. map transforms an RDD of length N into another RDD of length N: the input and output RDDs always have the same number of records.

The flatMap function also takes one element as input, processes it according to the custom code (specified by the developer), and returns zero or more elements at a time. flatMap() transforms an RDD of length N into another RDD of length M, where M can be smaller than, equal to, or larger than N.
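Since a Spark cluster is not always at hand, the difference can be modeled with plain Python lists (treating a list as a stand-in for an RDD); the `records` sample below is made up for illustration:

```python
# Plain-Python model of the two transformations (no Spark required).
# map: exactly one output element per input element.
# flatMap: each input may expand to zero or more outputs, and the
# results are flattened into a single sequence.
records = ["Good Morning", "Good Evening"]

mapped = [line.split() for line in records]                        # list of lists
flat_mapped = [word for line in records for word in line.split()]  # flat list

print(mapped)       # [['Good', 'Morning'], ['Good', 'Evening']]
print(flat_mapped)  # ['Good', 'Morning', 'Good', 'Evening']
```

Note that map preserves the two input records as two nested lists, while flatMap merges their contents into one flat sequence of four words.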

There is a file greetings.txt in HDFS with the following lines:

Good Morning
Good Evening
Good Day
Happy Birthday
Happy New Year

lines = sc.textFile("greetings.txt")

lines.map(lambda line: line.split()).collect()

Output:-

[['Good', 'Morning'], ['Good', 'Evening'], ['Good', 'Day'], ['Happy', 'Birthday'], ['Happy', 'New', 'Year']]

lines.flatMap(lambda line: line.split()).collect()

Output:-

['Good', 'Morning', 'Good', 'Evening', 'Good', 'Day', 'Happy', 'Birthday', 'Happy', 'New', 'Year']

**We can do a word count of the file using flatMap:-**

lines = sc.textFile("greetings.txt")

sorted(lines.flatMap(lambda line: line.split())
            .map(lambda w: (w, 1))
            .reduceByKey(lambda v1, v2: v1 + v2)
            .collect())

Output:-

[('Birthday', 1), ('Day', 1), ('Evening', 1), ('Good', 3), ('Happy', 2), ('Morning', 1), ('New', 1), ('Year', 1)]
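The same word count can be checked locally without a cluster; this sketch hard-codes the greetings.txt lines and uses collections.Counter to play the role of the reduceByKey step:

```python
from collections import Counter

# The same lines as greetings.txt, hard-coded for a local check.
lines = ["Good Morning", "Good Evening", "Good Day",
         "Happy Birthday", "Happy New Year"]

# flatMap-style split into words, then count occurrences per word,
# mirroring map(lambda w: (w, 1)).reduceByKey(lambda v1, v2: v1 + v2).
words = [w for line in lines for w in line.split()]
counts = sorted(Counter(words).items())

print(counts)
# [('Birthday', 1), ('Day', 1), ('Evening', 1), ('Good', 3),
#  ('Happy', 2), ('Morning', 1), ('New', 1), ('Year', 1)]
```

The sorted list of (word, count) pairs matches the Spark output above, which is a handy sanity check when experimenting.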

flatMap() flattens multiple lists into one single list. It is essentially the combination of a map and a flatten operation: it first applies the map function and then flattens the result into a "one level" structure.
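That "map then flatten" equivalence can be sketched in plain Python with itertools.chain.from_iterable; the function f here is an arbitrary example of the kind of function you would pass to flatMap:

```python
from itertools import chain

def f(line):
    # Example function returning a list per element, as passed to flatMap.
    return line.split()

data = ["Good Day", "Happy New Year"]

# flatMap(f) behaves like map(f) followed by a flatten step.
via_map_then_flatten = list(chain.from_iterable(map(f, data)))
via_flat_map = [x for item in data for x in f(item)]

assert via_map_then_flatten == via_flat_map  # same result either way
print(via_map_then_flatten)  # ['Good', 'Day', 'Happy', 'New', 'Year']
```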

