in Big Data Hadoop & Spark by (42.2k points)

Please explain to me the difference between map() and flatMap() in Spark.

Thanks

1 Answer

by (19k points)
Spark's map() function expresses a one-to-one transformation: it turns each element of a collection into exactly one element of the resulting collection. Spark's flatMap() function expresses a one-to-many transformation: it turns each element into zero or more elements. Both map() and flatMap() are transformations.

The map() transformation takes a function and applies it to each element in the RDD; the result of the function becomes the new value of that element in the resulting RDD. flatMap() is used to generate multiple output elements for each input element. As with map(), the function we pass to flatMap() is called individually for each element in our input RDD. Instead of returning a single element, however, the function returns an iterator with the return values, and the resulting RDD contains the elements from all of those iterators rather than the iterators themselves.
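To make the difference concrete, here is a small sketch in plain Python that imitates the two behaviours on an ordinary list, so it runs without a Spark installation. The helpers local_map and local_flat_map are hypothetical names made up for this illustration and are not part of Spark; the comments show what I believe are the equivalent PySpark calls, assuming an existing SparkContext named sc.

```python
from itertools import chain


def local_map(func, elements):
    """Imitate RDD.map: one output element for every input element."""
    return [func(x) for x in elements]


def local_flat_map(func, elements):
    """Imitate RDD.flatMap: func returns an iterable for every input
    element, and all of those iterables are flattened into one result."""
    return list(chain.from_iterable(func(x) for x in elements))


lines = ["hello world", "hi there spark", ""]

# Roughly: sc.parallelize(lines).map(lambda line: line.split()).collect()
# (assumes a SparkContext called sc)
mapped = local_map(lambda line: line.split(), lines)

# Roughly: sc.parallelize(lines).flatMap(lambda line: line.split()).collect()
flattened = local_flat_map(lambda line: line.split(), lines)

print(mapped)     # one list of words per input line, including an empty list
print(flattened)  # a single flat list of words; the empty line adds nothing
```

With map() the result always has as many elements as the input (here, three lists of words). With flatMap() the result can be longer or shorter than the input, because each line contributes zero or more words.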
...