Both MapPartitions() and Map() falls under the transformation types of RDD.
It returns a new RDD by applying the given function to each element of the RDD. The function in the map returns only one item.
Similar to map, but runs separately on each partition (block) of the RDD, so the function must be of type Iterator<T> ⇒ Iterator<U> when running on an RDD of type T.
Let me explain you with an example. If there are 2000 row and 20 partitions, then each partition will contain the 2000/20=100 Rows.
Now, when we apply map(func) method to rdd, the func() operation will be applied on each and every Row and in this particular case func() operation will be called 2000 times. i.e. time-consuming in some time-critical applications.
If we call mapPartition(func) method on rdd, the func() operation will be called on each partition instead of each row. In this particular case, it will be called 20 times(number of the partition). In this way, you can prevent some processing when it comes to time-consuming application.
Map() exercises function at per element level whereas MapPartitions() exercises function at the partition level.
Now talking about similarity of flatMap() as compared to Map() and MapPartitions(), flatMap() neither works on a single element as map() nor it produces multiple elements of the result as mapPartitions().
If you want to know more about Spark, then do check out this awesome video tutorial: