What is the purpose of the shuffle in Hadoop MapReduce?

In Hadoop MapReduce, the shuffle is the process that moves data from the mappers to the reducers that need it. Each map task's output is partitioned by key (by default using a hash of the key), sorted by key within each partition, and written to the mapper's local disk. Each reducer then fetches its own partition from every mapper. The shuffle is necessary because a reducer must see all the values for a given key, and those values can come from any mapper. Without it, the reducers would receive no input. Reducers can start copying the output of map tasks that have already finished while other map tasks are still running, so the shuffle overlaps with the map phase and shortens the total job time.
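The routing step described above can be sketched in Python as a toy, single-process simulation. This is not Hadoop code. `partition` and `shuffle` are hypothetical names. `zlib.crc32` stands in for the Java `hashCode()` that Hadoop's default `HashPartitioner` uses, because Python's built-in `hash()` for strings changes from run to run.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer index for a key, in the spirit of Hadoop's HashPartitioner.

    Assumption: crc32 stands in for Java's String.hashCode(); Python's
    built-in hash() is salted per process, so it is avoided here.
    """
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Route every (key, value) pair from every mapper to exactly one reducer."""
    inboxes = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:  # one list of pairs per map task
        for key, value in pairs:
            inboxes[partition(key, num_reducers)][key].append(value)
    # Each reducer sees its keys in sorted order, with values grouped per key.
    return [sorted(inbox.items()) for inbox in inboxes]


map_outputs = [
    [("hadoop", 1), ("spark", 1), ("hadoop", 1)],  # output of mapper 0
    [("spark", 1), ("hdfs", 1)],                   # output of mapper 1
]
for reducer_id, groups in enumerate(shuffle(map_outputs, 2)):
    print(reducer_id, groups)
```

Which reducer receives which key depends on the hash. What the sketch guarantees is that all values for one key, such as both `("hadoop", 1)` pairs, end up at the same reducer.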

The map phase is the first phase of a job. Each mapper processes one input split in parallel with the other mappers and emits intermediate key-value pairs. The reduce phase comes next, and on the reducer side it has three steps: shuffle, sort, and reduce. The shuffle copies the reducer's partition of the output from every mapper. The sort step merges those already-sorted pieces so that all values for the same key are grouped together; shuffle and sort run concurrently, with outputs merged as they are fetched. Finally, the reduce function is called once per key with that key's grouped values.
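A minimal word-count sketch, again plain Python rather than the Hadoop API, shows these steps in order. The hypothetical `map_task` returns its output already sorted by key. The hypothetical `reduce_task` merges the fetched segments with `heapq.merge` (the sort/merge step), groups them with `itertools.groupby`, and sums each group (the reduce step). It assumes a single reducer, so partitioning is left out.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def map_task(split):
    """Mapper: emit (word, 1) for every word, sorted by key, as Hadoop
    sorts map output before writing it to local disk."""
    return sorted((word, 1) for line in split for word in line.split())


def reduce_task(sorted_segments):
    """Reducer side for a single reducer that receives every key."""
    # Sort/merge step: combine the already-sorted segments fetched in the shuffle.
    merged = heapq.merge(*sorted_segments, key=itemgetter(0))
    # Grouping: consecutive pairs with the same key form one reduce() call.
    return {word: sum(count for _, count in pairs)
            for word, pairs in groupby(merged, key=itemgetter(0))}


splits = [["the quick fox", "the lazy dog"], ["the end"]]
segments = [map_task(s) for s in splits]  # "shuffle": collect each mapper's output
print(reduce_task(segments))
# {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Merging pre-sorted segments rather than re-sorting everything mirrors the idea that each mapper has already done part of the sorting work before the reducer fetches its data.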
