Hadoop MapReduce - Key Features & Highlights

The highlights of Hadoop MapReduce

MapReduce is a framework for processing large amounts of data on clusters of commodity hardware. It is a powerful way to process data when a very large number of nodes are connected to the cluster. As the name suggests, the MapReduce algorithm consists of two main tasks: Map and Reduce.

The goal of the Map task is to take a large set of data and convert it into another set of data in which individual elements are broken down into tuples (key/value pairs). The Reduce task then takes the output of the Map task as its input and combines those tuples into a smaller set of tuples. The Reduce task always follows the Map task.
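To make the Map and Reduce steps concrete, here is a minimal single-machine sketch in plain Python of the classic word-count example. It does not use Hadoop's API; the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, and the grouping step stands in for the work the framework does between the two tasks.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one line of text into (word, 1) key/value pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key (done by the framework between Map and Reduce)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single, smaller tuple."""
    return (key, sum(values))


def word_count(lines):
    # Run Map over every input line, group by key, then run Reduce per key.
    intermediate = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

On a real cluster the Map calls would run in parallel on many nodes, but the shape of the data is the same: many key/value tuples in, fewer tuples out.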

The biggest strength of the MapReduce framework is scalability. Once a MapReduce program is written, it can easily be scaled to run over a cluster of hundreds or even thousands of nodes. In this framework, computation is sent to where the data resides, rather than moving the data to the computation.
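The idea of sending computation to the data can be sketched as a small scheduling routine. This is a simplified illustration, not Hadoop's actual scheduler: the function name assign_map_tasks, the block ids, and the node names are all hypothetical. Each input block is given to a node that already stores a copy of it whenever that node has a free slot.

```python
def assign_map_tasks(block_locations, free_slots):
    """Assign each input block to a node, preferring a node that already stores it.

    block_locations: dict of block id -> list of nodes holding a copy of the block
    free_slots: dict of node -> number of map slots available
    Returns a dict of block id -> (node, is_local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is left untouched
    assignment = {}
    for block, holders in block_locations.items():
        # First choice: a node that holds the block and has a free slot.
        local = next((n for n in holders if slots.get(n, 0) > 0), None)
        if local is not None:
            node, is_local = local, True
        else:
            # Fallback: any node with a free slot; the data must cross the network.
            node = next((n for n, s in slots.items() if s > 0), None)
            if node is None:
                raise RuntimeError("no free map slots")
            is_local = False
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


if __name__ == "__main__":
    blocks = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n3"]}
    print(assign_map_tasks(blocks, {"n1": 1, "n2": 1, "n3": 1}))
```

In this toy run, b2 ends up on a node that does not hold it because the only node with a copy is already busy, which is the case the framework tries to keep rare.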

The common terminology used in the MapReduce framework is as follows:

PayLoad: the applications that implement the Map and Reduce functions, which form the core of the job
Mapper: maps the input key/value pairs to a set of intermediate key/value pairs
NameNode: the node that manages the Hadoop Distributed File System (HDFS)
DataNode: the node where the data resides before any processing takes place
MasterNode: the node where the JobTracker runs and which accepts job requests from clients
SlaveNode: the node where the Map and Reduce tasks are run
JobTracker: schedules jobs and tracks the tasks assigned to the TaskTrackers
TaskTracker: tracks the tasks and reports their status to the JobTracker
Task: an execution of a Mapper or a Reducer on a slice of data
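The relationship between the JobTracker, TaskTrackers, and Tasks described above can be sketched as a toy model. The classes below are hypothetical stand-ins written for illustration only; they are not Hadoop's classes, and the round-robin scheduling and the "SUCCEEDED" status string are simplifying assumptions.

```python
class TaskTracker:
    """Toy stand-in for a TaskTracker on a slave node (not Hadoop's class)."""

    def __init__(self, node):
        self.node = node

    def run(self, task_id, func, data):
        # A Task is one execution of the Mapper or Reducer on a slice of data.
        result = func(data)
        # Report the outcome back to whoever scheduled the task.
        return {"task": task_id, "node": self.node,
                "status": "SUCCEEDED", "result": result}


class JobTracker:
    """Toy stand-in for the JobTracker on the master node (not Hadoop's class)."""

    def __init__(self, trackers):
        self.trackers = trackers
        self.reports = []

    def submit(self, func, splits):
        # Schedule one task per input split, round-robin over the TaskTrackers,
        # and collect the status report each one sends back.
        for i, split in enumerate(splits):
            tracker = self.trackers[i % len(self.trackers)]
            self.reports.append(tracker.run(f"task-{i}", func, split))
        return [report["result"] for report in self.reports]


if __name__ == "__main__":
    job_tracker = JobTracker([TaskTracker("slave-1"), TaskTracker("slave-2")])
    print(job_tracker.submit(len, ["abc", "de", "f"]))
    for report in job_tracker.reports:
        print(report["task"], report["node"], report["status"])
```

Here the client hands a job to the JobTracker, which splits it into tasks, and each TaskTracker runs its task and reports the status back.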

About the Author

Abhijit

Technical Research Analyst - Big Data Engineering

Abhijit is a Technical Research Analyst specialising in Big Data and Azure Data Engineering. He has 4+ years of experience in the Big Data domain and provides consultancy services to several Fortune 500 companies. His expertise includes breaking down highly technical concepts into easy-to-understand content.
