Introduction to MapReduce

This MapReduce tutorial covers an introduction to MapReduce, its definition, why to use MapReduce, algorithms, examples, installation, the API (Application Programming Interface), implementation of MapReduce, the MapReduce Partitioner, the MapReduce Combiner, and administration.

Why Use MapReduce?

Originally developed at Google, MapReduce quickly gained immense popularity, and many big data companies have since deployed it. Some of its unique features are as follows:

Feature              Description
Flexibility          Programs can be written in many languages, such as Java, C++, and Python.
Scalability          Can process petabytes of data on a single cluster.
Recovery             Handles failures by storing replicas of the data on other machines.
Less data movement   Tasks run on the physical nodes that hold the data, which increases speed.

Apart from these features, some other key highlights of this technology are:

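  • A Map task writes its intermediate output to the local disk, while a Reduce task writes its final output to HDFS.
  • One Map task is created for each input split. By default, a split is the size of one HDFS block: 64 MB in Hadoop 1.x (128 MB from Hadoop 2.x onward).
  • The TaskTracker sends periodic heartbeat signals to the JobTracker to report its current state.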
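The one-map-task-per-split rule can be checked with a little arithmetic. The Python sketch below is a simplified model, assuming the split size equals the block size (64 MB by default here); it follows the loop used by Hadoop's FileInputFormat, which keeps cutting full-size splits only while the remaining data is more than 1.1 times the split size, so the last split may be up to 10% larger. `num_splits` is a hypothetical helper name, and real jobs can also change the split size through configuration.

```python
def num_splits(file_size_mb: int, split_size_mb: int = 64, slop: float = 1.1) -> int:
    """Count input splits (and therefore map tasks) for one non-empty file."""
    if file_size_mb <= 0:
        raise ValueError("file size must be positive")
    splits = 0
    remaining = file_size_mb
    # Cut full-size splits while the remainder exceeds slop x split size
    while remaining / split_size_mb > slop:
        splits += 1
        remaining -= split_size_mb
    # Whatever is left (at most slop x split size) becomes the last split
    if remaining > 0:
        splits += 1
    return splits

print(num_splits(1000))        # 16 map tasks with 64 MB splits
print(num_splits(1000, 128))   # 8 map tasks with 128 MB splits
print(num_splits(130, 128))    # 1: 130 MB is within 1.1 x 128 MB
```

Without the 1.1 slop factor, a 130 MB file with 128 MB splits would produce a second map task for only 2 MB of data.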

In 2008, Hadoop MapReduce won first place in the Terabyte Sort Benchmark. The winning cluster used 910 nodes, each with two cores, i.e., a total of 1,820 cores, and was able to store the entire data set in memory across the nodes. Using MapReduce, it sorted one terabyte of data in 209 seconds. Users write their programs, i.e., the map and reduce functions, in ANSI C.

Definition of MapReduce

What is MapReduce?: MapReduce is a patented software framework introduced by Google to support distributed computing on large data sets on clusters of computers. It is based on a functional programming model. In Hadoop, it runs as the processing layer and provides scalability, simplicity, speed, failure recovery, and easy solutions for data processing.
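MapReduce takes its name from the map and reduce primitives of functional programming. As a small illustration using only Python's built-in `map` and `functools.reduce` (not Hadoop), map transforms every element independently and reduce folds the results into a single value; the word list is made up for the example.

```python
from functools import reduce

words = ["deer", "bear", "river", "car", "car", "river"]

# map: apply the same function to every element independently
lengths = list(map(len, words))

# reduce: fold the mapped values into a single result
total = reduce(lambda acc, n: acc + n, lengths, 0)

print(lengths)  # [4, 4, 5, 3, 3, 5]
print(total)    # 24
```

Because each map call depends only on its own element, the calls could run on different machines; that independence is what a MapReduce framework exploits.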


Tasks in MapReduce Algorithm: In MapReduce, a large task is divided into smaller tasks, which are then allotted to many systems. The two important tasks in the MapReduce algorithm are Map and Reduce. The Map task is always performed first and is followed by the Reduce task. Map converts one data set into another, in which individual elements are broken into tuples (key/value pairs).
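A minimal in-memory sketch of this flow for a word count, assuming Python and hypothetical function names: the map phase breaks each line into (word, 1) tuples, a shuffle step groups the tuples by key, and the reduce phase sums each group. A real Hadoop job spreads these phases across many machines; here everything runs in one process.

```python
from collections import defaultdict

def map_phase(line):
    # Break one input record into (key, value) tuples
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Group all values that share a key, with keys in sorted order
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    # Combine the grouped values for one key into a single result
    return key, sum(values)

def run_job(lines):
    intermediate = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(intermediate)
    return [reduce_phase(key, values) for key, values in grouped.items()]

print(run_job(["deer bear river", "car car river", "deer car bear"]))
# [('bear', 2), ('car', 3), ('deer', 2), ('river', 2)]
```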

Examples of MapReduce

Understanding the workflow of MapReduce with an Example: On a daily basis, the micro-blogging site Twitter receives nearly 500 million tweets, i.e., roughly 5,800 tweets per second on average. Let us see how MapReduce can process this Twitter data. In this example, the Twitter data is the input, and MapReduce performs actions such as tokenize, filter, count, and aggregate counters.
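A minimal Python sketch of those four steps, run on three made-up tweets. The tweets and the stop-word list used by the filter are hypothetical, not Twitter data: the map side tokenizes, filters, and emits a (token, 1) counter per token, and the reduce side aggregates the counters.

```python
from collections import Counter

# Hypothetical stop-word list used by the Filter step
STOP_WORDS = {"a", "is", "the", "to"}

def map_tweet(tweet):
    tokens = tweet.lower().split()                      # Tokenize
    kept = [t for t in tokens if t not in STOP_WORDS]   # Filter
    return [(token, 1) for token in kept]               # Count: one (token, 1) per token

def aggregate(pairs):
    totals = Counter()
    for token, count in pairs:                          # Aggregate counters
        totals[token] += count
    return totals

tweets = ["Hadoop is fast", "The Hadoop cluster is big", "MapReduce to the rescue"]
pairs = [pair for tweet in tweets for pair in map_tweet(tweet)]
print(aggregate(pairs).most_common(1))  # [('hadoop', 2)]
```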

Installation of MapReduce

Installing and Getting Started with MapReduce: This MapReduce tutorial supports only Linux-based operating systems, and MapReduce comes by default with the Hadoop framework. So, we need to perform the following steps to install the Hadoop framework. Java must be installed on the system before Hadoop, so first we have to check whether Java is already installed on our system.
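The excerpt stops before the actual command. Assuming the usual `java -version` check, which prints the version banner to standard error rather than standard output, a small Python sketch of the same check could look like this; `installed_java_version` is a hypothetical helper name.

```python
import shutil
import subprocess
from typing import Optional

def installed_java_version() -> Optional[str]:
    """Return the banner printed by `java -version`, or None if Java is missing."""
    if shutil.which("java") is None:        # no `java` executable on PATH
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    if result.returncode != 0:              # e.g. a launcher stub with no runtime behind it
        return None
    # `java -version` writes its banner to stderr, not stdout
    return result.stderr.strip() or result.stdout.strip()

if __name__ == "__main__":
    version = installed_java_version()
    print(version if version else "Java not found: install a JDK before Hadoop")
```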

MapReduce API (Application Programming Interface)

Programming in MapReduce: MapReduce programming involves several classes and methods. We focus on the following: the JobContext interface, the Job class, the Mapper class, and the Reducer class. JobContext interface: it is the super-interface for all the classes that define different jobs in MapReduce.
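The Java classes named above need a Hadoop installation to run, so here is a hypothetical Python mirror of their roles, not Hadoop's API: a Mapper with a `map(key, value, context)` method, a Reducer with `reduce(key, values, context)`, a Context whose `write()` collects output pairs, and a Job that wires them together. As with Hadoop's default TextInputFormat, the map key here is the byte offset of each line and the value is the line itself.

```python
from collections import defaultdict

class Context:
    """Hypothetical stand-in for Hadoop's Context: collects written pairs."""
    def __init__(self):
        self.output = []

    def write(self, key, value):
        self.output.append((key, value))

class Mapper:
    def map(self, key, value, context):
        raise NotImplementedError

class Reducer:
    def reduce(self, key, values, context):
        raise NotImplementedError

class Job:
    """Hypothetical driver that runs one Mapper and one Reducer in-process."""
    def __init__(self, mapper, reducer):
        self.mapper = mapper
        self.reducer = reducer

    def run(self, records):
        map_ctx = Context()
        for key, value in records:
            self.mapper.map(key, value, map_ctx)
        groups = defaultdict(list)            # shuffle: group values by key
        for key, value in map_ctx.output:
            groups[key].append(value)
        reduce_ctx = Context()
        for key in sorted(groups):            # each key reaches the reducer in sorted order
            self.reducer.reduce(key, groups[key], reduce_ctx)
        return reduce_ctx.output

class WordCountMapper(Mapper):
    def map(self, key, value, context):       # key: byte offset, value: one line
        for word in value.split():
            context.write(word, 1)

class SumReducer(Reducer):
    def reduce(self, key, values, context):
        context.write(key, sum(values))

records = [(0, "to be or"), (9, "not to be")]  # (byte offset, line) pairs
print(Job(WordCountMapper(), SumReducer()).run(records))
# [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```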

Implementation of MapReduce

First Program in MapReduce: The following table shows data about the customers who visited a page. It includes the monthly number of visitors to the page and the annual average, over five years.

Year  JAN  FEB  MAR  APR  MAY  JUN  JULY  AUG  SEP  OCT  NOV  DEC  AVG
2008   23   23    2   43   24   25    26   26   26   25   26   26   25
2009   26  …
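The tutorial's own program is cut off here, so the following is only a minimal Python sketch of how such a job could be shaped, with hypothetical function names: the mapper emits one (year, visitors) pair per month, the shuffle groups the pairs by year, and the reducer averages each year's values. It uses only the complete 2008 row and recomputes its average: 295 / 12 ≈ 24.58, which is consistent with the AVG column's 25.

```python
from collections import defaultdict

def mapper(line):
    """Emit (year, monthly_visitors) for each of the 12 month columns."""
    year, *values = line.split()
    for visitors in values[:12]:      # JAN..DEC; the trailing AVG column is skipped
        yield year, int(visitors)

def reducer(year, monthly):
    """Average the grouped monthly values for one year."""
    return year, sum(monthly) / len(monthly)

def run(lines):
    groups = defaultdict(list)        # shuffle: group mapper output by year
    for line in lines:
        for year, visitors in mapper(line):
            groups[year].append(visitors)
    return dict(reducer(year, monthly) for year, monthly in groups.items())

# Only the complete 2008 row from the table is used here.
averages = run(["2008 23 23 2 43 24 25 26 26 26 25 26 26 25"])
print(f"{averages['2008']:.2f}")  # 24.58
```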

MapReduce Partitioner

Partitioner in MapReduce: A partitioner divides the intermediate key-value pairs produced by the mappers into partitions. The number of partitions equals the number of reducer tasks in the job.

Implementation: Let us take some employee details from the Intellipaat company as an input table named employee.

Emp_id  name   age  gender  salary
6001    aaaaa  45   Male    50,000
6002    bbbbb  40   Female  …
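Two partitioning styles can be sketched in Python. The first applies the formula used by Hadoop's default HashPartitioner, `(hash & Integer.MAX_VALUE) % numReduceTasks`, but with `zlib.crc32` standing in for Java's `hashCode()` (an assumption), so the partition numbers will not match a real Hadoop run; only the formula and the same-key-same-partition property carry over. The second is a hypothetical custom partitioner that sends employees to one of three reducers by age band. Only the fields visible in the employee table are used.

```python
import zlib

def hash_partition(key: str, num_reduce_tasks: int) -> int:
    """Default-style partitioning: (hash & Integer.MAX_VALUE) % numReduceTasks.

    zlib.crc32 stands in for Java's hashCode(); only the formula is Hadoop's.
    """
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reduce_tasks

def age_partition(age: int) -> int:
    """Hypothetical custom partitioner for three reducers, keyed on age."""
    if age <= 20:
        return 0
    if age <= 30:
        return 1
    return 2

# Rows from the employee table (the truncated salary of 6002 is omitted).
employees = [(6001, "aaaaa", 45, "Male"), (6002, "bbbbb", 40, "Female")]
for emp_id, name, age, gender in employees:
    print(emp_id, "age partition:", age_partition(age),
          "gender partition:", hash_partition(gender, 2))
```

Whatever rule is used, every pair with the same key must land in the same partition, so that one reducer sees all the values for that key.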

Combiner of MapReduce

What is MapReduce Combiner?: A combiner is an optional, localized reducer. It takes the intermediate keys from a mapper and applies a user-defined method to combine their values within the small segment of data produced by that particular mapper. Maps often produce many repeated keys, so it is often useful to perform local aggregation by specifying a combiner. The goal of the combiner is to decrease the volume of data sent from the mappers to the reducers.
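A minimal Python sketch of the effect, assuming a word count over two hypothetical input splits, one per mapper. The combiner runs the same summing logic as the reducer, but only over one mapper's output, so fewer pairs reach the shuffle while the final counts stay the same. This works because summing is associative and commutative; an operation without those properties could give a different answer when applied in stages.

```python
from collections import Counter

def mapper(split):
    return [(word, 1) for word in split.split()]

def combiner(pairs):
    """Local reduce over the output of ONE mapper."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def reducer(pairs):
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

splits = ["a b a a", "b b a"]  # hypothetical input: one split per mapper
without_combiner = [p for s in splits for p in mapper(s)]
with_combiner = [p for s in splits for p in combiner(mapper(s))]

print(len(without_combiner), len(with_combiner))            # 7 4
print(reducer(without_combiner) == reducer(with_combiner))  # True
```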

Hadoop Administration

What is Hadoop Administration?: Hadoop administration covers two areas: HDFS administration and MapReduce administration. HDFS administration includes monitoring the HDFS file structure, file locations, and updated files. MapReduce administration includes monitoring the list of applications, the configuration of nodes, and application status.

1 thought on “Mapreduce Tutorial – Learn Mapreduce from Experts”

  1. My question is about the classes used in the Map and Reduce classes, such as LongWritable and IntWritable respectively.
    Why does the map function use LongWritable instead of IntWritable, and why does the reduce function use IntWritable instead of LongWritable? Or can I choose between the two based on my preference? I understand what they do, and the Text parameter too, but my question is specifically about the map function's first parameter and the reduce function's second parameter. Is there any theory behind this that I should know?
