Combiner of MapReduce

By Abhijit | Last updated on May 26, 2025

What is a MapReduce Combiner?

A combiner is an optional, localized reducer. It takes the intermediate keys emitted by a mapper and applies a user-defined method to combine their values within that mapper's own output.

Map tasks usually produce many repeated keys, so it is often useful to aggregate them locally by specifying a combiner. The goal of the combiner is to reduce the amount of data passed on to the reducers. It has the same interface as the reducer, and the two are often the same class.

Method: conf.setCombinerClass(Reduce.class);
Workflow of the combiner:

It has no predefined interface of its own; it implements the reducer's reduce() method.
It operates on every map output key-value pair, and its output key-value types must match the types the reducer takes as input.
It produces a summary of a large data set.
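
The workflow above can be illustrated with a small plain-Java simulation. This is only a sketch under stated assumptions: it uses no Hadoop classes, the two mapper outputs are made-up data, and the names CombinerSketch, sumByKey and ones are hypothetical. It shows why a summing reducer can also serve as the combiner: adding partial sums gives the same totals as adding the raw values, while fewer pairs have to be shuffled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation; no Hadoop classes are used.
class CombinerSketch {

    // Sums the values of each key. Because addition is associative and
    // commutative, this one function can act as both combiner and reducer.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Hypothetical helper: builds the (word, 1) pairs one mapper would emit.
    static List<Map.Entry<String, Integer>> ones(String... words) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : words) {
            pairs.add(Map.entry(word, 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up outputs of two mappers.
        List<Map.Entry<String, Integer>> mapper1 = ones("Java", "Java", "What");
        List<Map.Entry<String, Integer>> mapper2 = ones("Java", "What");

        // Without a combiner, every raw pair is shuffled to the reducer.
        List<Map.Entry<String, Integer>> shuffledRaw = new ArrayList<>(mapper1);
        shuffledRaw.addAll(mapper2);

        // With a combiner, each mapper's output is summed locally first.
        List<Map.Entry<String, Integer>> shuffledCombined = new ArrayList<>();
        shuffledCombined.addAll(sumByKey(mapper1).entrySet());
        shuffledCombined.addAll(sumByKey(mapper2).entrySet());

        System.out.println("pairs shuffled without combiner: " + shuffledRaw.size());
        System.out.println("pairs shuffled with combiner:    " + shuffledCombined.size());
        System.out.println("totals without combiner: " + sumByKey(shuffledRaw));
        System.out.println("totals with combiner:    " + sumByKey(shuffledCombined));
    }
}
```

An operation such as averaging would not give the same result if it were reused this way, which is why the combiner and reducer are often, but not always, the same class.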

Implementation: use the input.txt text file below as input.
What do you mean by Object
What do you know about Java
What is Java Virtual Machine
How Java enabled High Performance

Input: the text file, read line by line

Output: one key-value pair per line

<1, What do you mean by Object>
<2, What do you know about Java>
<3, What is Java Virtual Machine>
<4, How Java enabled High Performance>

Phases in combiner

A MapReduce job that uses a combiner has three important phases:

Map phase
Combiner phase
Reducer phase

Map phase:

The record reader supplies the input to this phase, and the phase produces another set of key-value pairs as output.

The record reader is the first stage of MapReduce: it reads every line of the input text file as text.
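
The record reader's job can be sketched in plain Java. This is a simplified illustration, not Hadoop's implementation: the class name RecordReaderSketch and the method readRecords are hypothetical, and the key is the 1-based line number used in this article's listings. Hadoop's default TextInputFormat keys each line by its byte offset in the file instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java simulation; no Hadoop classes are used.
class RecordReaderSketch {

    // Pairs each line of the text with its 1-based line number.
    static List<Map.Entry<Integer, String>> readRecords(String text) {
        List<Map.Entry<Integer, String>> records = new ArrayList<>();
        int lineNumber = 1;
        for (String line : text.split("\n")) {
            records.add(Map.entry(lineNumber++, line));
        }
        return records;
    }

    public static void main(String[] args) {
        String input = String.join("\n",
                "What do you mean by Object",
                "What do you know about Java",
                "What is Java Virtual Machine",
                "How Java enabled High Performance");

        // Print each record in the <key, value> form used in this article.
        for (Map.Entry<Integer, String> record : readRecords(input)) {
            System.out.println("<" + record.getKey() + ", " + record.getValue() + ">");
        }
    }
}
```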

Input:
<1, What do you mean by Object>
<2, What do you know about Java>
<3, What is Java Virtual Machine>
<4, How Java enabled High Performance>
Mapper class and map function

The map function splits each line into words and emits a <word, 1> pair for every word, for example <What, 1>, <do, 1>, <you, 1>.
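
The article registers a class named TokenizerMapper for this phase but does not show its code. The following plain-Java sketch is an assumption about what a word-count style map function does, not the author's mapper: it splits a line on whitespace and emits one (word, 1) pair per token. The names MapSketch and map are hypothetical, and no Hadoop classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-Java simulation; no Hadoop classes are used.
class MapSketch {

    // Emits one (word, 1) pair for every whitespace-separated token in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> output = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            output.add(Map.entry(tokens.nextToken(), 1));
        }
        return output;
    }

    public static void main(String[] args) {
        // Map the first record of the sample input.
        for (Map.Entry<String, Integer> pair : map("What do you mean by Object")) {
            System.out.println("<" + pair.getKey() + ", " + pair.getValue() + ">");
        }
    }
}
```

Tokens are kept exactly as they appear in the input, so What and what would be counted as different keys.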

Combiner phase:

This phase takes the map phase output as input and produces a collection of key-value pairs as output, with the values of each key already combined.

Use the following code to declare the classes for the map, combiner and reduce phases.

job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);

Output: the expected output is one <word, partial count> pair for each distinct word in a mapper's output.
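
To make the combiner's effect on the sample input concrete, here is a plain-Java sketch. It assumes that a single mapper processes the whole of input.txt and that words are split on whitespace; the names SampleCombinerSketch, countMapOutputPairs and combine are hypothetical, and no Hadoop classes are used.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-Java simulation of the combiner phase on the sample input.
class SampleCombinerSketch {

    static final String[] LINES = {
        "What do you mean by Object",
        "What do you know about Java",
        "What is Java Virtual Machine",
        "How Java enabled High Performance"
    };

    // Number of (word, 1) pairs the map phase would emit for these lines.
    static int countMapOutputPairs(String[] lines) {
        int pairs = 0;
        for (String line : lines) {
            pairs += new StringTokenizer(line).countTokens();
        }
        return pairs;
    }

    // Combiner output: one (word, partial count) entry per distinct word.
    static Map<String, Integer> combine(String[] lines) {
        Map<String, Integer> partialCounts = new LinkedHashMap<>();
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                partialCounts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return partialCounts;
    }

    public static void main(String[] args) {
        System.out.println("map output pairs:      " + countMapOutputPairs(LINES));
        System.out.println("combiner output pairs: " + combine(LINES).size());
        combine(LINES).forEach((word, count) ->
                System.out.println("<" + word + ", " + count + ">"));
    }
}
```

Under these assumptions the 22 pairs emitted by the map phase shrink to 16 pairs, one per distinct word, before anything is sent to the reducer.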

Reduce phase:

This phase takes the combiner phase output as input.
Use the following code for the reduce phase.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Nested inside the job's driver class; sums all counts received for a key.
public static class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);  // reuse one IntWritable for every key
        context.write(key, result);
    }
}

Output:

Record writer: Output
What          3
Do            2
You           2
Mean          1
By            1
Object        1
Know          1
About         1
Java          3
Is            1
Virtual       1
Machine       1
How           1
Enabled       1
High          1
Performance   1

