MapReduce API (Application Programming Interface)

By Abhijit | Last updated on October 15, 2024 | 86957 Views

Programming in MapReduce

MapReduce programming is built around a small set of classes and methods. This section focuses on the following:

Job context interface
Job class
Mapper class
Reducer class


Job context interface

JobContext is the super-interface for all the classes that define different jobs in MapReduce. It gives tasks a read-only view of the job while they are running.

The JobContext sub-interfaces are:

MapContext: It defines the context that is given to the mapper.

MapContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT>

ReduceContext: It defines the context that is passed to the reducer.

ReduceContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT>

The main class that implements the JobContext interface is the Job class.


Job class:

The most important class in the MapReduce API is the Job class. It allows the user to configure the job, submit it, control its execution, and query its state. The set methods work only until the job is submitted; after that they throw an IllegalStateException.
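The rule that setters stop working after submission can be illustrated with a small plain-Java model. This is only a sketch under stated assumptions: SketchJob, its State enum, and ensureState are hypothetical stand-ins written for this example, not Hadoop classes, and the real Job class does considerably more.

```java
// Plain-Java model of the "setters only work before submission" rule.
// SketchJob and all of its members are hypothetical stand-ins, not Hadoop classes.
class SketchJob {
    enum State { DEFINE, RUNNING }

    private State state = State.DEFINE;
    private String jobName = "";

    // Setters are only legal while the job is still being defined.
    void setJobName(String name) {
        ensureState(State.DEFINE);
        this.jobName = name;
    }

    String getJobName() {
        return jobName;
    }

    State getState() {
        return state;
    }

    // Submitting moves the job out of the DEFINE state.
    void submit() {
        ensureState(State.DEFINE);
        state = State.RUNNING;
    }

    // Throws if the job is not in the state the caller requires.
    private void ensureState(State expected) {
        if (state != expected) {
            throw new IllegalStateException(
                "Job in state " + state + " instead of " + expected);
        }
    }
}
```

Calling setJobName before submit succeeds, while calling it afterwards throws IllegalStateException and leaves the earlier name in place.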

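Procedure to submit a job

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Create a new Job (Job.getInstance replaces the deprecated Job constructors)
Job job = Job.getInstance(new Configuration());
job.setJarByClass(MyJob.class);
// Specify various job-specific parameters
job.setJobName("Intellipaat");
// Input and output paths are set through the file input and output format classes
FileInputFormat.addInputPath(job, new Path("in"));
FileOutputFormat.setOutputPath(job, new Path("out"));
job.setMapperClass(MyJob.MyMapper.class);
job.setReducerClass(MyJob.MyReducer.class);
// Submit the job, then poll for progress until the job is complete
job.waitForCompletion(true);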

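Constructors of Job class

Job()
Job(Configuration conf)
Job(Configuration conf, String jobName)

In Hadoop 2 and later, these constructors are deprecated in favor of the static Job.getInstance() factory methods.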

Methods of Job class

getJobName() : Returns the job name specified by the user
getJobState() : Returns the current state of the job
isComplete() : Checks whether the job has finished
setInputFormatClass(Class) : Sets the input format for the job
setJobName(String name) : Sets the job name specified by the user
setOutputFormatClass(Class) : Sets the output format for the job
setMapperClass(Class) : Sets the mapper for the job
setReducerClass(Class) : Sets the reducer for the job
setPartitionerClass(Class) : Sets the partitioner for the job
setCombinerClass(Class) : Sets the combiner for the job
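Of the classes in this list, the partitioner is the least self-explanatory: it decides which reducer receives each intermediate key. Below is a minimal plain-Java sketch of hash partitioning. SketchHashPartitioner is a hypothetical stand-in, not a Hadoop class; the formula follows the approach used by Hadoop's default hash partitioner, and a custom partitioner may use any other rule.

```java
// Hypothetical stand-in for a partitioner; not a Hadoop class.
class SketchHashPartitioner {
    // Maps a key to a reducer index in the range [0, numReduceTasks).
    // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
    // hash code can never produce a negative partition number.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

Because the result depends only on the key's hash code, every pair with the same key lands on the same reducer, which is what lets a reducer see all the values for a key.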

Mapper class:

It defines the map task, which maps input key/value pairs to a set of intermediate key/value pairs. Maps are the individual tasks that transform input records into intermediate records. A given input pair may map to zero or more output pairs.

Method: The most important method of the Mapper class is map. Its syntax is:

map(KEYIN key, VALUEIN value, org.apache.hadoop.mapreduce.Mapper.Context context)
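To make the "zero or more output pairs" idea concrete, here is a small plain-Java sketch that mimics the shape of map without any Hadoop dependency. SketchTokenMapper and its nested Context are hypothetical stand-ins for this example; a real mapper would extend org.apache.hadoop.mapreduce.Mapper and write Hadoop Writable types.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins that mimic the shape of the map contract without Hadoop.
class SketchTokenMapper {
    // Stand-in for Mapper.Context: collects the pairs the mapper writes.
    static class Context {
        final List<Map.Entry<String, Integer>> output = new ArrayList<>();

        void write(String key, Integer value) {
            output.add(new SimpleEntry<>(key, value));
        }
    }

    // key: offset of the line (unused here); value: one line of text.
    // Emits one (word, 1) pair per token, so a blank line emits nothing.
    static void map(long key, String value, Context context) {
        for (String token : value.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                context.write(token, 1);
            }
        }
    }
}
```

A line with six words produces six intermediate pairs, while a blank line produces none, which matches the contract described above.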

Reducer class:

It defines the reduce task in MapReduce. It reduces a set of intermediate values that share a key to a smaller set of values. Reducer implementations can access the configuration for a job via the JobContext.getConfiguration() method.

The three phases of a reducer are:

Shuffle: The reducer copies the sorted output from every mapper using HTTP across the network.
Sort: The framework merge-sorts the reducer inputs by key, since different mappers may have output the same key. The shuffle and sort phases occur simultaneously: while outputs are being fetched, they are merged.
Reduce: In this phase, the reduce(Object, Iterable, Context) method is called for each key and its collection of values in the sorted inputs.
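The effect of the shuffle and sort phases can be sketched in memory with plain Java. SketchShuffleSort is a hypothetical stand-in for this example: the real framework fetches mapper output over the network and merge-sorts it on disk, whereas this sketch only shows the end result, which is values grouped by key with keys in sorted order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the shuffle and sort phases; not Hadoop code.
class SketchShuffleSort {
    // Merges the outputs of several mappers and groups the values by key.
    // A TreeMap keeps keys sorted, so the reduce phase would see them in key order.
    static SortedMap<String, List<Integer>> shuffleAndSort(
            List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> oneMapper : mapperOutputs) {
            for (Map.Entry<String, Integer> pair : oneMapper) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        return grouped;
    }
}
```

Each entry of the returned map corresponds to one call of reduce: the key, plus every value that any mapper emitted for it.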

Method: The most important method of the Reducer class is reduce. Its syntax is:

reduce(KEYIN key, Iterable<VALUEIN> values, org.apache.hadoop.mapreduce.Reducer.Context context)
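As an illustration of this signature, the following plain-Java sketch sums the values for one key, as a word-count reducer would. SketchSumReducer and its nested Context are hypothetical stand-ins for this example; a real reducer would extend org.apache.hadoop.mapreduce.Reducer and use Hadoop Writable types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins that mimic the shape of the reduce contract without Hadoop.
class SketchSumReducer {
    // Stand-in for Reducer.Context: records the pairs the reducer writes.
    static class Context {
        final Map<String, Integer> output = new LinkedHashMap<>();

        void write(String key, Integer value) {
            output.put(key, value);
        }
    }

    // Called once per key with all the values grouped under that key.
    // Reduces the group to a single value: the sum.
    static void reduce(String key, Iterable<Integer> values, Context context) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        context.write(key, sum);
    }
}
```

Here the "smaller set of values" is a single total per key; other reducers might emit several pairs or none at all.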
