Programming in MapReduce
MapReduce programming is built around a small set of classes and their methods. This section covers the following concepts.
- Job context interface
- Job class
- Mapper class
- Reducer class
Job context interface
JobContext is the super-interface of the classes that define a job in MapReduce. It gives tasks a read-only view of the job while they are running.
The sub-interfaces of JobContext are:
- MapContext: It defines the context that is given to the mapper.
MapContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
- ReduceContext: It defines the context that is passed to the reducer.
ReduceContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
The main implementation of the JobContext interface is the Job class.
Job class: Job is the most important class in the MapReduce API. It allows the user to configure the job, submit it, control its execution, and query its state. The set methods work only until the job is submitted; after that they throw an IllegalStateException.
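The rule that setters stop working once a job is submitted can be illustrated with a small model. The following is a minimal plain-Java sketch, not Hadoop code: the class SketchJob, its State enum, and the helper ensureState are hypothetical names invented for this illustration, and the real Job class tracks more states than the two shown here.

```java
// Plain-Java model (not Hadoop code) of a job whose setters lock after submission.
class SketchJob {
    // Hypothetical two-state lifecycle; the names are chosen for this sketch only.
    enum State { DEFINE, RUNNING }

    private State state = State.DEFINE;
    private String jobName = "";

    // Setters are legal only while the job is still being defined.
    void setJobName(String name) {
        ensureState(State.DEFINE);
        this.jobName = name;
    }

    String getJobName() {
        return jobName;
    }

    State getState() {
        return state;
    }

    // Submitting moves the job out of the DEFINE state.
    void submit() {
        ensureState(State.DEFINE);
        state = State.RUNNING;
    }

    // Throws if the job is not in the state the caller requires.
    private void ensureState(State expected) {
        if (state != expected) {
            throw new IllegalStateException(
                "Job in state " + state + " instead of " + expected);
        }
    }
}

class SketchJobDemo {
    public static void main(String[] args) {
        SketchJob job = new SketchJob();
        job.setJobName("word count");   // allowed: job not yet submitted
        job.submit();
        try {
            job.setJobName("renamed");  // rejected: job already submitted
        } catch (IllegalStateException e) {
            System.out.println("Setter rejected: " + e.getMessage());
        }
    }
}
```

In this model the second setJobName call is rejected and the job keeps its original name, which mirrors the behaviour described for the Job class.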
Procedure to submit a job
// Create a new Job
Job job = new Job(new Configuration());
// Specify various job-specific parameters, for example the job name
job.setJobName("myjob");
// Submit the job, then poll for progress until the job is complete
job.waitForCompletion(true);
Constructors of the Job class
- Job()
- Job(Configuration conf)
- Job(Configuration conf, String jobName)
Methods of the Job class
- getJobName() : Returns the job name specified by the user
- getJobState() : Returns the current state of the job
- isComplete() : Checks whether the job has finished
- setInputFormatClass(Class) : Sets the input format for the job
- setJobName(String name) : Sets the job name specified by the user
- setOutputFormatClass(Class) : Sets the output format for the job
- setMapperClass(Class) : Sets the mapper for the job
- setReducerClass(Class) : Sets the reducer for the job
- setPartitionerClass(Class) : Sets the partitioner for the job
- setCombinerClass(Class) : Sets the combiner for the job
Mapper class: It defines the map task, which maps input key/value pairs to a set of intermediate key/value pairs. Maps are the individual tasks that transform input records into intermediate records. A given input pair may map to zero or more output pairs.
Method: The most important method of the Mapper class is map. Its syntax is:
map(KEYIN key, VALUEIN value, org.apache.hadoop.mapreduce.Mapper.Context context)
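To make "zero or more output pairs per input pair" concrete, here is a minimal plain-Java sketch of a word-count style map step. It is not Hadoop code: the class MapSketch is a hypothetical name, and the sketch returns its pairs as a list, whereas a real Mapper would write them to the context it receives.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (not Hadoop code) of what a word-count map step does.
class MapSketch {
    // Hypothetical stand-in for Mapper.map: 'offset' plays the role of the
    // input key and 'line' the input value. The returned list stands in for
    // the pairs a real mapper would write to its context.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1)); // emit (word, 1)
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One input record producing several intermediate pairs.
        System.out.println(map(0L, "to be or not to be"));
        // A blank record producing no pairs at all.
        System.out.println(map(19L, "   "));
    }
}
```

The first call yields six intermediate pairs, one per word, and the second yields none, so a single input record can expand, shrink, or vanish in the intermediate output.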
Reducer class: It defines the reduce task in MapReduce. It reduces a set of intermediate values that share a key to a smaller set of values. Through the context passed to it, a reducer can access the configuration for the job.
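The three phases of a reducer are:
- Shuffle: The reducer copies the sorted output from every mapper across the network using HTTP.
- Sort: The framework merges the reducer inputs and sorts them by key, since different mappers may have emitted the same key. The shuffle and sort phases occur at the same time: the outputs are merged while they are being fetched.
- Reduce: In this phase the reduce(Object, Iterable, Context) method is called for each key and its collection of values in the sorted input.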
The most important method of the Reducer class is reduce. Its syntax is:
reduce(KEYIN key, Iterable<VALUEIN> values, org.apache.hadoop.mapreduce.Reducer.Context context)
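The shuffle, sort, and reduce phases can be imitated in memory to show how values that share a key end up in one reduce call. This is a minimal plain-Java sketch, not Hadoop code: ReduceSketch, shuffleAndSort, and run are hypothetical names, nothing is copied over a network, and the reduce stand-in returns its result instead of writing it to a context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch (not Hadoop code) of the shuffle, sort and reduce phases.
class ReduceSketch {
    // Shuffle + sort stand-in: collect the pairs from every mapper and group
    // the values by key. The TreeMap keeps the keys in sorted order.
    static Map<String, List<Integer>> shuffleAndSort(
            List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> oneMapper : mapperOutputs) {
            for (Map.Entry<String, Integer> pair : oneMapper) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        return grouped;
    }

    // Reduce stand-in: collapse all values of one key into a single sum.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Call reduce once per key, in sorted key order.
    static Map<String, Integer> run(
            List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e
                : shuffleAndSort(mapperOutputs).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Outputs of two imaginary mappers; both emitted the key "be".
        List<Map.Entry<String, Integer>> mapper1 =
            List.of(Map.entry("be", 1), Map.entry("to", 1));
        List<Map.Entry<String, Integer>> mapper2 =
            List.of(Map.entry("be", 1), Map.entry("or", 1));
        System.out.println(run(List.of(mapper1, mapper2))); // prints {be=2, or=1, to=1}
    }
}
```

Both imaginary mappers emit the key "be", so its two values meet in a single reduce call and are summed, while "or" and "to" each reach reduce with one value.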