Programming in MapReduce

MapReduce programming is built around a small set of classes and interfaces and their methods. We focus on the following concepts.

  • Job context interface
  • Job class
  • Mapper class
  • Reducer class


Job context interface

JobContext is the super-interface for the classes that define a job in MapReduce. It gives tasks a read-only view of the job while they are running.

The JobContext sub-interfaces are:

  • MapContext: It defines the context that is given to the mapper.

MapContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT>

  • ReduceContext: It defines the context that is passed to the reducer.

ReduceContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT>

The main class that implements the JobContext interface is the Job class.


Job class:

The Job class is the most important class in the MapReduce API. It allows the user to configure the job, submit it, control its execution, and query its state. The set methods work only until the job is submitted; after that they throw an IllegalStateException.

Procedure to submit a job

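import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Create a new Job
Job job = Job.getInstance(new Configuration());
job.setJarByClass(MyJob.class);
// Specify various job-specific parameters
job.setJobName("Intellipaat");
// Input and output paths are set through the file input/output format classes
FileInputFormat.addInputPath(job, new Path("in"));
FileOutputFormat.setOutputPath(job, new Path("out"));
job.setMapperClass(MyJob.MyMapper.class);
job.setReducerClass(MyJob.MyReducer.class);
// Submit the job, then poll for progress until the job is complete
job.waitForCompletion(true);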


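Constructors of the Job class

  • Job()
  • Job(Configuration conf)
  • Job(Configuration conf, String jobName)

In recent Hadoop releases these constructors are deprecated in favor of the static Job.getInstance() factory methods.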

Methods of the Job class

  • getJobName(): Returns the job name specified by the user
  • getJobState(): Returns the current state of the job
  • isComplete(): Checks whether the job has finished
  • setInputFormatClass(Class): Sets the input format for the job
  • setJobName(String name): Sets the job name specified by the user
  • setOutputFormatClass(Class): Sets the output format for the job
  • setMapperClass(Class): Sets the mapper for the job
  • setReducerClass(Class): Sets the reducer for the job
  • setPartitionerClass(Class): Sets the partitioner for the job
  • setCombinerClass(Class): Sets the combiner for the job

Mapper class:

It defines the map task, which maps input key/value pairs to a set of intermediate key/value pairs. Maps are the individual tasks that transform input records into intermediate records. A given input pair may map to zero or more output pairs.

Method: The most important method of the Mapper class is map. The syntax is

map(KEYIN key, VALUEIN value, org.apache.hadoop.mapreduce.Mapper.Context context)
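To make the "zero or more output pairs" contract concrete, here is a minimal plain-Java sketch of a word-count style map step. It is an illustration only and does not use the Hadoop API: the class name MapSketch and the choice of a byte offset as key and a line of text as value are assumptions made for the example, and the returned list stands in for the pairs a real mapper would write to its Context.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; this does not use the Hadoop API.
public class MapSketch {

    // Mimics map(KEYIN key, VALUEIN value, Context context):
    // the key is the line's offset, the value is the line text, and the
    // returned list stands in for the pairs written to the context.
    public static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                // Emit the intermediate pair (word, 1)
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One input pair produces six intermediate pairs
        System.out.println(map(0L, "to be or not to be"));
        // A blank line produces zero intermediate pairs
        System.out.println(map(19L, "   "));
    }
}
```

In an actual Hadoop job the same logic would live in a subclass of org.apache.hadoop.mapreduce.Mapper, and each pair would be passed to context.write instead of being collected in a list.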

Reducer class:

It defines the reduce task in MapReduce. A reducer reduces a set of intermediate values that share a key to a smaller set of values. Reducer implementations can access the configuration for the job via the

JobContext.getConfiguration() method

A reducer has three phases:

  • Shuffle: The reducer copies the sorted output from every mapper using HTTP across the network.
  • Sort: The framework merge-sorts the reducer inputs by key. The shuffle and sort phases occur at the same time: the outputs are merged while they are being fetched.
  • Reduce: In this phase the reduce(Object, Iterable, Context) method is called for each key and its collection of values in the sorted input.


Method: The most important method of the Reducer class is reduce. The syntax is

reduce(KEYIN key, Iterable<VALUEIN> values, org.apache.hadoop.mapreduce.Reducer.Context context)
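The shuffle, sort, and reduce phases can be pictured with a small plain-Java sketch. It is an illustration only and does not use the Hadoop API: the names ReduceSketch, pair, shuffleAndSort, and run are hypothetical, the network copy is replaced by in-memory lists, and summing is just one possible reduce function.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; this does not use the Hadoop API.
public class ReduceSketch {

    // Helper that builds one intermediate (key, value) pair
    public static Map.Entry<String, Integer> pair(String key, int value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Shuffle + sort: collect the pairs from every mapper and group the
    // values by key. TreeMap keeps the keys in sorted order.
    public static Map<String, List<Integer>> shuffleAndSort(
            List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapperOutputs) {
            for (Map.Entry<String, Integer> p : output) {
                grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>())
                       .add(p.getValue());
            }
        }
        return grouped;
    }

    // Mimics reduce(KEYIN key, Iterable<VALUEIN> values, Context context):
    // the values that share a key are reduced to a single sum.
    public static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Calls reduce once per key, in sorted key order
    public static Map<String, Integer> run(
            List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e
                : shuffleAndSort(mapperOutputs).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Outputs of two hypothetical mappers
        List<Map.Entry<String, Integer>> mapper1 =
                Arrays.asList(pair("to", 1), pair("be", 1));
        List<Map.Entry<String, Integer>> mapper2 =
                Arrays.asList(pair("to", 1), pair("or", 1));
        System.out.println(run(Arrays.asList(mapper1, mapper2)));
    }
}
```

In an actual Hadoop job the framework performs the shuffle and sort itself; only the body of reduce is written by the user, in a subclass of org.apache.hadoop.mapreduce.Reducer.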
