
MapReduce Integration

 

One of the great features of HBase is its tight integration with Hadoop’s MapReduce framework.

 

7.1 Framework    

7.1.1 MapReduce Introduction

MapReduce was designed to solve the problem of processing data in excess of terabytes in a scalable way. Such a system should increase in performance linearly as physical machines are added, and that is what MapReduce strives to do. It follows a divide-and-conquer approach: the data located on a distributed filesystem is split into chunks so that the available servers (or rather their CPUs, or more modern "cores") can each access a chunk and process it as fast as they can. The drawback of this approach is that the partial results have to be consolidated at the end; MapReduce has this consolidation step built right in.
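To make the split/map/consolidate flow concrete, here is a minimal in-memory sketch (not Hadoop itself, and all names are illustrative): each input split is "mapped" to (word, 1) pairs, and the consolidation step groups the pairs by key and sums them.

```java
import java.util.*;
import java.util.stream.*;

// In-memory sketch of the MapReduce flow: map each input split to
// (word, 1) pairs, then shuffle by key and reduce each group by summing.
public class MapReduceSketch {

    // Map phase: emit a (word, 1) pair for every word in one input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : split.split("\\s+")) {
            if (!word.isEmpty()) pairs.add(Map.entry(word, 1));
        }
        return pairs;
    }

    // Shuffle + reduce phases: group the emitted pairs by key, sum the values.
    static Map<String, Integer> run(List<String> splits) {
        return splits.stream()
                     .flatMap(s -> map(s).stream())
                     .collect(Collectors.groupingBy(
                         Map.Entry::getKey,
                         Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("hbase hadoop", "hadoop mapreduce")));
    }
}
```

In a real cluster the splits live on the distributed filesystem and each map runs on the node holding its chunk; the principle, however, is the same.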

 

Figure: The MapReduce process

 

7.1.2 Classes

The MapReduce process figure above also shows the classes that are involved in the Hadoop implementation of MapReduce.

 

  • InputFormat

First, it splits the input data, and then it returns a RecordReader instance that defines the classes of the key and value objects and provides a next() method that is used to iterate over each input record.

 

Figure: The InputFormat hierarchy

  • Mapper

In this step, each record read using the RecordReader is processed using the map() method.
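For HBase-sourced jobs, the map() method receives one table row per call. The following sketch extends the TableMapper helper class to emit a count of 1 per value of a single column; the table layout, the column family "data", and the qualifier "author" are all hypothetical.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// Hypothetical mapper: emits (author, 1) for every row that has a
// value in the made-up "data:author" column.
public class AuthorMapper extends TableMapper<Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private static final byte[] CF = Bytes.toBytes("data");
  private static final byte[] QUAL = Bytes.toBytes("author");

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    // One Result per table row; extract the column we are counting.
    byte[] author = value.getValue(CF, QUAL);
    if (author != null) {
      context.write(new Text(Bytes.toString(author)), ONE);
    }
  }
}
```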

 

Figure: The Mapper hierarchy

  • Reducer

The Reducer stage and class hierarchy are very similar to those of the Mapper stage: this time the Reducer receives the output of a Mapper class and processes it after the data has been shuffled and sorted.
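As a sketch, a reducer writing back to HBase can extend the TableReducer helper class, sum the integer counts for each key, and persist the total as a Put; the table layout and the "data:count" column here are made up, and Put.addColumn is the newer client API spelling.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// Hypothetical reducer: sums the counts for one key and stores the
// total in the made-up "data:count" column, keyed by the input key.
public class AuthorCountReducer
    extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) sum += v.get();
    // Persist the aggregate as a Put into the configured output table.
    Put put = new Put(Bytes.toBytes(key.toString()));
    put.addColumn(Bytes.toBytes("data"), Bytes.toBytes("count"), Bytes.toBytes(sum));
    context.write(new ImmutableBytesWritable(put.getRow()), put);
  }
}
```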

 

Figure: The Reducer hierarchy

 

  • OutputFormat

The final stage is the OutputFormat class, whose job is to persist the data in various locations. There are specific implementations that allow output to files, or to HBase tables in the case of the TableOutputFormat class, which uses a TableRecordWriter to write the data into the specified HBase output table.

 

Figure: The OutputFormat hierarchy

 

7.1.3 Supporting Classes

The MapReduce support comes with the TableMapReduceUtil class that helps in setting up MapReduce jobs over HBase. It has static methods that configure a job so that you can run it with HBase as the source and/or the target.
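A minimal sketch of that setup is shown below; the table name "articles" is hypothetical, and the stock IdentityTableMapper (which simply forwards each row unchanged) stands in for a real mapper.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.IdentityTableMapper;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

// Sketch: configure a job that uses an HBase table as its data source.
public class SetupSketch {
  public static Job create() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "hbase-source-job");
    job.setJarByClass(SetupSketch.class);
    Scan scan = new Scan();           // full-table scan; narrow it as needed
    TableMapReduceUtil.initTableMapperJob(
        "articles",                   // input table (hypothetical)
        scan,                         // scan defining the rows/columns to read
        IdentityTableMapper.class,    // mapper class
        ImmutableBytesWritable.class, // mapper output key class
        Result.class,                 // mapper output value class
        job);
    return job;
  }
}
```

Behind the scenes, initTableMapperJob sets TableInputFormat as the job's input format and serializes the Scan into the job configuration.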

 

7.2 MapReduce over HBase    

7.2.1 Preparation

To run a MapReduce job that needs classes from libraries not shipped with Hadoop or the MapReduce framework, you’ll need to make those libraries available before the job is executed. You have two choices: static preparation of all task nodes, or supplying everything needed with the job.

 

  • Static Provisioning

For a library that is used often, it is useful to permanently install its JAR file(s) locally on the task tracker machines, that is, the machines that run the MapReduce tasks. This is done as follows:

  1. Copy the JAR files into a common location on all nodes.
  2. Add the JAR files, with their full paths, to the HADOOP_CLASSPATH variable in the hadoop-env.sh configuration file:
# Extra Java CLASSPATH elements. Optional.

# export HADOOP_CLASSPATH="<extra_entries>:$HADOOP_CLASSPATH"
  3. Restart all task trackers for the changes to become effective.

Obviously this technique is quite static, and every update (e.g., to add new libraries) requires a restart of the task tracker daemons.

 

  • Dynamic Provisioning

If you need to provide different libraries for each job you run, or you want to update the library versions along with your job classes, then the dynamic provisioning approach is more useful.
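A common way to provision dynamically is to ship the job jar itself and let the framework distribute the HBase dependencies with the job; the sketch below assumes the driver class name, and an alternative is to pass extra jars on the command line via the standard -libjars option.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

// Sketch of dynamic provisioning: ship the job jar and its HBase
// dependencies with the job instead of pre-installing them on all nodes.
public class ProvisioningSketch {
  public static Job configure() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "dynamically-provisioned-job");
    // Ship the jar that contains the job's own classes.
    job.setJarByClass(ProvisioningSketch.class);
    // Add the HBase (and dependent) jars found on the driver's classpath
    // to the job's distributed cache so the task JVMs can load them.
    TableMapReduceUtil.addDependencyJars(job);
    return job;
  }
}
```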

 

7.2.2 Data Source and Sink

The source or target of a MapReduce job can be an HBase table, but it is also possible for a job to use HBase as both input and output. In other words, a third kind of MapReduce template uses a table for both the input and the output. This involves setting the TableInputFormat and TableOutputFormat classes into the respective fields of the job configuration.
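A complete sketch of such a table-to-table job is a row copy: the mapper re-emits each row as a Put, and passing null as the reducer class makes initTableReducerJob fall back to the stock IdentityTableReducer, which writes the mutations through. The table names "source_table" and "target_table" are hypothetical.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

// Sketch of a table-in/table-out job that copies rows between tables.
public class TableCopyJob {

  // Turns every scanned row into a Put carrying the same cells.
  static class CopyMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      Put put = new Put(row.get());
      for (Cell cell : value.rawCells()) put.add(cell);
      context.write(row, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "table-copy");
    job.setJarByClass(TableCopyJob.class);
    // TableInputFormat is wired up by initTableMapperJob ...
    TableMapReduceUtil.initTableMapperJob("source_table", new Scan(),
        CopyMapper.class, ImmutableBytesWritable.class, Put.class, job);
    // ... and TableOutputFormat by initTableReducerJob; a null reducer
    // class selects the pass-through IdentityTableReducer.
    TableMapReduceUtil.initTableReducerJob("target_table", null, job);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```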
