First Program in MapReduce
The following table shows data about visitors to the Intellipaat.com page: the monthly visitor counts and the annual average for each of five years.
|      | JAN | FEB | MAR | APR | MAY | JUN | JULY | AUG | SEP | OCT | NOV | DEC | AVG |
| 2008 | 23  | 23  | 2   | 43  | 24  | 25  | 26   | 26  | 26  | 25  | 26  | 26  | 25  |
| 2009 | 26  | 27  | 28  | 28  | 28  | 30  | 31   | 31  | 31  | 30  | 30  | 30  | 29  |
| 2010 | 31  | 32  | 32  | 32  | 33  | 34  | 35   | 36  | 36  | 34  | 34  | 34  | 34  |
| 2014 | 39  | 38  | 39  | 39  | 39  | 41  | 42   | 43  | 40  | 39  | 39  | 38  | 40  |
| 2016 | 38  | 39  | 39  | 39  | 39  | 41  | 41   | 41  | 00  | 40  | 40  | 39  | 45  |
To find the years in which the average number of visitors exceeded a threshold, we use the MapReduce framework.
Input data: the above table is saved as Intellipaat_visitors.txt, which serves as the input data.
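Each line of the input file is expected to be tab-separated: the year, the twelve monthly values, and the annual average. This format is inferred from the mapper below, which splits each line on tabs and keeps the first token (the year) and the last token (the average). The file would look like this:

```
2008	23	23	2	43	24	25	26	26	26	25	26	26	25
2009	26	27	28	28	28	30	31	31	31	30	30	30	29
2010	31	32	32	32	33	34	35	36	36	34	34	34	34
2014	39	38	39	39	39	41	42	43	40	39	39	38	40
2016	38	39	39	39	39	41	41	41	00	40	40	39	45
```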
An example program using the MapReduce framework:
<em>package hadoop;</em>
<em>import java.util.*;</em>
<em>import java.io.IOException;</em>
<em>import org.apache.hadoop.fs.Path;</em>
<em>import org.apache.hadoop.conf.*;</em>
<em>import org.apache.hadoop.io.*;</em>
<em>import org.apache.hadoop.mapred.*;</em>
<em>import org.apache.hadoop.util.*;</em>
<em>public class Intellipaat_visitors</em>
<em>{</em>
<em> //Mapper class</em>
<em> public static class E_EMapper extends MapReduceBase implements</em>
<em> Mapper<LongWritable, /*Input key Type */</em>
<em> Text, /*Input value Type*/</em>
<em> Text, /*Output key Type*/</em>
<em> IntWritable> /*Output value Type*/</em>
<em> {</em>
<em> //Map function</em>
<em> public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException</em>
<em> {</em>
<em> String line = value.toString();</em>
<em> String lasttoken = null;</em>
<em> StringTokenizer s = new StringTokenizer(line, "\t");</em>
<em> String year = s.nextToken();</em>
<em> while(s.hasMoreTokens()){</em>
<em> lasttoken=s.nextToken();</em>
<em> }</em>
<em> int avgprice = Integer.parseInt(lasttoken);</em>
<em> output.collect(new Text(year), new IntWritable(avgprice));</em>
<em> }</em>
<em> }</em>
<em>//Reducer class</em>
<em> public static class E_EReduce extends MapReduceBase implements</em>
<em> Reducer< Text, IntWritable, Text, IntWritable ></em>
<em> {</em>
<em> //Reduce function</em>
<em> public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException</em>
<em> {</em>
<em> int maxavg=30;</em>
<em> int val=Integer.MIN_VALUE;</em>
<em> while (values.hasNext())</em>
<em> {</em>
<em> if((val=values.next().get())>maxavg)</em>
<em> {</em>
<em> output.collect(key, new IntWritable(val));</em>
<em> }</em>
<em> }</em>
<em> }</em>
<em> }</em>
<em> //Main function</em>
<em> public static void main(String args[])throws Exception</em>
<em> {</em>
<em> JobConf conf = new JobConf(Intellipaat_visitors.class);</em>
<em> conf.setJobName("max_visitors");</em>
<em> conf.setOutputKeyClass(Text.class);</em>
<em> conf.setOutputValueClass(IntWritable.class);</em>
<em> conf.setMapperClass(E_EMapper.class);</em>
<em> conf.setCombinerClass(E_EReduce.class);</em>
<em> conf.setReducerClass(E_EReduce.class);</em>
<em> conf.setInputFormat(TextInputFormat.class);</em>
<em> conf.setOutputFormat(TextOutputFormat.class);</em>
<em> FileInputFormat.setInputPaths(conf, new Path(args[0]));</em>
<em> FileOutputFormat.setOutputPath(conf, new Path(args[1]));</em>
<em> JobClient.runJob(conf);</em>
<em> }</em>
<em>}</em>
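Since the reducer keeps only values above the hard-coded threshold of 30, the job's logic can be sanity-checked locally without a Hadoop cluster. The following standalone sketch (plain Java, no Hadoop dependencies; the class name VisitorsLocalCheck and its method names are illustrative, not part of the program above) reproduces the map and reduce steps on the rows of the table:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class VisitorsLocalCheck {

    // Map step: key = first token (the year), value = last token (the annual average).
    static Map<String, Integer> mapStep(String[] lines) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (String line : lines) {
            StringTokenizer s = new StringTokenizer(line, "\t");
            String year = s.nextToken();
            String lasttoken = null;
            while (s.hasMoreTokens()) {
                lasttoken = s.nextToken();
            }
            out.put(year, Integer.parseInt(lasttoken));
        }
        return out;
    }

    // Reduce step: emit only years whose average exceeds the threshold.
    static Map<String, Integer> reduceStep(Map<String, Integer> mapped, int threshold) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : mapped.entrySet()) {
            if (e.getValue() > threshold) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One tab-separated line per year: year, 12 monthly values, annual average.
        String[] lines = {
            "2008\t23\t23\t2\t43\t24\t25\t26\t26\t26\t25\t26\t26\t25",
            "2009\t26\t27\t28\t28\t28\t30\t31\t31\t31\t30\t30\t30\t29",
            "2010\t31\t32\t32\t32\t33\t34\t35\t36\t36\t34\t34\t34\t34",
            "2014\t39\t38\t39\t39\t39\t41\t42\t43\t40\t39\t39\t38\t40",
            "2016\t38\t39\t39\t39\t39\t41\t41\t41\t00\t40\t40\t39\t45"
        };
        for (Map.Entry<String, Integer> e : reduceStep(mapStep(lines), 30).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

With the table's data, this keeps the years 2010, 2014, and 2016, whose averages (34, 40, and 45) exceed 30; 2008 and 2009 are filtered out.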
Save the above program as Intellipaat_visitors.java.
Store the compiled Java classes in a new directory, created with the following command:
<em>$ mkdir visitors</em>
Download the Hadoop core jar from the following link:
<em>http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core/1.2.1</em>
Compile Intellipaat_visitors.java and create a jar for the program:
<em>$ javac -classpath hadoop-core-1.2.1.jar -d visitors Intellipaat_visitors.java</em>
<em>$ jar -cvf visitors.jar -C visitors/ .</em>
Create an input directory in HDFS using the following command:
<em>$HADOOP_HOME/bin/hadoop fs -mkdir input_dir</em>
Copy the input file Intellipaat_visitors.txt into the HDFS input directory:
<em>$HADOOP_HOME/bin/hadoop fs -put /home/hadoop/Intellipaat_visitors.txt input_dir</em>
Run the job by passing the jar, the main class, and the input and output directories:
<em>$HADOOP_HOME/bin/hadoop jar visitors.jar hadoop.Intellipaat_visitors input_dir output_dir</em>
<strong>Output</strong>
14/10/31 06:02:52 INFO mapreduce.Job: Job job_1414748220717_0002 completed successfully
INFO mapreduce.Job: Counters: 49
File System Counters
<em> </em>
<em> FILE: Number of bytes read=61</em>
<em> FILE: Number of bytes written=279400</em>
<em> FILE: Number of read operations=0</em>
<em> FILE: Number of large read operations=0</em>
<em> FILE: Number of write operations=0</em>
<em> </em>
<em> HDFS: Number of bytes read=546</em>
<em> HDFS: Number of bytes written=40</em>
<em> HDFS: Number of read operations=9</em>
<em> HDFS: Number of large read operations=0</em>
<em> HDFS: Number of write operations=2</em>
Job Counters
<em> </em>
<em> Launched map tasks=2</em>
<em> Launched reduce tasks=1</em>
<em> Data-local map tasks=2</em>
<em> </em>
<em> Total time spent by all maps in occupied slots (ms)=146137</em>
<em> Total time spent by all reduces in occupied slots (ms)=441</em>
<em> Total time spent by all map tasks (ms)=14613</em>
<em> Total time spent by all reduce tasks (ms)=44120</em>
<em> </em>
<em> Total vcore-seconds taken by all map tasks=146137</em>
<em> Total vcore-seconds taken by all reduce tasks=44120</em>
<em> </em>
<em> Total megabyte-seconds taken by all map tasks=149644288</em>
<em> Total megabyte-seconds taken by all reduce tasks=45178880</em>
<em> </em>
Map-Reduce Framework
<em> </em>
<em> Map input records=5</em>
<em> </em>
<em> Map output records=5</em>
<em> Map output bytes=45</em>
<em> Map output materialized bytes=67</em>
<em> </em>
<em> Input split bytes=208</em>
<em> Combine input records=5</em>
<em> Combine output records=5</em>
<em> </em>
<em> Reduce input groups=5</em>
<em> Reduce shuffle bytes=6</em>
<em> Reduce input records=5</em>
<em> Reduce output records=5</em>
<em> </em>
<em> Spilled Records=10</em>
<em> Shuffled Maps =2</em>
<em> Failed Shuffles=0</em>
<em> Merged Map outputs=2</em>
<em> GC time elapsed (ms)=948</em>
<em> CPU time spent (ms)=5160</em>
<em> Physical memory (bytes) snapshot=47749120</em>
<em> Virtual memory (bytes) snapshot=2899349504</em>
<em> Total committed heap usage (bytes)=277684224</em>
File Output Format Counters
<em> Bytes Written=40</em>
Verify the result in the output folder using the following command:
<em>$HADOOP_HOME/bin/hadoop fs -ls output_dir/</em>
The final output of the MapReduce job lists, one per line, each year whose average number of visitors exceeded the threshold of 30, followed by that average.