Installation of MapReduce

Installing and Getting Started with MapReduce

This tutorial assumes a Linux-based OS. MapReduce ships as part of the Hadoop framework, so to use MapReduce we need to install Hadoop by following the steps below.
Hadoop runs on Java, so Java must be installed before Hadoop. Use the command below to check whether Java is already installed on your system.

$ java -version
If Java is already installed, you will see output similar to the following:
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)
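To do the same check from a script, the sketch below extracts the quoted version string. It is POSIX shell, and parse_java_version and java_installed_version are hypothetical helper names, not part of any tool. Note that java -version prints to standard error, which is why its output is redirected.

```shell
#!/bin/sh
# Hypothetical helper: print the quoted version from `java -version` output,
# e.g. 'java version "1.7.0_71"' -> 1.7.0_71. Prints nothing if no match.
parse_java_version() {
  printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: return 1 if java is not on the PATH, otherwise print
# the parsed version. `java -version` writes to stderr, hence 2>&1.
java_installed_version() {
  command -v java >/dev/null 2>&1 || return 1
  parse_java_version "$(java -version 2>&1)"
}

if v=$(java_installed_version) && [ -n "$v" ]; then
  echo "Java found: $v"
else
  echo "Java not found"
fi
```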

If Java is not installed, follow the steps below to install it.
Step 1: Download the Java 7 JDK for 64-bit Linux (this example uses jdk-7u79-linux-x64.gz) from the link below.
<http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html>

Step 2: Extract the downloaded archive with the following commands:
$ cd Downloads/
$ ls
jdk-7u79-linux-x64.gz
$ tar zxf jdk-7u79-linux-x64.gz
$ ls
jdk1.7.0_79  jdk-7u79-linux-x64.gz
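The name of the extracted directory depends on the JDK update you downloaded. As a minimal sketch, assuming a gzip-compressed tar archive whose first entry is the top-level directory, this hypothetical archive_top_dir helper reads the name from the archive listing instead of guessing it:

```shell
#!/bin/sh
# Hypothetical helper: print the top-level directory stored in a .tar.gz
# archive, i.e. what `tar zxf` will create.
archive_top_dir() {
  # List the archive, keep only the first entry and strip everything from
  # the first "/" onwards. sed reads the whole listing, so tar is not cut off.
  tar tzf "$1" | sed -n '1s|/.*||p'
}

# Example: archive_top_dir jdk-7u79-linux-x64.gz
```

The printed name is the directory to move in the next step.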

Step 3: Java should be available to all users, so move the extracted directory to /usr/local/. Switch to the root user and run the following commands:
$ su
Password:
# mv jdk1.7.0_79 /usr/local/java
# exit

Step 4: To set the JAVA_HOME and PATH variables, add the following lines to your ~/.bashrc file, then reload it with source ~/.bashrc:
export JAVA_HOME=/usr/local/java
export PATH=$PATH:$JAVA_HOME/bin
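Running the setup twice would add duplicate lines to ~/.bashrc. A small sketch that avoids this, using a hypothetical append_once helper that adds a line only when that exact line is not already in the file:

```shell
#!/bin/sh
# Hypothetical helper: append line $2 to file $1 unless the exact line is
# already present (-x whole line, -F fixed string, -- in case it starts with -).
append_once() {
  touch "$1"
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Example (single quotes keep $PATH and $JAVA_HOME literal, so they are
# expanded when ~/.bashrc is sourced, not now):
# append_once "$HOME/.bashrc" 'export JAVA_HOME=/usr/local/java'
# append_once "$HOME/.bashrc" 'export PATH=$PATH:$JAVA_HOME/bin'
# . "$HOME/.bashrc"
```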

Step 5: As root, configure Java alternatives so that the java, javac and jar commands point to the new installation. The alternatives command is used on Red Hat-based systems; on Debian and Ubuntu the equivalent is update-alternatives.
# alternatives --install /usr/bin/java java /usr/local/java/bin/java 2
# alternatives --install /usr/bin/javac javac /usr/local/java/bin/javac 2
# alternatives --install /usr/bin/jar jar /usr/local/java/bin/jar 2
# alternatives --set java /usr/local/java/bin/java
# alternatives --set javac /usr/local/java/bin/javac
# alternatives --set jar /usr/local/java/bin/jar
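To confirm the alternatives took effect, you can follow the symlink chain from the command on your PATH. This sketch assumes GNU readlink -f is available; resolves_under is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed if command $2, as found on the PATH, resolves
# through all symlinks (e.g. /usr/bin/java -> /etc/alternatives/java -> ...)
# to a file under directory $1.
resolves_under() {
  dir=$(readlink -f "$1") || return 1
  cmd=$(command -v "$2") || return 1
  target=$(readlink -f "$cmd") || return 1
  case "$target" in
    "$dir"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
# for c in java javac jar; do
#   resolves_under /usr/local/java "$c" || echo "$c does not point to /usr/local/java"
# done
```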

Verifying the Hadoop installation

MapReduce runs on top of Hadoop, so Hadoop must be installed before you can use MapReduce.
To check whether Hadoop is already installed, run:

$ hadoop version

If Hadoop is installed, you will see output similar to the following:
Hadoop 2.4.1
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2014-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d162b240f09af91e1af4
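A sketch for checking the Hadoop version from a script, assuming the release number appears on a line of the form "Hadoop <version>" as in the output above. parse_hadoop_version and hadoop_installed_version are hypothetical names:

```shell
#!/bin/sh
# Hypothetical helper: print the release number from `hadoop version` output
# ("Hadoop 2.4.1" -> 2.4.1), taking the first matching line only.
parse_hadoop_version() {
  printf '%s\n' "$1" | sed -n 's/^Hadoop \([0-9][^ ]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: return 1 if hadoop is not on the PATH.
hadoop_installed_version() {
  command -v hadoop >/dev/null 2>&1 || return 1
  parse_hadoop_version "$(hadoop version 2>/dev/null)"
}

# Example: hadoop_installed_version || echo "Hadoop not found"
```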

If Hadoop is not installed on your system, use the steps below to download and install it.

Downloading Hadoop

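Download Hadoop 2.4.1 from the Apache Software Foundation and extract it using the commands below. If the mirror in the wget command is no longer available, older Hadoop releases are kept in the Apache release archive at archive.apache.org.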

$ su
Password:
# cd /usr/local
# wget http://apache.claz.org/hadoop/common/hadoop-2.4.1/hadoop-2.4.1.tar.gz
# tar xzf hadoop-2.4.1.tar.gz
# mv hadoop-2.4.1 hadoop
# exit
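Before configuring anything, it can help to confirm that the extracted tree has the paths the remaining steps rely on. A sketch with a hypothetical check_hadoop_layout helper; the list of paths is our assumption about a standard Hadoop 2.x release layout:

```shell
#!/bin/sh
# Hypothetical helper: check that directory $1 looks like an extracted Hadoop
# 2.x release. Prints each missing path and returns 1 if any is missing.
check_hadoop_layout() {
  missing=0
  for p in bin/hadoop bin/hdfs sbin/start-dfs.sh sbin/start-yarn.sh etc/hadoop; do
    if [ ! -e "$1/$p" ]; then
      echo "missing: $1/$p"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_hadoop_layout /usr/local/hadoop
```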

To set up Hadoop in pseudo-distributed mode, where all Hadoop daemons run on a single machine, follow the steps below.
Step 1: Set the Hadoop environment variables by adding the following lines to your ~/.bashrc file, then reload it with source ~/.bashrc:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
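After reloading ~/.bashrc, a quick sketch to confirm that each variable is actually exported in the current shell; missing_vars is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed variable that is not
# exported or is empty, and return 1 if there is at least one.
missing_vars() {
  status=0
  for v in "$@"; do
    if [ -z "$(printenv "$v")" ]; then
      echo "$v"
      status=1
    fi
  done
  return "$status"
}

# Example:
# missing_vars HADOOP_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME \
#   HADOOP_HDFS_HOME YARN_HOME HADOOP_COMMON_LIB_NATIVE_DIR
```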

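Step 2: The Hadoop configuration files are in $HADOOP_HOME/etc/hadoop. Change to that directory:
$ cd $HADOOP_HOME/etc/hadoop
In hadoop-env.sh, replace the JAVA_HOME value with the location of Java on your system:
export JAVA_HOME=/usr/local/java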
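Editing hadoop-env.sh by hand works fine. As a scripted alternative, this sketch (hypothetical set_hadoop_java_home helper) assumes the file contains a line starting with "export JAVA_HOME=" and replaces it, appending one if no such line exists:

```shell
#!/bin/sh
# Hypothetical helper: set JAVA_HOME in hadoop-env.sh ($1) to $2, keeping a
# backup copy of the original as $1.bak.
set_hadoop_java_home() {
  env_file=$1
  java_dir=$2
  if grep -q '^export JAVA_HOME=' "$env_file"; then
    # -i.bak (suffix attached) works with both GNU and BSD sed.
    sed -i.bak "s|^export JAVA_HOME=.*|export JAVA_HOME=$java_dir|" "$env_file"
  else
    cp "$env_file" "$env_file.bak"
    printf 'export JAVA_HOME=%s\n' "$java_dir" >> "$env_file"
  fi
}

# Example: set_hadoop_java_home "$HADOOP_HOME/etc/hadoop/hadoop-env.sh" /usr/local/java
```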

To configure Hadoop, edit the following files in $HADOOP_HOME/etc/hadoop:

  • core-site.xml
  • hdfs-site.xml
  • yarn-site.xml
  • mapred-site.xml

core-site.xml contains settings such as:

  • The port number used by the Hadoop instance
  • The memory allocated for the file system
  • The memory limit for storing data
  • The size of the read/write buffers
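In the file itself, each setting is a property element. A minimal sketch of a pseudo-distributed core-site.xml covering the port and the read/write buffer size only. The values hdfs://localhost:9000 and 131072 are example assumptions, fs.defaultFS and io.file.buffer.size are the Hadoop 2.x property names, and the hypothetical write_core_site helper overwrites any existing file:

```shell
#!/bin/sh
# Hypothetical helper: write core-site.xml into configuration directory $1.
# Assumed example values: NameNode at localhost:9000, 128 KB I/O buffer.
write_core_site() {
  conf_dir=$1
  cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
   <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
   </property>
   <property>
      <name>io.file.buffer.size</name>
      <value>131072</value>
   </property>
</configuration>
EOF
}

# Example: write_core_site "$HADOOP_HOME/etc/hadoop"
```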

hdfs-site.xml contains settings such as:

  • The replication factor for stored data
  • The path of the NameNode directory
  • The local file system path of the DataNode directory
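A corresponding sketch for hdfs-site.xml. It assumes a replication factor of 1, since a single machine runs only one DataNode, and places the NameNode and DataNode directories under an absolute base path you choose; write_hdfs_site is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: write hdfs-site.xml into configuration directory $1,
# with NameNode and DataNode storage under absolute base path $2.
write_hdfs_site() {
  conf_dir=$1
  data_base=$2
  # Unquoted EOF: $data_base is expanded into the file.
  cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.namenode.name.dir</name>
      <value>file://$data_base/namenode</value>
   </property>
   <property>
      <name>dfs.datanode.data.dir</name>
      <value>file://$data_base/datanode</value>
   </property>
</configuration>
EOF
}

# Example (the base path is only an illustration):
# write_hdfs_site "$HADOOP_HOME/etc/hadoop" "$HOME/hadoopinfra/hdfs"
```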

yarn-site.xml configures YARN. Add the following property to it:

<configuration>
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
</configuration>

mapred-site.xml specifies which framework runs MapReduce jobs; in this setup, that framework is YARN.
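For this setup, mapred-site.xml typically needs one property, mapreduce.framework.name, set to yarn. Hadoop 2.x releases usually ship only mapred-site.xml.template, so this sketch (hypothetical write_mapred_site helper) writes the file directly instead of copying the template:

```shell
#!/bin/sh
# Hypothetical helper: write mapred-site.xml into configuration directory $1
# so that MapReduce jobs run on YARN.
write_mapred_site() {
  conf_dir=$1
  cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>
EOF
}

# Example: write_mapred_site "$HADOOP_HOME/etc/hadoop"
```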

Hadoop setup verification

Step 1: Set up the NameNode
Format the NameNode with the following commands:

$ cd ~
$ hdfs namenode -format
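Formatting wipes any existing HDFS metadata, so it should run only once. A sketch of a guard, assuming you pass the local NameNode directory (the dfs.namenode.name.dir path without the file:// prefix); a formatted NameNode directory contains current/VERSION. namenode_formatted is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical guard: succeed if NameNode directory $1 has already been
# formatted, i.e. it contains current/VERSION.
namenode_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Example (the path is only an illustration):
# if namenode_formatted "$HOME/hadoopinfra/hdfs/namenode"; then
#   echo "Already formatted; skipping."
# else
#   hdfs namenode -format
# fi
```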

Step 2: Start HDFS
$ start-dfs.sh

Step 3: Start YARN
$ start-yarn.sh
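To check that the daemons started, the JDK's jps tool lists running Java processes as "<pid> <ClassName>". This sketch (hypothetical missing_daemons helper) assumes the five daemons a pseudo-distributed Hadoop 2.x node normally runs and prints any that are absent from the given jps output:

```shell
#!/bin/sh
# Hypothetical helper: print each expected daemon that does not appear in the
# jps output passed as $1. The pattern is anchored so that "NameNode" does
# not match the "SecondaryNameNode" line.
missing_daemons() {
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    printf '%s\n' "$1" | grep -q "^[0-9][0-9]* $d\$" || echo "$d"
  done
}

# Example: missing_daemons "$(jps)"
```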

Step 4: Open the NameNode web UI in a browser. In Hadoop 2.x it listens on port 50070 by default:
http://localhost:50070/

Step 5: Open the YARN ResourceManager web UI, which lists all applications on the cluster, on its default port 8088:
http://localhost:8088/
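The same check can be done without a browser. A sketch assuming curl is installed and the default ports above; web_ui_up is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a web server answers at URL $1 within
# 5 seconds. -f makes HTTP 4xx/5xx responses count as failure.
web_ui_up() {
  curl -sf -o /dev/null --max-time 5 "$1"
}

for url in http://localhost:50070/ http://localhost:8088/; do
  if web_ui_up "$url"; then echo "up:   $url"; else echo "down: $url"; fi
done
```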

About the Author

Technical Research Analyst - Big Data Engineering

Abhijit is a Technical Research Analyst specialising in Big Data and Azure Data Engineering. He has 4+ years of experience in the Big data domain and provides consultancy services to several Fortune 500 companies. His expertise includes breaking down highly technical concepts into easy-to-understand content.