Installing and Getting Started with MapReduce
MapReduce runs only on Linux-based operating systems, and it ships as part of the Hadoop framework. So, before working with MapReduce, we need to perform the following steps to install Hadoop.
Java must be installed on the system before Hadoop. Use the command below to check whether Java is already installed:
$ java -version
If Java is already installed, we will see output similar to the following:
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)
If not, follow the steps below to install Java.
Step 1: Download the Java JDK (this tutorial uses JDK 7) from the link below.
<http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html>
Step 2: Extract the downloaded archive using the commands below:
$ cd Downloads/
$ ls
jdk-7u71-linux-x64.gz
$ tar zxf jdk-7u71-linux-x64.gz
$ ls
jdk1.7.0_71 jdk-7u71-linux-x64.gz
Step 3: Move the extracted JDK to "/usr/local/java" so that Java is available to all users. Switch to the root user and run the commands below.
$ su
Password:
# mv jdk1.7.0_71 /usr/local/java
# exit
Step 4: To set up the PATH and JAVA_HOME variables, add the following lines to the ~/.bashrc file.
export JAVA_HOME=/usr/local/java
export PATH=$PATH:$JAVA_HOME/bin
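For the new variables to take effect in the current shell, reload the file:
$ source ~/.bashrc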
Step 5: Configure the Java alternatives so that the new JDK binaries are used system-wide:
# alternatives --install /usr/bin/java java /usr/local/java/bin/java 2
# alternatives --install /usr/bin/javac javac /usr/local/java/bin/javac 2
# alternatives --install /usr/bin/jar jar /usr/local/java/bin/jar 2
# alternatives --set java /usr/local/java/bin/java
# alternatives --set javac /usr/local/java/bin/javac
# alternatives --set jar /usr/local/java/bin/jar
Verifying the Hadoop installation
MapReduce ships as part of Hadoop, so Hadoop must be installed on the system before we can use MapReduce.
To verify the Hadoop installation, run:
$ hadoop version
If Hadoop is installed on your system, you will see output similar to the following:
Hadoop 2.4.1
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2014-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d162b240f09af91e1af4
If Hadoop is not installed on your system, follow the steps below to download and install it.
Downloading Hadoop
Download Hadoop 2.4.1 from the Apache Software Foundation and extract the files using the commands below.
$ su
Password:
# cd /usr/local
# wget http://apache.claz.org/hadoop/common/hadoop-2.4.1/hadoop-2.4.1.tar.gz
# tar xzf hadoop-2.4.1.tar.gz
# mkdir hadoop
# mv hadoop-2.4.1/* hadoop/
# exit
To install Hadoop in pseudo-distributed mode, follow the steps below.
Step 1: To set the Hadoop environment variables, add the following lines to the ~/.bashrc file.
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
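As before, reload the file so the variables take effect in the current session:
$ source ~/.bashrc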
Step 2: The Hadoop configuration files live in the directory $HADOOP_HOME/etc/hadoop, so move there first:
$ cd $HADOOP_HOME/etc/hadoop
Then, in the hadoop-env.sh file, set the Java environment variable:
export JAVA_HOME=/usr/local/java
To configure Hadoop, we have to edit the following files (the names are case-sensitive):
- core-site.xml
- hdfs-site.xml
- yarn-site.xml
- mapred-site.xml
core-site.xml: It contains the following information (a minimal example follows the list):
- The port number used for the Hadoop instance
- The memory allocated for the file system
- The memory limit for storing the data
- The read/write buffer size
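As an illustration, a minimal core-site.xml for a single-node setup might look like the sketch below; hdfs://localhost:9000 is a commonly used default address, so adjust the host and port for your environment.
<configuration>
   <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
   </property>
</configuration>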
hdfs-site.xml: It contains the following information (see the sketch after this list):
- The value of the replication factor
- The namenode path
- The datanode path on the local file system (FS)
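For example, a minimal hdfs-site.xml might look like the following sketch; the /home/hadoop/hadoopinfra/... directories are assumed example paths, so replace them with locations that exist on your machine.
<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.name.dir</name>
      <value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
   </property>
   <property>
      <name>dfs.data.dir</name>
      <value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
   </property>
</configuration>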
yarn-site.xml: It is used to configure YARN in Hadoop by adding the properties below.
<configuration>
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
</configuration>
mapred-site.xml: It is used to specify which MapReduce framework is in use.
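Hadoop ships only a template for this file, so first copy it, then add the property shown in the sketch below to run MapReduce on YARN:
$ cp mapred-site.xml.template mapred-site.xml
<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>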
Hadoop setup verification
Step 1: Name node setup
Format the namenode using the commands below:
$ cd ~
$ hdfs namenode -format
Step 2: Starting the Hadoop DFS daemons
$ start-dfs.sh
Step 3: Starting the YARN daemons
$ start-yarn.sh
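Optionally, you can confirm that everything started by running the jps command, which lists the running Java processes; after both scripts finish, you should see daemons such as NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager.
$ jps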
Step 4: Hadoop can be accessed in the browser on its default port number, 50070:
http://localhost:50070/
Step 5: All the applications of the cluster can be viewed on the default port number 8088:
http://localhost:8088/
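With the daemons running, you can try a first MapReduce job. The sketch below runs the wordcount example that ships with Hadoop 2.4.1; file.txt and the input/output HDFS paths are arbitrary example names, so substitute your own.
$ hdfs dfs -mkdir -p /user/hadoop/input
$ hdfs dfs -put file.txt /user/hadoop/input
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /user/hadoop/input /user/hadoop/output
$ hdfs dfs -cat /user/hadoop/output/part-r-00000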