Installing and Getting Started with MapReduce
MapReduce runs only on Linux-based operating systems, and it ships as part of the Hadoop framework. So, before working with MapReduce, we need to perform the following steps to install Hadoop.
Java must be installed on the system before Hadoop. Use the command below to check whether Java is already installed:
$ java -version
If Java is already installed, we will see output similar to the following:
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)
If not, follow the steps below to install Java.
Step 1: Download the Java JDK (this tutorial uses JDK 7) from the link below.
<http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html>
Step 2: Extract the downloaded archive using the commands below:
$ cd Downloads/
$ ls
jdk-7u71-linux-x64.gz
$ tar zxf jdk-7u71-linux-x64.gz
$ ls
jdk1.7.0_71 jdk-7u71-linux-x64.gz
Step 3: Move the extracted JDK to "/usr/local/java" so that Java is available to all users. Switch to the root user and run the commands below.
$ su
Password:
# mv jdk1.7.0_71 /usr/local/java
# exit
Step 4: To set up the PATH and JAVA_HOME variables, add the following lines to the ~/.bashrc file.
export JAVA_HOME=/usr/local/java
export PATH=$PATH:$JAVA_HOME/bin
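For the new variables to take effect in the current shell, reload the file:
$ source ~/.bashrc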
Step 5: Configure the Java alternatives so that the new JDK binaries are used system-wide:
# alternatives --install /usr/bin/java java /usr/local/java/bin/java 2
# alternatives --install /usr/bin/javac javac /usr/local/java/bin/javac 2
# alternatives --install /usr/bin/jar jar /usr/local/java/bin/jar 2
# alternatives --set java /usr/local/java/bin/java
# alternatives --set javac /usr/local/java/bin/javac
# alternatives --set jar /usr/local/java/bin/jar
Verifying the Hadoop installation
MapReduce ships as part of Hadoop, so Hadoop must be installed on the system before we can use MapReduce.
To verify the Hadoop installation, run:
$ hadoop version
If Hadoop is installed on your system, you will see output similar to the following:
Hadoop 2.4.1
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2014-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d162b240f09af91e1af4
If Hadoop is not installed on your system, follow the steps below to download and install it.
Downloading Hadoop
Download Hadoop 2.4.1 from the Apache Software Foundation and extract the files using the commands below.
$ su
Password:
# cd /usr/local
# wget http://apache.claz.org/hadoop/common/hadoop-2.4.1/hadoop-2.4.1.tar.gz
# tar xzf hadoop-2.4.1.tar.gz
# mkdir hadoop
# mv hadoop-2.4.1/* hadoop/
# exit
To install Hadoop in pseudo-distributed mode, follow the steps below.
Step 1: To set the Hadoop environment variables, add the following lines to the ~/.bashrc file.
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
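As before, reload the file so the variables take effect in the current session:
$ source ~/.bashrc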
Step 2: The Hadoop configuration files live in the directory $HADOOP_HOME/etc/hadoop, so move there first:
$ cd $HADOOP_HOME/etc/hadoop
Then, in the hadoop-env.sh file, set the Java environment variable:
export JAVA_HOME=/usr/local/java
To configure Hadoop, we have to edit the following files (the names are case-sensitive):
- core-site.xml
- hdfs-site.xml
- yarn-site.xml
- mapred-site.xml
core-site.xml: It contains the following information (a minimal example follows the list):
- The port number used for the Hadoop instance
- The memory allocated for the file system
- The memory limit for storing the data
- The read/write buffer size
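As an illustration, a minimal core-site.xml for a single-node setup might look like the sketch below; hdfs://localhost:9000 is a commonly used default address, so adjust the host and port for your environment.
<configuration>
   <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
   </property>
</configuration>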
hdfs-site.xml: It contains the following information (see the sketch after this list):
- The value of the replication factor
- The namenode path
- The datanode path on the local file system (FS)
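For example, a minimal hdfs-site.xml might look like the following sketch; the /home/hadoop/hadoopinfra/... directories are assumed example paths, so replace them with locations that exist on your machine.
<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.name.dir</name>
      <value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
   </property>
   <property>
      <name>dfs.data.dir</name>
      <value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
   </property>
</configuration>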
yarn-site.xml: It is used to configure YARN in Hadoop by adding the properties below.
<configuration>
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
</configuration>
mapred-site.xml: It is used to specify which MapReduce framework is in use.
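Hadoop ships only a template for this file, so first copy it, then add the property shown in the sketch below to run MapReduce on YARN:
$ cp mapred-site.xml.template mapred-site.xml
<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>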
Hadoop setup verification
Step 1: Name node setup
Format the namenode using the commands below:
$ cd ~
$ hdfs namenode -format
Step 2: Starting the Hadoop DFS daemons
$ start-dfs.sh
Step 3: Starting the YARN daemons
$ start-yarn.sh
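Optionally, you can confirm that everything started by running the jps command, which lists the running Java processes; after both scripts finish, you should see daemons such as NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager.
$ jps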
Step 4: Hadoop can be accessed in the browser on its default port number, 50070:
http://localhost:50070/
Step 5: All the applications of the cluster can be viewed on the default port number 8088:
http://localhost:8088/
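With the daemons running, you can try a first MapReduce job. The sketch below runs the wordcount example that ships with Hadoop 2.4.1; file.txt and the input/output HDFS paths are arbitrary example names, so substitute your own.
$ hdfs dfs -mkdir -p /user/hadoop/input
$ hdfs dfs -put file.txt /user/hadoop/input
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /user/hadoop/input /user/hadoop/output
$ hdfs dfs -cat /user/hadoop/output/part-r-00000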