
Hadoop Installation

 

Hadoop is supported on Linux and its various flavors, so you need a Linux OS to set up a Hadoop environment. If you are running another operating system, you can install a virtual machine and run Linux inside it.

 

Prerequisites

Hadoop is written in Java, so Java must be installed on the machine, and the version should be 1.6 or later.
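If Java is not already present, you can usually install it from your distribution's package manager. For example, on Ubuntu you could install OpenJDK 7 (assuming the openjdk-7-jdk package is available in your repositories):

% sudo apt-get install openjdk-7-jdk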

 

Installation

It is easy to run Hadoop on a single machine using your own user account. Download a stable release from http://www.eu.apache.org/dist/hadoop/common/, which is packaged as a gzipped tar file, and then unpack it somewhere on your filesystem:

% tar xzf hadoop-x.y.z.tar.gz
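If you prefer to fetch the tarball from the command line rather than a browser, a sketch using wget (the exact directory and file name depend on the release you choose; x.y.z is kept as a placeholder):

% wget http://www.eu.apache.org/dist/hadoop/common/hadoop-x.y.z/hadoop-x.y.z.tar.gz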

Before running Hadoop, you need to know the location where Java is installed. You can check by running:

% java -version

If Java has been installed, this command displays the version details.

 

You can set the Java installation that Hadoop uses by editing conf/hadoop-env.sh and specifying the JAVA_HOME variable. For example, on a Mac you would change the line to read:

export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.7/Home/

On Ubuntu use:

export JAVA_HOME=/usr/lib/jvm/java-7-sun
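If you are not sure where Java lives on your system, these commands can help you locate it (a quick sketch; the exact paths vary by machine):

% /usr/libexec/java_home        # Mac: prints the active JDK home

% readlink -f $(which java)     # Linux: resolves the java symlink to its real path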

It is convenient to create an environment variable that points to the Hadoop installation directory, say HADOOP_INSTALL, and to put the Hadoop binary directory on your command-line path. In Hadoop 2.0, the sbin directory needs to be on the path as well. For example:

% export HADOOP_INSTALL=/home/tom/hadoop-x.y.z

% export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
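These settings last only for the current shell session. To make them persistent, you could append them to your shell startup file (a sketch assuming bash and a ~/.bashrc; adjust for your shell):

% echo 'export HADOOP_INSTALL=/home/tom/hadoop-x.y.z' >> ~/.bashrc

% echo 'export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin' >> ~/.bashrc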

 Check that Hadoop runs by typing:

% hadoop version

Hadoop 1.0.0

Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0

-r 1214675

Compiled by hortonfo on Thu Dec 15 16:36:35 UTC 2011

 

Configuration

Each component in Hadoop is configured using an XML file. Common properties are found in core-site.xml, HDFS properties in hdfs-site.xml, and MapReduce properties in mapred-site.xml. These files are placed in the conf subdirectory.

In Hadoop 2.0 and later, MapReduce runs on YARN, and there is an additional configuration file called yarn-site.xml. All the configuration files go in the etc/hadoop subdirectory. Hadoop can be run in one of three modes:

  • Standalone (or local) mode – No daemons run; everything runs in a single JVM. This mode is suitable for running MapReduce programs during development, since they are easy to test and debug.
  • Pseudodistributed mode – The Hadoop daemons run on the local machine, simulating a cluster on a small scale.
  • Fully distributed mode – The Hadoop daemons run on a cluster of machines.

 

To run Hadoop in a particular mode you need to do two things:

  • Set the appropriate properties
  • Start the Hadoop daemons

The diagram below shows the minimal set of properties needed to configure each mode. In standalone mode, the local filesystem and the local MapReduce job runner are used, while in the distributed modes the HDFS and MapReduce (or YARN) daemons are started.

[Figure: key configuration properties for the different modes]

 

 

Standalone Mode

In this mode, there is no further action to take: the default properties are set for standalone mode, and there are no daemons to run.

 

Pseudodistributed Mode

The configuration files should be created with the following contents and placed in the conf directory (although you can place configuration files in any directory as long as you start the daemons with the --config option):

core-site.xml File

<?xml version="1.0"?>

<!-- core-site.xml -->

<configuration>

  <property>

    <name>fs.default.name</name>

    <value>hdfs://localhost/</value>

  </property>

</configuration>


 

 

hdfs-site.xml File

<?xml version="1.0"?>

<!-- hdfs-site.xml -->

<configuration>

  <property>

    <name>dfs.replication</name>

    <value>1</value>

  </property>

</configuration>


 

mapred-site.xml File

<?xml version="1.0"?>

<!-- mapred-site.xml -->

<configuration>

  <property>

    <name>mapred.job.tracker</name>

    <value>localhost:8021</value>

  </property>

</configuration>


 

 

If you are running YARN, use the yarn-site.xml file:

<?xml version="1.0"?>

<!-- yarn-site.xml -->

<configuration>

  <property>

    <name>yarn.resourcemanager.address</name>

    <value>localhost:8032</value>

  </property>

  <property>

    <name>yarn.nodemanager.aux-services</name>

    <value>mapreduce.shuffle</value>

  </property>

</configuration>

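As noted above, the configuration files can live in a directory other than conf, as long as you point the startup scripts at it with the --config option. A minimal sketch, assuming a hypothetical ~/hadoop-conf directory containing the files above:

% start-dfs.sh --config ~/hadoop-conf

% start-mapred.sh --config ~/hadoop-conf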

 

Configuring SSH

In pseudodistributed mode you have to start daemons, and for that you need SSH installed. Hadoop starts daemons on the set of hosts in the cluster (defined by the slaves file) by SSH-ing to each host and starting a daemon process.

Pseudodistributed mode is a special case of fully distributed mode in which the (single) host is localhost, so you need to make sure that you can SSH to localhost and log in without entering a password. First, make sure that SSH is installed and a server is running. On Ubuntu, this is done with:

% sudo apt-get install ssh

Then, to enable password-less login, generate a new SSH key with an empty passphrase:

% ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

% cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

 Test this with:

% ssh localhost

If this is successful, you should not have to type in a password.
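If you are still prompted for a password, a common cause is overly permissive file permissions on the SSH directory or key files; tightening them often helps (a general SSH troubleshooting step, not specific to Hadoop):

% chmod 700 ~/.ssh

% chmod 600 ~/.ssh/authorized_keys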

 

Formatting the HDFS Filesystem

The formatting process creates an empty filesystem by creating the storage directories and the initial versions of the namenode's persistent data structures. Datanodes are not involved in the initial formatting process, since the namenode manages all of the filesystem's metadata, and datanodes can join or leave the cluster dynamically.

Formatting HDFS is a fast operation. Just type the following:

% hadoop namenode -format
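To confirm that the format succeeded, you can look for the newly created namenode storage directory. A sketch assuming the default hadoop.tmp.dir of /tmp/hadoop-${user.name} (the path will differ if you have overridden dfs.name.dir or hadoop.tmp.dir):

% ls /tmp/hadoop-$USER/dfs/name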

 

Starting and stopping the daemons (MapReduce 1)

To start the HDFS and MapReduce daemons, type:

% start-dfs.sh

% start-mapred.sh

The following daemons will be started on the local machine: a namenode, a secondary namenode, a datanode, a jobtracker, and a tasktracker. You can check whether the daemons started successfully by looking at the logfiles in the logs directory (in the Hadoop installation directory) or by looking at the web UIs, at http://localhost:50030/ for the jobtracker and at http://localhost:50070/ for the namenode. You can also use Java's jps command to see whether they are running.
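For instance, running jps should list the five Hadoop daemon processes by class name (an illustrative sketch; the process IDs shown here are made up and will differ on your machine):

% jps

2275 NameNode

2395 DataNode

2523 SecondaryNameNode

2626 JobTracker

2749 TaskTracker

2858 Jps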

Stopping the daemons is done in the obvious way:

% stop-dfs.sh

% stop-mapred.sh

 

Starting and stopping the daemons (MapReduce 2)

To start the HDFS and YARN daemons, type:

% start-dfs.sh

% start-yarn.sh

 These commands will start the HDFS daemons, and for YARN, a node manager and a resource manager. The resource manager web UI is at http://localhost:8088/.
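Once the daemons are up, a quick smoke test is to talk to HDFS from the command line, for example by listing the filesystem root (assuming the configuration above, so that the default filesystem is hdfs://localhost/):

% hadoop fs -ls /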

You can stop the daemons with:

% stop-dfs.sh

% stop-yarn.sh

"0 Responses on Hadoop Installation"

Training in Cities

Bangalore, Hyderabad, Chennai, Delhi, Kolkata, UK, London, Chicago, San Francisco, Dallas, Washington, New York, Orlando, Boston

100% Secure Payments. All major credit & debit cards accepted Or Pay by Paypal.

top

Sales Offer

  • To avail this offer, enroll before 04th December 2016.
  • This offer cannot be combined with any other offer.
  • This offer is valid on selected courses only.
  • Please use coupon codes mentioned below to avail the offer
offer-june

Sign Up or Login to view the Free Hadoop Installation.