
How to Install Hadoop on Windows and Linux - Step by Step Guide

Hadoop Installation

In this section of the Hadoop tutorial, we will be talking about the Hadoop installation process.

Hadoop is primarily designed to run on Linux. If you are working on Windows, you can use the Cloudera VM, which comes with Hadoop preinstalled, or you can run Linux in a virtual machine with Oracle VirtualBox or VMware Workstation. In this tutorial, I will demonstrate the Hadoop installation using VMware Workstation 12, by first installing CentOS on it. Any of the options above will work.

Hadoop Installation Prerequisites

  • VirtualBox/VMware/Cloudera: Any of these can be used to run the Linux operating system.
  • Operating System: Hadoop can be installed on Linux-based operating systems, of which Ubuntu and CentOS are the most commonly used. This tutorial uses CentOS.
  • Java: You need to install the Java 8 package on your system.
  • Hadoop: You require the Hadoop 2.7.3 package.


Let’s start with the installation steps.

Hadoop Installation on Windows

Note: If you are working on Linux, then skip to Step 9.

Step 1: Installing VMware Workstation

  • Download VMware Workstation from this link
  • Once downloaded, run the .exe file and choose the installation location
  • Follow the installer's prompts to complete the installation

Step 2: Installing CentOS

  • Download the CentOS disc image (ISO file) from this link
  • Save the file in any desired location

Step 3: Setting up CentOS in VMware 12

When you open VMware, its home screen appears. Click on Create a New Virtual Machine, and then:

1. Browse to the location of the CentOS file you downloaded. Note that it must be a disc image (ISO) file
2. Click on Next
3. Choose a name for your machine. Here, I have named it CentOS 64-bit
4. Click on Next
5. Specify the disk capacity. Here, I have specified 20 GB
6. Click on Next
7. Click on Finish

The virtual machine now boots and gets ready for installation. You have 60 seconds to change the selected boot option from Install CentOS to another one; if you want to keep Install CentOS, simply wait for the 60 seconds to elapse.

Note: You will see three options: I Finished Installing, Change Disc, and Help. You don't need to touch any of these until CentOS is successfully installed.

  • Your system is now being checked and prepared for installation
  • Once the check reaches 100%, you will be taken to the language selection screen

Step 4: Choosing the Language

Choose your language. The default language is English, and that is what I have selected.

1. If you want any other language, select it
2. Click on Continue

Step 5: Setting up the Installation Processes

  • After Step 4, you will be taken to a window with various options
  • First, to select the software type, click on SOFTWARE SELECTION:
    1. Select the Server with GUI option so that your server has a graphical desktop
    2. Click on Done
  • You will be taken back to the main menu. Next, click on INSTALLATION DESTINATION:
    1. Under Other Storage Options, select I would like to make additional space available
    2. Select the radio button that says I will configure partitioning
    3. Click on Done
  • In the next window, select Standard Partition as the partition scheme. Then, add three mount points, clicking on '+' for each one:
    a) Select /boot as the Mount Point and 500 MiB as the Desired Capacity, then click on Add mount point
    b) Click on '+' again, select swap as the Mount Point and 2 GiB as the Desired Capacity, then click on Add mount point
    c) Click on '+' once more, select / as the Mount Point, and click on Add mount point
    d) Click on Done. A summary of all the changes you made to the partitions of your drive is displayed
  • Click on Accept Changes if you're sure about the partitions you have made
  • Next, select NETWORK & HOST NAME:
    1. Set the Ethernet connection to ON
    2. Change the host name if required
    3. Apply the settings
    4. Click on Done
  • Finally, click on Begin Installation

Step 6: Configuration

  • Once you complete Step 5, the installation begins. Before it finishes, you need to set the ROOT PASSWORD and create a user
  • Click on ROOT PASSWORD:
    1. Enter your root password
    2. Confirm the password
    3. Click on Done
  • Click on USER CREATION:
    1. Enter your full name. Here, I have entered Intellipaaat
    2. Enter your user name; here, intellipaaat (this is usually filled in automatically)
    3. Choose whether to make this user an administrator and whether a password is required for this account
    4. Enter the password
    5. Confirm your password
    6. Click on Done
  • The screen now shows the installation in progress. The installation takes about 20–30 minutes; when it is done, a Reboot button appears. Click on it to restart the machine

Note: It will take about 3 seconds for CentOS to start.

Step 7: Setting up the License Information

  • Wait until a window pops up asking you to accept the license information, and accept it

Step 8: Logging into CentOS

  • When the login screen appears, enter the user name and password you set up in Step 6

Your CentOS installation is now complete! From here on, work inside CentOS rather than on your local operating system. If you skipped to this step because you are already working on Linux (for example, Ubuntu), continue with the following steps.

Note: All commands need to be run in the Terminal. You can open the Terminal by right-clicking on the desktop and selecting Open Terminal.

Step 9: Downloading and Installing Java 8

  • Click here to download the Java 8 package; for 64-bit CentOS, choose the Linux x64 .tar.gz file. Save this file in your home directory
  • Extract the Java tar file using the following command:
tar -xvf jdk-8u101-linux-x64.tar.gz
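If you prefer to script this step, the following sketch extracts a JDK archive and reports the directory it created. That directory is the one you would later use as JAVA_HOME if you use this JDK. It assumes the archive has a single top-level directory, such as jdk1.8.0_101/. The function name extract_jdk is hypothetical and is not part of Java or Hadoop.

```shell
# Hypothetical helper: extract a JDK tarball and print the directory it created.
# Assumes the archive has one top-level directory (e.g. jdk1.8.0_101/).
extract_jdk() {
  local tarball="$1"
  local dest="${2:-$HOME}"   # defaults to your home directory
  local top

  # Read the first entry of the archive listing and keep its first path component.
  top=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
  [ -n "$top" ] || return 1

  # Extract into the destination directory (like tar -xvf, but without the file list).
  tar -xzf "$tarball" -C "$dest" || return 1

  # This directory is the candidate JAVA_HOME for this JDK.
  printf '%s\n' "$dest/$top"
}
```

For example, extract_jdk ~/jdk-8u101-linux-x64.tar.gz should print a path ending in jdk1.8.0_101. Substitute the name of the archive you actually downloaded.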

Step 10: Downloading and Installing Hadoop

  • Download a stable release, packaged as a compressed tar file (here, hadoop-2.7.3.tar.gz), from here and save it in your home directory
  • Extract the Hadoop file using the following command in the terminal:
tar -xvf hadoop-2.7.3.tar.gz
Step 11: Moving Hadoop to a Location

  • Use the following command to move the extracted directory to a particular location, here hadoop:
      mv hadoop-2.7.3 /home/intellipaaat/hadoop

Note: The location you choose may differ. For demonstration purposes, I have used this location, and it will be the same throughout this tutorial. You can change it according to your choice. As long as /home/intellipaaat/hadoop does not exist yet, mv simply renames hadoop-2.7.3 to hadoop, so the configuration files end up in /home/intellipaaat/hadoop/etc/hadoop.

  • /home remains the same.
  • intellipaaat is the user name I have used. Change it to your own user name.
  • hadoop is the directory where I want to keep Hadoop. You can change it as well if you want.
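The mv command behaves differently depending on whether the target exists. If /home/intellipaaat/hadoop does not exist, hadoop-2.7.3 is renamed to it. If it already exists as a directory, hadoop-2.7.3 is moved inside it, and every path in the later steps gains an extra hadoop-2.7.3 level. The sketch below uses a hypothetical helper named move_hadoop that refuses to move in that second case.

```shell
# Hypothetical helper: move the extracted Hadoop directory to its final place.
# mv renames SRC to DEST when DEST does not exist, but moves SRC *inside* DEST
# when DEST is an existing directory. This helper refuses the second case.
move_hadoop() {
  local src="$1" dest="$2"
  if [ -e "$dest" ]; then
    echo "refusing: $dest already exists" >&2
    return 1
  fi
  mv "$src" "$dest"
}
```

Usage: move_hadoop ~/hadoop-2.7.3 /home/intellipaaat/hadoop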


Step 12: Editing and Setting up Hadoop

First, you need to set the Java and Hadoop paths in the ~/.bashrc file. Before you edit ~/.bashrc, you need to check your Java configuration.

Enter the command:

update-alternatives --config java

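You will now see all the Java versions available on the machine. Here, I have only one version of Java installed, so only one entry is listed. A JDK that you extracted from a tar file (as in Step 9) appears here only if it has been registered with alternatives. If it is not listed, you can use its extracted directory (for example, /home/intellipaaat/jdk1.8.0_101) as the Java path instead.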

You can have multiple versions as well.
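The alternatives listing shows the path of the java program itself, which ends in /bin/java. Hadoop, however, expects JAVA_HOME to be the installation directory and runs $JAVA_HOME/bin/java from it. The following sketch turns one into the other by resolving symbolic links and stripping the last two path components. The function name is hypothetical, and the sketch assumes GNU readlink with the -f option, as found on CentOS.

```shell
# Hypothetical helper: turn the path of a java binary into a JAVA_HOME directory.
java_home_from_bin() {
  local bin
  # Resolve symlinks such as /usr/bin/java -> /etc/alternatives/java -> ...
  bin=$(readlink -f "$1") || return 1
  # Strip the trailing /bin/java: dirname twice goes from .../bin/java to ...
  dirname "$(dirname "$bin")"
}
```

For example, java_home_from_bin "$(command -v java)" prints the directory of the Java version currently selected with update-alternatives.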

  • Next, select the version you want to work on. Copy the highlighted path of that version and paste it into a text file (for example, in gedit) so that you can use it in the upcoming steps. This path ends in /bin/java. When you use it as JAVA_HOME, remove the trailing /bin/java, because JAVA_HOME must point to the Java installation directory, not to the java program itself
  • Enter the selection number of the version you have chosen. Here, I have chosen 1
  • Now, open ~/.bashrc with the vi editor (a screen-oriented text editor in Linux)

Note: Edit ~/.bashrc as the user who will run Hadoop (here, intellipaaat), not as root. If you switch to root with su, ~ refers to root's home directory (/root), and the variables would be set for root instead of for your user.

    • Enter the command: vi ~/.bashrc
    • The file opens in the vi editor
    • Move the cursor to the end of the file, press Insert (or i) on your keyboard to enter insert mode, and then type the following lines to set the Java and Hadoop paths:
      #HADOOP VARIABLES START
      export JAVA_HOME=/path/to/java   # replace with the path you copied, without the trailing /bin/java
      export HADOOP_HOME=/home/intellipaaat/hadoop   # replace intellipaaat with your user name
      export PATH=$PATH:$HADOOP_HOME/bin
      export PATH=$PATH:$HADOOP_HOME/sbin
      export HADOOP_MAPRED_HOME=$HADOOP_HOME
      export HADOOP_COMMON_HOME=$HADOOP_HOME
      export HADOOP_HDFS_HOME=$HADOOP_HOME
      export YARN_HOME=$HADOOP_HOME
      export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
      export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
      #HADOOP VARIABLES END

      After typing the lines, press Esc on your keyboard and enter the command :wq! to save the file and exit the vi editor. Then, run source ~/.bashrc (or open a new terminal) so that the variables take effect in your current session.
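Instead of typing the variables in vi, you can append them from the terminal. The sketch below writes the Hadoop variables shown above, using HADOOP_HOME for every reference. It checks for the #HADOOP VARIABLES START marker first, so running it twice does not add a second copy. The function add_hadoop_vars is hypothetical. The two paths are passed as arguments, and all other references are escaped so that they are expanded only when the shell starts.

```shell
# Hypothetical helper: append the Hadoop variables to a start-up file once.
# Arguments: file to edit, JAVA_HOME value, HADOOP_HOME value.
add_hadoop_vars() {
  local rc="$1" java_home="$2" hadoop_home="$3"
  # Skip if the block is already present.
  if grep -q '^#HADOOP VARIABLES START' "$rc" 2>/dev/null; then
    return 0
  fi
  # Unquoted EOF: the two arguments are expanded now; \$ keeps the other
  # references literal so they are expanded each time the shell starts.
  cat >> "$rc" <<EOF
#HADOOP VARIABLES START
export JAVA_HOME=$java_home
export HADOOP_HOME=$hadoop_home
export PATH=\$PATH:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=\$HADOOP_HOME
export HADOOP_COMMON_HOME=\$HADOOP_HOME
export HADOOP_HDFS_HOME=\$HADOOP_HOME
export YARN_HOME=\$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=\$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=\$HADOOP_HOME/lib/native"
#HADOOP VARIABLES END
EOF
}
```

Usage: add_hadoop_vars ~/.bashrc /path/to/java /home/intellipaaat/hadoop, followed by source ~/.bashrc. Here, /path/to/java is a placeholder for your Java installation directory.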

Step 13: Adding Configuration Files

    • Open hadoop-env.sh with the vi editor using the following command:
      vi /home/intellipaaat/hadoop/etc/hadoop/hadoop-env.sh

    • In this file, find the line export JAVA_HOME=${JAVA_HOME} and replace ${JAVA_HOME} with the Java path you copied earlier (without the trailing /bin/java). This tells Hadoop which Java installation to use
    • Save and exit by pressing Esc and entering the command :wq!
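You can also make this change without opening vi. The sketch below keeps a backup copy and uses GNU sed's in-place option (-i) to replace the export JAVA_HOME= line. It assumes the file contains exactly one such line, which in Hadoop 2.7.3 reads export JAVA_HOME=${JAVA_HOME}. The function name is hypothetical. The | character is used as the sed delimiter because the path itself contains slashes.

```shell
# Hypothetical helper: set JAVA_HOME in hadoop-env.sh without opening vi.
set_hadoop_java_home() {
  local env_file="$1" java_home="$2"
  # Keep a backup next to the original (hadoop-env.sh.bak).
  cp "$env_file" "$env_file.bak" || return 1
  # Replace the whole "export JAVA_HOME=..." line with the given path.
  sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file"
}
```

Usage: set_hadoop_java_home /home/intellipaaat/hadoop/etc/hadoop/hadoop-env.sh /path/to/java, where /path/to/java is a placeholder for your Java installation directory.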
Step 14: Editing the XML Configuration Files

Now, several XML files need to be edited to set the properties and paths that Hadoop uses.
    • Editing core-site.xml
        • Use the same command as in the previous step, changing only the file name at the end to core-site.xml:
          vi /home/intellipaaat/hadoop/etc/hadoop/core-site.xml

        • Enter the following code between the configuration tags. For a single-node setup, the NameNode address is localhost:
          <configuration>
              <property>
                  <name>fs.defaultFS</name>
                  <value>hdfs://localhost:9000</value>
              </property>
          </configuration>
          

      • Save and exit by pressing Esc and entering the command :wq!
    • Editing yarn-site.xml
        • Enter the command:
          vi /home/intellipaaat/hadoop/etc/hadoop/yarn-site.xml

      • Enter the following code between the configuration tags. The second property name contains mapreduce_shuffle, matching the service name set in the first property:
        <configuration>
        <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        </property>
        <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        </configuration>

      • Save and exit by pressing Esc and entering the command :wq!
    • Editing mapred-site.xml
        • Hadoop 2.7.3 does not include a file named mapred-site.xml. If you go to Home > intellipaaat > hadoop > etc > hadoop, you will find only mapred-site.xml.template. So, copy the template to a new file named mapred-site.xml:
          cp /home/intellipaaat/hadoop/etc/hadoop/mapred-site.xml.template /home/intellipaaat/hadoop/etc/hadoop/mapred-site.xml

          Once the file has been copied, you can verify that mapred-site.xml exists in the same folder.

        • Now, use the following command to add the configuration:
          vi /home/intellipaaat/hadoop/etc/hadoop/mapred-site.xml

      • In the new window, enter the following code in between the configuration tags as below:
        <configuration>
        <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        </property>
        </configuration>

      • Save and exit by pressing Esc and entering the command :wq!
  • Editing hdfs-site.xml

    Before editing hdfs-site.xml, you need to create two directories, which will hold the NameNode and DataNode data.

      • Enter the following command to create the namenode directory:
        mkdir -p /home/intellipaaat/hadoop_store/hdfs/namenode

        Note: mkdir creates a new directory; the -p option also creates any missing parent directories (here, hadoop_store and hdfs).

      • Similarly, to create the datanode directory, enter the following command:
        mkdir -p /home/intellipaaat/hadoop_store/hdfs/datanode

      • Now, go to Home > intellipaaat > hadoop_store > hdfs to check that both directories have been created
    • Now, to configure hdfs-site.xml, use the following command:
      vi /home/intellipaaat/hadoop/etc/hadoop/hdfs-site.xml

    • Enter the following code in between the configuration tags:
      <configuration>
      <property>
      <name>dfs.replication</name>
      <value>1</value>
      </property>
      <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:/home/intellipaaat/hadoop_store/hdfs/namenode</value>
      </property>
      <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:/home/intellipaaat/hadoop_store/hdfs/datanode</value>
      </property>
      </configuration>

    • Save and exit by pressing Esc and entering the command :wq!
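You can also write the four files edited above in one go. The sketch below creates the NameNode and DataNode directories and writes core-site.xml, yarn-site.xml, mapred-site.xml, and hdfs-site.xml with the properties described in this step. Because it writes mapred-site.xml directly, copying the template is not needed. It assumes a single-node setup with the NameNode at hdfs://localhost:9000 and a replication factor of 1. It overwrites any existing copies of these files, so use it only on a fresh installation. The function name is hypothetical.

```shell
# Hypothetical helper: write the Step 14 configuration files for a single-node
# setup. Existing copies of these four files are overwritten.
write_hadoop_site_files() {
  local conf="$1"    # e.g. /home/intellipaaat/hadoop/etc/hadoop
  local store="$2"   # absolute path, e.g. /home/intellipaaat/hadoop_store

  if [ ! -d "$conf" ]; then
    echo "no such directory: $conf" >&2
    return 1
  fi

  # NameNode and DataNode directories; -p also creates missing parents.
  mkdir -p "$store/hdfs/namenode" "$store/hdfs/datanode" || return 1

  # Quoted 'EOF': the text is written exactly as shown.
  cat > "$conf/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

  # The class property name must contain the aux-service name (mapreduce_shuffle).
  cat > "$conf/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
EOF

  # Written directly, so mapred-site.xml.template does not need to be copied.
  cat > "$conf/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

  # Unquoted EOF: $store is expanded into the two directory paths.
  cat > "$conf/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:$store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:$store/hdfs/datanode</value>
  </property>
</configuration>
EOF
}
```

Usage: write_hadoop_site_files /home/intellipaaat/hadoop/etc/hadoop /home/intellipaaat/hadoop_store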

That’s all!
All the configuration is done, and Hadoop is now installed.


Step 15: Checking Hadoop

You now need to check whether Hadoop has been installed successfully on your system.

  • Go to the location where you moved Hadoop (here, /home/intellipaaat/hadoop), right-click on the bin folder, and select Open in Terminal
  • Enter the command ls
    If the listing shows the Hadoop scripts, such as hadoop, hdfs, mapred, and yarn, then Hadoop has been installed successfully!
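Listing bin only shows that the files are present. The sketch below turns that check into a script that tests whether the main launcher scripts exist and are executable. It does not start any Hadoop daemons. The function name check_hadoop_bin is hypothetical.

```shell
# Hypothetical helper: confirm that Hadoop's main launcher scripts are in place.
# Prints one line per script and returns non-zero if any is missing.
check_hadoop_bin() {
  local bin="$1" missing=0 script
  for script in hadoop hdfs yarn mapred; do
    if [ -x "$bin/$script" ]; then
      echo "ok:      $bin/$script"
    else
      echo "missing: $bin/$script" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_hadoop_bin /home/intellipaaat/hadoop/bin && hadoop version. Once JAVA_HOME and PATH are set as in Step 12, the hadoop version command prints the installed Hadoop version.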


About the Author

Technical Research Analyst - Big Data Engineering

Abhijit is a Technical Research Analyst specialising in Big Data and Azure Data Engineering. He has 4+ years of experience in the Big data domain and provides consultancy services to several Fortune 500 companies. His expertise includes breaking down highly technical concepts into easy-to-understand content.