HDFS Installation and Shell Commands

Setting up the Hadoop cluster:

This section explains how to install Hadoop and configure clusters ranging from a couple of nodes to tens of thousands of nodes. Start by installing Hadoop on a single machine. Hadoop requires Java, so install it first if it is not already on your system.
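
As a quick sanity check before installing Hadoop, the sketch below tests whether a java executable is on the PATH. The function name check_java is made up for this illustration, and the check does not verify that the installed Java version is one your Hadoop release supports.

```shell
# check_java: report whether a java executable can be found on the PATH.
# (check_java is a hypothetical helper name used only in this example.)
check_java() {
  if command -v java >/dev/null 2>&1; then
    echo "java found at $(command -v java)"
  else
    echo "java not found: install a JDK before installing Hadoop"
  fi
}

check_java
```

If Java is present, running java -version shows which release is installed.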

Running Hadoop across a cluster means installing the required software on every machine in the cluster. Typically, one machine is designated as the NameNode and another as the ResourceManager.

Other services, such as the MapReduce Job History Server and the Web App Proxy Server, can run on dedicated machines or on shared resources, depending on the load. All the remaining nodes in the cluster act as both NodeManager and DataNode. These are collectively called the slave nodes.

Getting Hadoop to work in the non-secure mode

Hadoop's Java configuration is driven by two types of configuration files:

  • Read-only default configuration: core-default.xml, hdfs-default.xml, yarn-default.xml and mapred-default.xml.
  • Site-specific configuration: etc/hadoop/core-site.xml, etc/hadoop/hdfs-site.xml, etc/hadoop/yarn-site.xml and etc/hadoop/mapred-site.xml.
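
To make the idea of a site-specific override concrete, here is a minimal sketch that writes a core-site.xml and an hdfs-site.xml. The property names fs.defaultFS and dfs.replication are standard Hadoop properties, but the host name, port and replication value are assumptions chosen for illustration. The files go to a scratch directory (CONF_DIR, a hypothetical variable) rather than a real etc/hadoop directory.

```shell
# CONF_DIR is a scratch directory for this illustration; on a real node the
# files would live in etc/hadoop/ under the Hadoop installation directory.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# core-site.xml: point clients at the NameNode.
# The host name and port below are hypothetical.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
</configuration>
EOF

# hdfs-site.xml: override the default block replication factor.
# The value 2 is an assumed example, not a recommendation.
cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF

echo "wrote site files to $CONF_DIR"
```

Any property not listed in a site file keeps the value from the corresponding read-only default file.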

You can also control the Hadoop scripts in the bin/ directory of the distribution by setting site-specific values in etc/hadoop/hadoop-env.sh and etc/hadoop/yarn-env.sh.

To configure the Hadoop cluster, you need to set up both the environment in which the Hadoop daemons execute and the configuration parameters for the daemons themselves.

The Hadoop daemons fall into two groups:

  • HDFS daemons: NameNode, SecondaryNameNode and DataNode
  • YARN daemons: ResourceManager, NodeManager and WebAppProxy

Configuring the environment of the Hadoop daemons

To apply site-specific customization to the environment of the Hadoop daemons, administrators use the etc/hadoop/hadoop-env.sh script and, optionally, the etc/hadoop/mapred-env.sh and etc/hadoop/yarn-env.sh scripts. At a minimum, JAVA_HOME must be specified so that it is correctly defined on every remote node.
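
As a minimal sketch, the line below is the kind of entry that would go into etc/hadoop/hadoop-env.sh. The JDK path is an assumed example; substitute the location where Java is actually installed on your nodes.

```shell
# Fragment for etc/hadoop/hadoop-env.sh
# Keep an already-exported JAVA_HOME; otherwise fall back to an assumed
# example path. Replace the path with the JDK location on your nodes.
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java-8-openjdk-amd64}"
```

Because the daemons are started on remote nodes, the same value has to be valid on every machine in the cluster.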

Configuration of the individual daemons

The daemons and the environment variable used to configure each of them:

  • NameNode – HADOOP_NAMENODE_OPTS
  • DataNode – HADOOP_DATANODE_OPTS
  • Secondary NameNode – HADOOP_SECONDARYNAMENODE_OPTS
  • ResourceManager – YARN_RESOURCEMANAGER_OPTS
  • NodeManager – YARN_NODEMANAGER_OPTS
  • WebAppProxy – YARN_PROXYSERVER_OPTS
  • MapReduce Job History Server – HADOOP_JOB_HISTORYSERVER_OPTS
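
For example, a per-daemon variable can carry JVM options for that daemon only. The sketch below makes the NameNode use the parallel garbage collector. The flag is just an illustration, and appending any previously set value is a choice made for this example rather than a requirement.

```shell
# Fragment for etc/hadoop/hadoop-env.sh
# Pass a JVM flag to the NameNode only; -XX:+UseParallelGC is an example flag.
# Any value already present in HADOOP_NAMENODE_OPTS is preserved.
export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC ${HADOOP_NAMENODE_OPTS:-}"
```

The other variables in the list are set the same way, each in the env script that belongs to its daemon.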

Other useful configuration parameters that you can customize:

  • HADOOP_PID_DIR – the directory where the daemons' process ID files are stored.
  • HADOOP_LOG_DIR – the directory where the daemons' log files are stored.
  • HADOOP_HEAPSIZE / YARN_HEAPSIZE – the maximum heap size to use, in MB. For example, if the variable is set to 1000, the heap is set to 1000 MB. The default value is 1000.
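
A short sketch of how these parameters might be set in the env scripts follows. The directory paths and the heap size of 2000 MB are assumptions for illustration, not recommended values.

```shell
# Fragment for etc/hadoop/hadoop-env.sh (paths and sizes are assumed examples)
export HADOOP_PID_DIR=/var/run/hadoop    # where daemon PID files are written
export HADOOP_LOG_DIR=/var/log/hadoop    # where daemon log files are written
export HADOOP_HEAPSIZE=2000              # maximum heap in MB for Hadoop daemons

# Fragment for etc/hadoop/yarn-env.sh
export YARN_HEAPSIZE=2000                # maximum heap in MB for YARN daemons
```

The directories should be writable only by the users that run the Hadoop daemons.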

The HDFS Shell Commands

The following shell commands cover the most important Hadoop Distributed File System operations for managing files in the cluster.

  • Creating directories in HDFS at the given paths.
hadoop fs -mkdir <paths>

Example:

hadoop fs -mkdir /user/saurzcode/dir1 /user/saurzcode/dir2
  • Listing the contents of a directory.
hadoop fs -ls <args>

Example:

hadoop fs -ls /user/saurzcode
  • Uploading and downloading files in HDFS.

Upload:

hadoop fs -put:

Copies a single source file, or multiple source files, from the local file system to the Hadoop Distributed File System.

hadoop fs -put <localsrc> ... <HDFS_dest_Path>

Example:

hadoop fs -put /home/saurzcode/Samplefile.txt /user/saurzcode/dir3/

Download:

hadoop fs -get:

Copies (downloads) files from HDFS to the local file system.

hadoop fs -get <hdfs_src> <localdst>

Example:

hadoop fs -get /user/saurzcode/dir3/Samplefile.txt /home/
  • Viewing the contents of a file.

Works like the Unix cat command:

hadoop fs -cat <path[filename]>

Example:

hadoop fs -cat /user/saurzcode/dir1/abc.txt
  • Copying a file from source to destination.
hadoop fs -cp <source> <dest>

Example:

hadoop fs -cp /user/saurzcode/dir1/abc.txt /user/saurzcode/dir2

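  • Copying a file from the local file system to HDFS and vice versa.

copyFromLocal

Similar to put, except that the source is restricted to a local file reference.

Usage:

hadoop fs -copyFromLocal <localsrc> URI

Example:

hadoop fs -copyFromLocal /home/saurzcode/abc.txt /user/saurzcode/abc.txt

copyToLocal

Similar to get, except that the destination is restricted to a local file reference.

Usage:

hadoop fs -copyToLocal [-ignorecrc] [-crc] URI <localdst>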

  • Moving a file from source to destination.

Note that moving files across file systems is not permitted.

hadoop fs -mv <src> <dest>

Example:

hadoop fs -mv /user/saurzcode/dir1/abc.txt /user/saurzcode/dir2
  • Removing a file or directory in HDFS.
hadoop fs -rm <arg>

Example:

hadoop fs -rm /user/saurzcode/dir1/abc.txt

Recursive version of delete. In current Hadoop releases, -rmr is deprecated in favor of hadoop fs -rm -r.

hadoop fs -rmr <arg>

Example:

hadoop fs -rmr /user/saurzcode/
  • Displaying the last kilobyte of a file.
hadoop fs -tail <path[filename]>

Example:

hadoop fs -tail /user/saurzcode/dir1/abc.txt

  • Displaying the length of a file, or the sizes of the files and directories contained in a given directory.
hadoop fs -du <path>

Example:

hadoop fs -du /user/saurzcode/dir1/abc.txt
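
The commands in this section can be chained into a small workflow. The sketch below reuses the example paths from this section and wraps hadoop fs in a helper, hdfs_fs, that only prints each command unless DRY_RUN is set to 0. The helper, the DRY_RUN variable and the abc_copy.txt file name are all invented for this illustration, and nothing here has been run against a real cluster.

```shell
# DRY_RUN=1 (the default in this sketch) prints each command instead of
# running it, so the script is safe to try on a machine without a cluster.
DRY_RUN="${DRY_RUN:-1}"

# hdfs_fs: hypothetical wrapper around "hadoop fs" used only in this example.
hdfs_fs() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "hadoop fs $*"
  else
    hadoop fs "$@"
  fi
}

BASE=/user/saurzcode     # HDFS directory used in the examples of this section
LOCAL=/home/saurzcode    # local directory used in the examples of this section

hdfs_fs -mkdir "$BASE/dir1" "$BASE/dir2"                 # create two directories
hdfs_fs -put "$LOCAL/abc.txt" "$BASE/dir1/"              # upload a local file
hdfs_fs -ls "$BASE/dir1"                                 # list the directory
hdfs_fs -cat "$BASE/dir1/abc.txt"                        # print the file contents
hdfs_fs -cp "$BASE/dir1/abc.txt" "$BASE/dir2"            # copy the file
hdfs_fs -du "$BASE/dir2"                                 # show sizes in dir2
hdfs_fs -get "$BASE/dir2/abc.txt" "$LOCAL/abc_copy.txt"  # download the copy
hdfs_fs -rm "$BASE/dir1/abc.txt"                         # remove the original
```

Running the script with DRY_RUN=0 executes the same commands through the real hadoop client, provided it is installed and configured for your cluster.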

About the Author

Technical Research Analyst - Big Data Engineering

Abhijit is a Technical Research Analyst specialising in Big Data and Azure Data Engineering. He has 4+ years of experience in the Big data domain and provides consultancy services to several Fortune 500 companies. His expertise includes breaking down highly technical concepts into easy-to-understand content.