
I have recently started using Hadoop 2.9.2 and ran into a problem while running this command:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar grep input output 'dfs[a-z.]+'

The idea is to run the grep example from the examples jar over the input directory: it extracts the strings that match the pattern and counts how often each one occurs. However, output/part-r-00000, which is meant to hold the result, is empty.

When I run bin/hdfs namenode -format, it shows me this error:

ERROR common.Util: Syntax error in URI file://path to temp_directory/dfs/name. Please check hdfs configuration. java.net.URISyntaxException: Illegal character in authority at index 7: file://path to temp_directory/dfs/name

It looks like the DataNode is not working. How do I solve this problem and check whether my DataNode is running properly?

1 Answer


If you are not getting the result in output/part-r-00000, then:

Check your DataNode's configuration file, hdfs-site.xml, and correct it as needed.

Keep in mind that the DataNode stores its data in the directories listed in the dfs.datanode.data.dir parameter. Make sure that the value of this parameter is correct and that the permissions on those directories let the HDFS user read and write them.

Go to etc/hadoop (inside the Hadoop directory), where you will find hdfs-site.xml, and set dfs.datanode.data.dir to suit your setup.
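The URISyntaxException in the question appears to come from a directory property that still holds placeholder text with spaces ("path to temp_directory"). As a rough sketch, a value can be sanity-checked before it goes into the configuration. The function name check_dir_value and the sample paths are hypothetical and not part of Hadoop, and the rules are only a heuristic for local directories:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Hadoop tool): heuristic check of a local
# directory value such as the one used for dfs.datanode.data.dir.
check_dir_value() {
  case "$1" in
    *[[:space:]]*)
      # Unescaped whitespace is not allowed in a URI
      echo "invalid: contains whitespace"; return 1 ;;
    file:///*|/*)
      # file:/// plus an absolute path, or a plain absolute path
      echo "ok"; return 0 ;;
    file://*)
      # After exactly two slashes, the next component is parsed as a host
      echo "invalid: use file:/// followed by an absolute path"; return 1 ;;
    file:/*)
      echo "ok"; return 0 ;;
    *)
      echo "invalid: not an absolute path"; return 1 ;;
  esac
}

# The value from the error message, then an illustrative absolute path
check_dir_value "file://path to temp_directory/dfs/name" || true
check_dir_value "/home/hadoop/hdfs/datanode"
```

On a working installation, hdfs getconf -confKey dfs.datanode.data.dir prints the value Hadoop actually resolves, which can be passed to the same check.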

On my Linux system, the hdfs-site.xml file is as follows:

If it still does not work, try starting the HDFS daemons with the commands below. The service names assume a package-based installation; with a tarball installation such as the one in the question, sbin/start-dfs.sh does the same job.

    sudo service hadoop-hdfs-namenode start

    sudo service hadoop-hdfs-datanode start

    sudo service hadoop-hdfs-secondarynamenode start

Locate the jps binary in your JDK's bin directory, or set an alias for it:

    alias jps='/usr/lib/jvm/jdk(version you found)/bin/jps'

Now run jps and check that your DataNode is listed (sudo does not expand aliases, so use the full path if jps is not on root's PATH):

    sudo jps
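Reading the jps listing by eye is easy to get wrong, so here is a small sketch that reports which HDFS daemons are absent from it. The function name missing_daemons is hypothetical, and the PIDs in the sample input are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read jps-style output on stdin and print each
# expected HDFS daemon that does not appear in it.
# The ( ... ) body runs in a subshell, so the variables stay local.
missing_daemons() (
  jps_output="$(cat)"
  for daemon in NameNode DataNode SecondaryNameNode; do
    # -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
    if ! printf '%s\n' "$jps_output" | grep -qw "$daemon"; then
      echo "$daemon"
    fi
  done
)

# Illustrative input with a DataNode that failed to start
printf '4821 NameNode\n5012 SecondaryNameNode\n5190 Jps\n' | missing_daemons
```

On a real machine the input would come from jps itself, for example by piping the output of the jps command above into missing_daemons. An empty result means all three daemons are listed.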

If it still does not work, reinstall Hadoop and verify that it works:

Step 0: Reset the Cluster

    sudo reset_cluster.sh

Step 1: Format the NameNode

    sudo -u hdfs hdfs namenode -format

Step 2: Start HDFS

    sudo service hadoop-hdfs-namenode start

    sudo service hadoop-hdfs-datanode start

    sudo service hadoop-hdfs-secondarynamenode start

# Command to check the Hadoop services:

    sudo /usr/java/latest/bin/jps

Step 3: Create the /tmp Directory

# Create a new /tmp directory and set permissions:

    sudo -u hdfs hadoop fs -mkdir /tmp

    sudo -u hdfs hadoop fs -chmod -R 1777 /tmp

Step 4: Create Staging and Log Directories

# Create the staging directory and set permissions (-p also creates the missing parent directories):

    sudo -u hdfs hadoop fs -mkdir -p /tmp/hadoop-yarn/staging

    sudo -u hdfs hadoop fs -chmod -R 1777 /tmp/hadoop-yarn/staging

# Create the done_intermediate directory under the staging directory and set permissions:

    sudo -u hdfs hadoop fs -mkdir -p /tmp/hadoop-yarn/staging/history/done_intermediate

    sudo -u hdfs hadoop fs -chmod -R 1777 /tmp/hadoop-yarn/staging/history/done_intermediate

# Change ownership on the staging directory and subdirectory:

    sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn/staging

# Create the /var/log/hadoop-yarn directory and set ownership:

    sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn

    sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn

Step 5: Verify the HDFS File Structure

    sudo -u hdfs hadoop fs -ls -R /

You should see a directory structure that consists mostly of the YARN directories.

Step 6: Start YARN

    sudo service hadoop-yarn-resourcemanager start

    sudo service hadoop-yarn-nodemanager start

    sudo service hadoop-mapreduce-historyserver start

Step 7: Create User Directories

Create a home directory for each MapReduce user. It is best to do this on the NameNode.

    sudo -u hdfs hadoop fs -mkdir -p /user/training

    sudo -u hdfs hadoop fs -chown training /user/training

Running an Example Application with YARN

Make a directory in HDFS called input and copy some XML files into it by running the following commands in pseudo-distributed mode:

    hadoop fs -mkdir input

    hadoop fs -put /etc/hadoop/conf/*.xml input

    hadoop fs -ls input

You should then see four items:

    input/core-site.xml, input/hdfs-site.xml, input/mapred-site.xml, input/yarn-site.xml

Now, set HADOOP_MAPRED_HOME:

    export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

Run the Hadoop example:

    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'

You can now read the results in the HDFS file output23/part-r-00000, for example with hadoop fs -cat output23/part-r-00000.
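To sanity-check what the job should report, the same pattern can be counted locally with standard tools. This is only a rough local stand-in for the MapReduce grep example (count, then matched string, most frequent first), not the job itself. The function name local_grep_count and the sample file are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count regex matches across local files and print
# "count<TAB>match", most frequent first.
# The ( ... ) body runs in a subshell, so the variables stay local.
local_grep_count() (
  pattern="$1"
  shift
  # -o prints only the matched text, -h omits file names, -E enables '+'
  grep -ohE "$pattern" "$@" | sort | uniq -c | sort -k1,1nr -k2 |
    awk '{print $1 "\t" $2}'
)

# Illustrative run on a throwaway file that mimics a Hadoop config
tmpdir="$(mktemp -d)"
printf '<name>dfs.replication</name>\n<name>dfs.replication</name>\n<name>dfs.datanode.data.dir</name>\n' > "$tmpdir/sample.xml"
local_grep_count 'dfs[a-z.]+' "$tmpdir"/*.xml
rm -rf "$tmpdir"
```

Running it against the same XML files that were copied into input (for example /etc/hadoop/conf/*.xml) gives a list to compare with the job output. If the local count is empty as well, the input files simply contain no match for the pattern.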
