
I have set up a multi-node Hadoop cluster. The NameNode and Secondary NameNode run on the same machine, and the cluster has only one DataNode. All the nodes are Amazon EC2 instances.

Following are the configuration files on the master node:

masters

54.68.218.192 (public IP of the master node)

slaves

54.68.169.62 (public IP of the slave node)

core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
  </property>
</configuration>

Following are the configuration files on the DataNode:

core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://54.68.218.192:10001</value>
  </property>
</configuration>

mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>54.68.218.192:10002</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
  </property>
</configuration>

Running jps on the NameNode gives the following:

5696 NameNode
6504 Jps
5905 SecondaryNameNode
6040 ResourceManager

and jps on the DataNode:

2883 DataNode
3496 Jps
3381 NodeManager

which to me seems right.

Now when I try to run a put command:

hadoop fs -put count_inputfile /test/input/

It gives me the following error:

put: File /count_inputfile._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.

The logs on the DataNode say the following:

hadoop-datanode log:

INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 54.68.218.192/54.68.218.192:10001. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

yarn-nodemanager log:

INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

The NameNode web UI (port 50070) shows that there are 0 live nodes and 0 dead nodes, and DFS Used is 100%.

I have also disabled IPV6.

On a few websites, I found that I should also edit the /etc/hosts file. I have edited it, and it now looks like this:

127.0.0.1 localhost
172.31.25.151 ip-172-31-25-151.us-west-2.compute.internal
172.31.25.152 ip-172-31-25-152.us-west-2.compute.internal

Why am I still getting the error?

1 Answer


Follow these steps:

STEP 1: Stop Hadoop and clean the temp files as the hduser user:

sudo rm -R /tmp/*

You may also need to delete and recreate the Hadoop temp directory (mostly when upgrading, e.g. from 2.2.0 to 2.7.0):

sudo rm -r /app/hadoop/tmp
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp

STEP 2: Format the NameNode (note that this wipes any existing HDFS metadata):

hdfs namenode -format
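After the reformat, the daemons need to be started again before jps will show them. A minimal sketch, assuming a Hadoop 2.x install with the sbin scripts on the PATH (the exact script names depend on your version and install layout):

```shell
# Restart HDFS and YARN daemons after reformatting the NameNode
start-dfs.sh
start-yarn.sh

# Check that the DataNode has registered with the NameNode;
# the report should list one live datanode
hdfs dfsadmin -report
```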

Now jps shows the DataNode running:

$ jps
19135 NameNode
20497 Jps
19477 DataNode
20447 NodeManager
19902 SecondaryNameNode
20106 ResourceManager
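If the DataNode still fails to register, the logs in the question point to two configuration mismatches worth checking: the master's core-site.xml binds the NameNode to hdfs://localhost:9000 while the DataNode tries to reach 54.68.218.192:10001, and the NodeManager's retries against 0.0.0.0:8031 suggest the ResourceManager address is unset on the slave. A hedged sketch of consistent settings, assuming ip-172-31-25-151.us-west-2.compute.internal (from the /etc/hosts shown in the question) is the master's private DNS name; both files should be identical on every node:

```xml
<!-- core-site.xml on master and slave: one reachable NameNode address -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ip-172-31-25-151.us-west-2.compute.internal:9000</value>
  </property>
</configuration>

<!-- yarn-site.xml on master and slave: tell the NodeManager where the
     ResourceManager lives instead of the 0.0.0.0 default -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>ip-172-31-25-151.us-west-2.compute.internal</value>
  </property>
</configuration>
```

On EC2 it is generally safer to use private DNS names than public IPs, since the daemons bind to the instance's private interface.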
