Running Apache Hadoop 2.1.0 on Windows

Question

asked Jul 10, 2019 in Big Data Hadoop & Spark by Aarav (11.4k points)

I am new to Hadoop and have run into problems trying to run it on my Windows 7 machine. Particularly I am interested in running Hadoop 2.1.0 as its release notes mention that running on Windows is supported. I know that I can try to run 1.x versions on Windows with Cygwin or even use prepared VM by for example Cloudera, but these options are in some reasons less convenient for me.
After some research I found that there really are some *.cmd scripts that can be run without Cygwin. Everything worked fine when I formated HDFS partition but when I tried to run hdfs namenode daemon I faced two errors: first, non fatal, was that winutils.exe could not be found (it really wasn't present in the tarball downloaded). I found the sources of this component in the Apache Hadoop sources tree and compiled it with Microsoft SDK and MSbuild. Thanks to detailed error message it was clear where to put the executable to satisfy Hadoop. But the second error which is fatal doesn't contain enough information for me to solve:

13/09/05 10:20:09 FATAL namenode.NameNode: Exception in namenode join
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:423)
    at org.apache.hadoop.fs.FileUtil.canWrite(FileUtil.java:952)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:451)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:282)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:200)
...
13/09/05 10:20:09 INFO util.ExitUtil: Exiting with status 1

Looks like something else should be compiled. I'm going to try to build Hadoop from the source with Maven but isn't there a simpler way? Isn't there some option-I-know-not-of that can disable native code and make that tarball usable on Windows?

1 Answer

Amit Rawat · Answer 1 · 2019-07-10T04:25:36+0000

I have followed following steps to install Hadoop 2.2.0

Steps to build Hadoop bin distribution for Windows:

Download and install Microsoft Windows SDK v7.1.

Download and install Unix command-line tool Cygwin.

Download and install Maven 3.1.1.

Download Protocol Buffers 2.5.0 and extract to a folder (say c:\protobuf).

Add Environment Variables JAVA_HOME, M2_HOME and Platform if not added already. Note : Variable name Platform is case sensitive. And value will be either x64 or Win32 for building on a 64-bit or 32-bit system. Edit Path Variable to add bin directory of Cygwin (say C:\cygwin64\bin), bin directory of Maven (say C:\maven\bin) and installation path of Protocol Buffers (say c:\protobuf).

Download hadoop-2.2.0-src.tar.gz and extract to a folder having short path (say c:\hdfs) to avoid runtime problem due to maximum path length limitation in Windows.

Select Start --> All Programs --> Microsoft Windows SDK v7.1 and open Windows SDK 7.1 Command Prompt. Change directory to Hadoop source code folder (c:\hdfs). Execute mvn package with options -Pdist,native-win -DskipTests -Dtar to create Windows binary tar distribution.

If everything goes well in the previous step, then native distribution hadoop-2.2.0.tar.gz will be created inside C:\hdfs\hadoop-dist\target\hadoop-2.2.0 directory.

Now, you just need to install Hadoop and configure it.

Extract hadoop-2.2.0.tar.gz to a folder (say c:\hadoop).

Add Environment Variable HADOOP_HOME and edit Path Variable to add bin directory of HADOOP_HOME (say C:\hadoop\bin).

Configure these Hadoop files:

C:\hadoop\etc\hadoop\core-site.xml

C:\hadoop\etc\hadoop\hdfs-site.xml

C:\hadoop\etc\hadoop\mapred-site.xml

C:\hadoop\etc\hadoop\ yarn-site.xml

Now, Format namenode, Start HDFS (Namenode and Datanode), Start MapReduce aka YARN (Resource Manager and Node Manager).

Total four separate Command Prompt windows will be opened automatically to run Namenode, Datanode, Resource Manager, Node Manager

Reference: http://www.srccodes.com/p/article/38/build-install-configure-run-apache-hadoop-2.2.0-microsoft-windows-os

If you want to know more about Hadoop, then do check out this awesome video tutorial:

Running Apache Hadoop 2.1.0 on Windows

1 Answer

Related questions

Browse Categories