0 votes
2 views
in Big Data Hadoop & Spark by (11.4k points)

I've downloaded the prebuilt version of Spark 1.4.0 without Hadoop (with user-provided Hadoop). When I ran the spark-shell command, I got this error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
        at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:111)
        at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:111)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:111)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:97)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:106)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 7 more

 

I've searched on the Internet, and it is said that HADOOP_HOME has not been set in spark-env.cmd. But I cannot find spark-env.cmd in the Spark installation folder. I've traced the spark-shell command and it seems that there is no HADOOP_CONFIG in there. I've tried adding HADOOP_HOME as an environment variable, but it still gives the same exception.


 

1 Answer

0 votes
by (32.3k points)
edited by

The "without Hadoop" in Spark's build name is misleading: it means that the build is not tied to a specific Hadoop distribution, not that it is meant to run without Hadoop. The user has to tell Spark where to find Hadoop (see https://spark.apache.org/docs/latest/hadoop-provided.html).

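The missing class, org.apache.hadoop.fs.FSDataInputStream, lives in hadoop-common, so it has to come from your own Hadoop installation. As a quick sanity check (a minimal sketch, assuming HADOOP_HOME points at a standard Hadoop 2.x binary layout), you can confirm your Hadoop install actually ships that jar:

          rem hadoop-common-<version>.jar is where FSDataInputStream lives;
          rem if nothing is listed here, fix your Hadoop install first
          dir %HADOOP_HOME%\share\hadoop\common\hadoop-common-*.jar
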
One clean way to fix this issue is to:

  • Obtain Hadoop Windows binaries. Ideally build them yourself, but that isn't an easy task. Otherwise, grab prebuilt ones; for instance, 2.6.0 can currently be downloaded from http://www.barik.net/archive/2015/01/19/172716/
  • Then create a spark-env.cmd file (modify the Hadoop path to match your installation; a short sketch of how to generate the classpath value follows this list):

          @echo off
          set HADOOP_HOME=D:\Utils\hadoop-2.7.1
          set PATH=%HADOOP_HOME%\bin;%PATH%
          set SPARK_DIST_CLASSPATH=<paste here the output of %HADOOP_HOME%\bin\hadoop classpath>

  • Then, put this spark-env.cmd either in a conf folder located at the same level as your Spark base folder (which may look weird), or in a folder indicated by the SPARK_CONF_DIR environment variable.

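To fill in and verify the configuration, a minimal sketch from a Windows command prompt (assuming HADOOP_HOME is set as above and you are in your Spark base folder) could look like this:

          rem Print your Hadoop install's classpath; copy the output into the
          rem SPARK_DIST_CLASSPATH line of spark-env.cmd
          %HADOOP_HOME%\bin\hadoop classpath

          rem Relaunch the shell; the NoClassDefFoundError for FSDataInputStream
          rem should be gone once SPARK_DIST_CLASSPATH is picked up
          bin\spark-shell.cmd
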
