
I was trying to follow the Spark standalone application example described here

The example ran fine with the following invocation:

spark-submit  --class "SimpleApp" --master local[4] target/scala-2.10/simple-project_2.10-1.0.jar

However, when I tried to add some third-party libraries via --jars, spark-submit threw a ClassNotFoundException:

$ spark-submit --jars /home/linpengt/workspace/scala-learn/spark-analysis/target/pack/lib/* \
  --class "SimpleApp" --master local[4] target/scala-2.10/simple-project_2.10-1.0.jar

Spark assembly has been built with Hive, including Datanucleus jars on classpath
Exception in thread "main" java.lang.ClassNotFoundException: SimpleApp
    at Method)
    at java.lang.ClassLoader.loadClass(
    at java.lang.ClassLoader.loadClass(
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:300)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

If I remove the --jars option, the program runs again (I haven't actually started using those libraries yet). What's the problem here? How should I add the external jars?

1 Answer


According to spark-submit's --help, the --jars option expects a comma-separated list of local jars to include on the driver and executor classpaths.

I think what's happening here is that the shell expands /home/linpengt/workspace/scala-learn/spark-analysis/target/pack/lib/* into a space-separated list of jars before spark-submit sees it. Only the first jar is then taken as the value of --jars, and the second jar in the list is treated as the application jar, which presumably does not contain SimpleApp.

One solution is to use your shell to build a comma-separated list of jars; here's a quick way of doing it in bash: 

spark-submit --jars $(echo /dir/of/jars/*.jar | tr ' ' ',') \
    --class "SimpleApp" --master local[4] path/to/myApp.jar
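
The tr one-liner above also turns any space inside a jar path into a comma. A minimal sketch of an alternative that joins the expanded glob on commas without touching spaces is shown below. The join_jars helper and the temporary demo directory are hypothetical names used only for illustration; they are not part of Spark.

```shell
#!/usr/bin/env bash
# join_jars is a hypothetical helper, not a Spark command.
# It joins all of its arguments with commas. Unlike `echo ... | tr ' ' ','`,
# it leaves spaces inside individual paths untouched.
join_jars() {
  local IFS=,
  printf '%s\n' "$*"
}

# Demo directory with two empty placeholder jars (illustration only).
demo_dir=$(mktemp -d)
touch "$demo_dir/a.jar" "$demo_dir/b.jar"

# The shell expands the glob into separate words before any program runs,
# which is why a bare lib/* does not work as the value of --jars.
set -- "$demo_dir"/*.jar
echo "glob expanded into $# arguments"

# Build the comma-separated value that --jars expects.
jars=$(join_jars "$@")
echo "--jars $jars"
```

With a real library directory in place of the demo one, the result could then be passed as a single quoted value: spark-submit --jars "$jars" --class "SimpleApp" --master local[4] path/to/myApp.jar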
