Intellipaat Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Big Data Hadoop & Spark by (11.4k points)

I'm having problems with a "ClassNotFound" Exception using this simple example:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

import java.net.URLClassLoader

import scala.util.Marshal

class ClassToRoundTrip(val id: Int) extends scala.Serializable {
}

object RoundTripTester {

  def test(id : Int) : ClassToRoundTrip = {

    // Get the current classpath and output. Can we see simpleapp jar?
    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Executor classpath is:" + url.getFile))

    // Simply instantiating an instance of object and using it works fine.
    val testObj = new ClassToRoundTrip(id)
    println("testObj.id: " + testObj.id)

    val testObjBytes = Marshal.dump(testObj)
    val testObjRoundTrip = Marshal.load[ClassToRoundTrip](testObjBytes)  // <<-- ClassNotFoundException here
    testObjRoundTrip
  }
}

object SimpleApp {
  def main(args: Array[String]) {

    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)

    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Driver classpath is: " + url.getFile))

    val data = Array(1, 2, 3, 4, 5)
    val distData = sc.parallelize(data)
    distData.foreach(x=> RoundTripTester.test(x))
  }
}


In local mode, submitting as per the docs generates a "ClassNotFound" exception on line 31, where the ClassToRoundTrip object is deserialized. Strangely, the earlier use on line 28 is okay:

spark-submit --class "SimpleApp" \
             --master local[4] \
             target/scala-2.10/simpleapp_2.10-1.0.jar

 

However, if I add extra parameters for "driver-class-path", and "-jars", it works fine, on local.

spark-submit --class "SimpleApp" \
             --master local[4] \
             --driver-class-path /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
             --jars /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/SimpleApp.jar \
             target/scala-2.10/simpleapp_2.10-1.0.jar

 

However, submitting to a local dev master, still generates the same issue:

spark-submit --class "SimpleApp" \
             --master spark://localhost.localdomain:7077 \
             --driver-class-path /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
             --jars /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
             target/scala-2.10/simpleapp_2.10-1.0.jar

1 Answer

0 votes
by (32.3k points)

I had this same issue. If master is local then program runs fine for most people. If they set it to "spark://myurl:7077" it doesn't work. Most of the you get such an error because an anonymous class was not found during execution. It is resolved by using SparkContext.addJars("Path to jar").

Make sure you are doing following things: -

  • SparkContext.addJars("Path to jar created from maven [hint: mvn package]").

  • I have used SparkConf.setMaster("spark://myurl:7077") in code and have supplied same has argument while submitting job to spark via command line.

  • When you specify class in command line, make sure your are writing it's complete name with URL. eg: "packageName.ClassName"

  • Final command should look like this bin/spark-submit --class "packageName.ClassName" --master spark://myurl:7077 pathToYourJar/target/yourJarFromMaven.jar

I would suggest you to go through this blog, this will help you out in more details.

...