0 votes
2 views
in Big Data Hadoop & Spark by (11.4k points)

My Spark standalone cluster looks like it's running without a problem.

I followed a tutorial.

I have built a fat jar for running this Java app on the cluster. Before running mvn package, the project layout is:

find .

./pom.xml
./src
./src/main
./src/main/java
./src/main/java/SimpleApp.java


The content of SimpleApp.java is:

    import org.apache.spark.api.java.*;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.SparkConf;
    import org.apache.spark.SparkContext;

    public class SimpleApp {
        public static void main(String[] args) {

            SparkConf conf = new SparkConf()
                    .setMaster("spark://10.35.23.13:7077")
                    .setAppName("My app")
                    .set("spark.executor.memory", "1g");

            JavaSparkContext sc = new JavaSparkContext(conf);
            String logFile = "/home/ubuntu/spark-0.9.1/test_data";
            JavaRDD<String> logData = sc.textFile(logFile).cache();

            long numAs = logData.filter(new Function<String, Boolean>() {
                public Boolean call(String s) { return s.contains("a"); }
            }).count();

            System.out.println("Lines with a: " + numAs);
        }
    }


This program only works when the master is set with setMaster("local"). Otherwise it fails with an error when I run:

$ java -cp path_to_file/simple-project-1.0-allinone.jar SimpleApp

1 Answer

0 votes
by (32.3k points)

An anonymous class (one that extends Function) is present in the SimpleApp.java file. This class is compiled to SimpleApp$1, which has to be shipped to every worker in the Spark cluster.

To do this, add the jar explicitly to the Spark context: call something like sparkContext.addJar("path_to_file/simple-project-1.0-allinone.jar") right after the JavaSparkContext is created, and rebuild your jar file. The main Spark program, also known as the driver program, will then automatically deliver your application code to the cluster.
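For instance, here is a minimal sketch of how the modified SimpleApp could look (the jar path is just the one from the question; point it at wherever your fat jar is actually built):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;

    public class SimpleApp {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setMaster("spark://10.35.23.13:7077")
                    .setAppName("My app")
                    .set("spark.executor.memory", "1g");

            JavaSparkContext sc = new JavaSparkContext(conf);

            // Ship the application jar (which also contains the compiled
            // anonymous class SimpleApp$1) to every worker in the cluster.
            sc.addJar("path_to_file/simple-project-1.0-allinone.jar");

            JavaRDD<String> logData =
                    sc.textFile("/home/ubuntu/spark-0.9.1/test_data").cache();

            long numAs = logData.filter(new Function<String, Boolean>() {
                public Boolean call(String s) { return s.contains("a"); }
            }).count();

            System.out.println("Lines with a: " + numAs);
        }
    }

After rebuilding, the same java -cp command from the question should work, since the driver now serves the jar to the executors.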
