
Currently I am running my program with this configuration:

val conf = new SparkConf()
  .setAppName("Test Data Analysis")
  .set("spark.executor.memory", "32g")
  .set("spark.driver.memory", "32g")
  .set("spark.driver.maxResultSize", "4g")

Even though I am running on a cluster of 5 machines (each with 376 GB of physical RAM), my program fails with java.lang.OutOfMemoryError: Java heap space.

My data sizes are big... but not so big that they exceed 32 GB Executor memory * 5 nodes.

I suspect it may be because I am using "local" as my master. I have seen documentation that says to use spark://machinename:7070.

However, how do I determine this URL and port for my own cluster?

In my case the Spark cluster was set up and is maintained by someone else, so I don't want to change the topology by starting my own master.

1 Answer


You can use the below command to get the URL information:
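
One way to do this, assuming a spark-shell session (where a SparkContext named sc already exists), is to ask the context itself. This is only a sketch and not necessarily the command the answer had in mind. It reports the cluster's URL only if the shell's defaults (for example spark.master in conf/spark-defaults.conf) already point at the cluster; otherwise it will show a local master.

```scala
// Run inside spark-shell, where `sc` is the pre-built SparkContext.

// Master URL this context is connected to,
// for example "local[*]" or "spark://host:port".
println(sc.master)

// Address of this application's web UI, if the UI is enabled.
// uiWebUrl is an Option[String], so nothing is printed when it is disabled.
sc.uiWebUrl.foreach(println)
```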


Also, if a Spark cluster is already set up on top of your physical cluster, just check http://master:8080, where master is the hostname of the Spark master machine (8080 is the default port of the standalone master web UI; whoever maintains the cluster may have changed it). That page shows the Spark master URI, which by default is spark://master:7077. If you have a Spark standalone cluster, quite a bit of other information lives there as well.
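
Once the URI is known, it can be passed to the configuration from the question in place of "local". The following is a minimal sketch: the host name master and port 7077 are placeholders for whatever the web UI reports, and the object name is made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TestDataAnalysisOnCluster {
  def main(args: Array[String]): Unit = {
    // "spark://master:7077" is a placeholder: replace it with the URI
    // shown at the top of the standalone master's web UI.
    val conf = new SparkConf()
      .setAppName("Test Data Analysis")
      .setMaster("spark://master:7077")
      .set("spark.executor.memory", "32g")
      .set("spark.driver.maxResultSize", "4g")

    val sc = new SparkContext(conf)
    try {
      // Confirm which master the application actually connected to.
      println(s"Connected to master: ${sc.master}")
    } finally {
      sc.stop()
    }
  }
}
```

Note that spark.driver.memory is left out on purpose. In client mode the driver JVM is already running by the time SparkConf is read, so setting it in code has no effect; it has to be given on the command line or in the default properties file. With a "local" master everything runs inside that single driver JVM, which is a likely reason the 32g executor setting did not prevent the heap error.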

However, many questions on Stack Overflow report that this does not work, for a variety of reasons. Using the spark-submit utility is a less error-prone process; see its usage documentation.
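
With spark-submit, the master URL and the memory settings move out of the code and onto the command line. The sketch below assumes the application is packaged as a jar; the jar name, object name, host, and port are all placeholders, and the command is shown in a comment so the example stays in one piece.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Launched with something like (all values are placeholders):
//   spark-submit \
//     --master spark://master:7077 \
//     --driver-memory 32g \
//     --executor-memory 32g \
//     --class TestDataAnalysisSubmit \
//     test-data-analysis.jar
object TestDataAnalysisSubmit {
  def main(args: Array[String]): Unit = {
    // No setMaster and no memory settings here: spark-submit supplies them.
    val conf = new SparkConf().setAppName("Test Data Analysis")
    val sc = new SparkContext(conf)
    try {
      // Small sanity check: sum the numbers 1 to 1000 on the cluster.
      val total = sc.parallelize(1 to 1000).sum()
      println(s"master=${sc.master}, total=$total")
    } finally {
      sc.stop()
    }
  }
}
```

Keeping the master out of the code also means the same jar can be run against "local" for testing and against the cluster for real work without recompiling.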

If you do not have a Spark cluster yet, I suggest setting up a Spark Standalone cluster first.
