Currently I am running my program with this configuration:
val conf = new SparkConf()
.setAppName("Test Data Analysis")
.setMaster("local[*]")
.set("spark.executor.memory", "32g")
.set("spark.driver.memory", "32g")
.set("spark.driver.maxResultSize", "4g")
Even though I am running on a cluster of 5 machines (each with 376 GB of physical RAM), my program errors out with java.lang.OutOfMemoryError: Java heap space.
My data is large, but not so large that it exceeds 32 GB of executor memory * 5 nodes.
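From what I have read (this is my assumption, not something I have verified), local[*] runs the whole application in a single driver JVM: spark.executor.memory is ignored because there are no separate executor processes, and spark.driver.memory set from inside the program comes too late, since the heap size is already fixed when the JVM starts. If that is right, the heap would have to be sized on the command line instead, roughly like this sketch (the class and jar names are placeholders):

import org.apache.spark.SparkConf

// Sketch, assuming local[*] mode: size the driver heap before the JVM
// launches, e.g.
//   spark-submit --driver-memory 32g --class example.Main app.jar
// ("example.Main" and "app.jar" are placeholder names).
// spark.executor.memory is dropped because local mode has no separate
// executor JVMs, so the setting would have no effect.
val conf = new SparkConf()
  .setAppName("Test Data Analysis")
  .setMaster("local[*]")
  .set("spark.driver.maxResultSize", "4g")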
This reinforces my suspicion that the problem is that I am using "local" as my master. I have seen the documentation say to use spark://machinename:7077.
However, how do I determine this URL and port for my cluster?
In my case the Spark cluster was set up and is maintained by someone else, so I don't want to change the topology by starting my own master.
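For reference, this is the shape of the change I think I need once I know the real URL. My understanding is that the standalone master defaults to port 7077, and that its web UI (usually on port 8080) shows the exact spark:// URL at the top of the page; it should also be visible in conf/spark-env.sh on the machine running the master. The host below is a placeholder:

import org.apache.spark.SparkConf

// Sketch only: "master-host" is a placeholder for whatever the running
// master actually advertises; 7077 is the standalone default port.
val conf = new SparkConf()
  .setAppName("Test Data Analysis")
  .setMaster("spark://master-host:7077")
  .set("spark.executor.memory", "32g")
  .set("spark.driver.maxResultSize", "4g")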