in Big Data Hadoop & Spark by (11.4k points)

I am using Spark 1.4 for my research and am struggling with the memory settings. My machine has 16 GB of memory, so that should not be a problem, since my file is only 300 MB. However, when I try to convert a Spark DataFrame to a pandas DataFrame using the toPandas() function, I receive the following error:

serialized results of 9 tasks (1096.9 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
 

I tried to fix this by changing the Spark config file, but I still get the same error. I've heard that this is a problem with Spark 1.4 and am wondering if you know how to solve it. Any help is much appreciated.

1 Answer

by (32.3k points)

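You can set the spark.driver.maxResultSize parameter in a SparkConf object. The setting is read when the SparkContext starts, so stop the current context, create a new one from the updated config, and then create a new SQLContext bound to it: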

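from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Stop the current context first
sc.stop()

# Then create a new config with a larger result-size limit
conf = SparkConf().set("spark.driver.maxResultSize", "2g")

# Create the new context from that config
sc = SparkContext(conf=conf)

# Create a new SQLContext bound to the new SparkContext as well.
# DataFrames built on the old context must be re-created from this
# sqlContext before calling toPandas().
sqlContext = SQLContext(sc)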
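To see why 2g is enough here, the check behind the error can be sketched in plain Python. The driver adds up the serialized size of each task's result and aborts the job once the total exceeds spark.driver.maxResultSize, where 0 means no limit. The helpers parse_size and fits_in_driver below are hypothetical illustrations, not part of the PySpark API, and only approximate Spark's size-string format (binary units such as k, m, g).

```python
import re

# Binary multipliers for Spark-style size strings ("1g", "512m", "2gb", ...).
# Assumption: this approximates the suffixes Spark accepts for size settings;
# a bare number is treated as bytes. Illustrative helper, not a Spark API.
_UNITS = {
    "": 1, "b": 1,
    "k": 1024, "kb": 1024,
    "m": 1024 ** 2, "mb": 1024 ** 2,
    "g": 1024 ** 3, "gb": 1024 ** 3,
    "t": 1024 ** 4, "tb": 1024 ** 4,
}


def parse_size(value: str) -> int:
    """Convert a size string such as '2g' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", value.lower())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognised size: {value!r}")
    number, suffix = match.groups()
    return int(number) * _UNITS[suffix]


def fits_in_driver(total_result_mb: float, max_result_size: str) -> bool:
    """Hypothetical re-creation of the driver-side check: the job fails
    when the total serialized result size exceeds the limit; 0 = unlimited."""
    limit = parse_size(max_result_size)
    if limit == 0:
        return True
    total_bytes = total_result_mb * 1024 ** 2  # MB in the error are binary MB
    return total_bytes <= limit


# Numbers taken from the error message in the question
print(fits_in_driver(1096.9, "1g"))  # default limit: 1096.9 MB > 1024 MB
print(fits_in_driver(1096.9, "2g"))  # raised limit
print(fits_in_driver(1096.9, "0"))   # unlimited
```

With the numbers from the error, 1096.9 MB fails against the default 1g but fits under 2g. Raising the limit only lets more data reach the driver; Spark's configuration documentation warns that a high limit can cause out-of-memory errors in the driver, depending on spark.driver.memory.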


...