
0 votes
2 views
in Big Data Hadoop & Spark by (11.4k points)

I am using Spark 1.3.0 with the Python API. While transforming huge DataFrames, I cache many of them for faster execution:

df1.cache()
df2.cache()


Once a certain DataFrame has served its purpose and is no longer needed, how can I drop it from memory (i.e., un-cache it)?

For example, df1 is used throughout the code, while df2 is only needed for a few transformations and is never used after that. I want to forcefully drop df2 to free up memory.

1 Answer

0 votes
by (32.3k points)

Actually, Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion.

But if you would like to manually remove a cached DataFrame instead of waiting for it to fall out of the cache, use the unpersist() method (available on both RDDs and DataFrames).

Use the following lines of code:

df1.unpersist()

df2.unpersist()
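For context, here is a minimal sketch of how this fits into a job. The file paths, the loader call, and the join key are made up for illustration; the unpersist() call at the end is the relevant part. Passing blocking=True (supported by DataFrame.unpersist) makes the call wait until the cached blocks are actually removed, so the memory is freed before the next stage runs.

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="unpersist-example")
sqlContext = SQLContext(sc)

# Hypothetical inputs, just for illustration (Spark 1.3-style JSON loader)
df1 = sqlContext.jsonFile("data/events.json")
df2 = sqlContext.jsonFile("data/lookup.json")

df1.cache()
df2.cache()

# df2 is only needed for this one join
enriched = df1.join(df2, df1.key == df2.key)
enriched.count()  # materialize so the caches are actually populated

# df2 will never be used again, so release its cached blocks now.
# blocking=True waits until the blocks are removed before returning.
df2.unpersist(blocking=True)

# df1 stays cached and keeps benefiting the rest of the job.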
