
Right now, I have to use df.count > 0 to check whether a DataFrame is empty, but that is inefficient. Is there a better way to do this?

PS: I want to check whether it's empty so that I only save the DataFrame when it is not.

1 Answer


I recently came across this scenario. Here are some ways to check whether a DataFrame is empty:

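  • df.count() == 0

  • df.head(1).isEmpty

  • df.rdd.isEmpty

  • df.take(1).isEmpty

  • df.isEmpty (Spark 2.4 and later)

Note that df.head() and df.first() called without an argument throw a NoSuchElementException on an empty DataFrame, so pass an explicit row count as shown above.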

If a DataFrame holds many records, you should avoid count(), because it has to process every row just to tell you whether at least one exists. However, if you are certain that the DataFrame has either a single row or no rows at all, count() is cheap enough to use.

In the general case, I would suggest checking whether the head of the DataFrame is empty, for example with df.head(1).isEmpty, since that fetches at most one row.
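As a rough sketch of how the head-based check could be applied to the "save only if not empty" case from the question: the helper name isEmptyDf, the sample data, and the output path /tmp/example_output are all made up for illustration, and the code assumes a local Spark session with the Scala API.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object EmptyCheck {
  // Hypothetical helper: true when df has no rows.
  // head(1) returns an Array[Row] with at most one element,
  // so Spark does not need to count every record.
  def isEmptyDf(df: DataFrame): Boolean = df.head(1).isEmpty

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("empty-check")
      .master("local[*]") // assumption: local run for demonstration
      .getOrCreate()
    import spark.implicits._

    // Illustrative data, not from the original question
    val df = Seq((1, "a"), (2, "b")).toDF("id", "label")
    val none = df.filter($"id" > 10) // no row matches, so this is empty

    // Write only when there is something to save (hypothetical path)
    if (!isEmptyDf(df)) {
      df.write.mode("overwrite").parquet("/tmp/example_output")
    }

    println(isEmptyDf(none)) // true

    spark.stop()
  }
}
```

On Spark 2.4 or later, the built-in df.isEmpty could replace the helper.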

