I recently came across one such scenario. The following are some of the ways to check if a dataframe is empty.
df.count() == 0
If the dataframe may contain a large number of records, you should avoid count(), as it scans the entire dataset and is inefficient for this purpose. However, if you are certain the dataframe will hold either a single row or no rows at all, count() is acceptable.
For the general case, I would suggest checking whether head(1) returns an empty list, since it fetches at most one row instead of counting everything.