
Right now, I use df.count > 0 to check whether the DataFrame is empty, but that is inefficient. Is there a better way to do this?

PS: I want to check if it's empty so that I only save the DataFrame if it's not empty.

1 Answer


I recently came across one such scenario. The following are some ways to check whether a DataFrame is empty:

  • df.count() == 0

  • df.head(1).isEmpty (or the equivalent df.take(1).isEmpty)

  • df.rdd.isEmpty

  • df.limit(1).count() == 0

Note that df.head() and df.first() without arguments throw a NoSuchElementException on an empty DataFrame, so they are not safe for this check. On Spark 2.4 and later there is also a built-in df.isEmpty.
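The checks above can be sketched as a small standalone program; this assumes a local SparkSession and builds an empty single-column DataFrame purely for illustration:

```scala
import org.apache.spark.sql.SparkSession

object EmptyCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("empty-check")
      .master("local[*]")   // assumed local mode for the sketch
      .getOrCreate()
    import spark.implicits._

    // An empty DataFrame with one Int column, just for demonstration.
    val df = Seq.empty[Int].toDF("value")

    // Full scan: counts every row only to compare the total with zero.
    println(df.count() == 0)      // true

    // Reads at most one row; usually the cheapest check.
    println(df.head(1).isEmpty)   // true

    // Converts to an RDD first, which adds some overhead.
    println(df.rdd.isEmpty)       // true

    spark.stop()
  }
}
```

All three print true here; the difference is how much work each one does to get there.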

If the DataFrame is large, avoid count(): it scans every partition just to produce a total you compare with zero. However, if you are certain the DataFrame holds either a single row or no rows at all, count() is perfectly fine.

In the general case, I would suggest checking whether the head element exists (df.head(1).isEmpty), since that only needs to read a single row.
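Applied to the original goal of only saving a non-empty DataFrame, the check can guard the write; this is a sketch, and the output path and write mode are placeholders, not from the original post:

```scala
// Persist only when at least one row exists.
if (df.head(1).nonEmpty) {
  df.write
    .mode("overwrite")        // assumed write mode
    .parquet("/tmp/output")   // placeholder path
}
```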

