0 votes
in Big Data Hadoop & Spark by (11.4k points)

Is there any difference in semantics between df.na.drop() and

df.filter(df.col("onlyColumnInOneColumnDataFrame").isNotNull() && !df.col("onlyColumnInOneColumnDataFrame").isNaN())

where df is an Apache Spark DataFrame?

1 Answer

0 votes
by (32.3k points)

With df.na.drop() you drop the rows containing any null or NaN values, in any column.

And with df.filter(df.col("onlyColumnInOneColumnDataFrame").isNotNull()) you drop only those rows that have a null in the column onlyColumnInOneColumnDataFrame.


In order to achieve the same thing with df.na.drop(), restrict it to that column: df.na.drop(["onlyColumnInOneColumnDataFrame"])
