Is there any difference in semantics between and df.filter(df.col("onlyColumnInOneColumnDataFrame").isNotNull() && !df.col("onlyColumnInOneColumnDataFrame").isNaN()), where df is an Apache Spark DataFrame?

1 Answer

With , you drop every row that contains a null or NaN value in any of its columns.

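With df.filter(df.col("onlyColumnInOneColumnDataFrame").isNotNull()), you drop only the rows that are null in the column onlyColumnInOneColumnDataFrame; no other column is checked. Adding && !df.col("onlyColumnInOneColumnDataFrame").isNaN(), as in the question, also drops the rows that are NaN in that column. Because the DataFrame in the question has only that one column, the two approaches should keep the same rows for a floating-point column; they only diverge once the DataFrame has more columns.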
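To make the difference concrete without a cluster, here is a minimal pure-Python model (not Spark code) of the two rules: dropping a row when any column is null or NaN, versus keeping only the rows whose value in one named column is neither null nor NaN. The helper names and the list-of-dicts representation are hypothetical, chosen for illustration; None stands in for SQL null, and NaN is only considered for floating-point values.

```python
import math
from typing import Any, Dict, List

# Hypothetical row representation: column name -> value (None models SQL null).
Row = Dict[str, Any]


def is_missing(value: Any) -> bool:
    """True for None (SQL null) or a floating-point NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_any_missing(rows: List[Row]) -> List[Row]:
    """Model of dropping rows that contain a null or NaN in ANY column."""
    return [r for r in rows if not any(is_missing(v) for v in r.values())]


def filter_not_null_not_nan(rows: List[Row], col: str) -> List[Row]:
    """Model of filter(col.isNotNull && !col.isNaN): only `col` is inspected."""
    return [r for r in rows if not is_missing(r[col])]


one_col = [{"x": 1.0}, {"x": None}, {"x": float("nan")}]
two_col = [{"x": 1.0, "y": None}, {"x": 2.0, "y": 3.0}]

# With a single column, both rules keep exactly the same rows.
print(drop_any_missing(one_col) == filter_not_null_not_nan(one_col, "x"))  # True
# With more columns they diverge: the null in "y" only matters to the first rule.
print(len(drop_any_missing(two_col)), len(filter_not_null_not_nan(two_col, "x")))  # 1 2
```

The one-column case is the situation in the question; the two-column case shows where the answer's distinction starts to matter.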


To get the same behavior with , restrict it to that column: ["onlyColumnInOneColumnDataFrame"])
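Restricting the drop to a list of columns can be modeled the same way: only the listed columns are checked, so for a single listed column the result matches the null/NaN filter on that column. Assuming the call under discussion is Spark's DataFrameNaFunctions.drop (reached through df.na), note that in PySpark the column list belongs in the subset keyword, e.g. df.na.drop(subset=["onlyColumnInOneColumnDataFrame"]), because the first positional parameter is how; in Scala, the overload df.na.drop(Seq("onlyColumnInOneColumnDataFrame")) takes the columns directly. The function below is a hypothetical pure-Python model, not Spark code.

```python
import math
from typing import Any, Dict, Iterable, List

# Hypothetical row representation: column name -> value (None models SQL null).
Row = Dict[str, Any]


def is_missing(value: Any) -> bool:
    """True for None (SQL null) or a floating-point NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing_in(rows: List[Row], subset: Iterable[str]) -> List[Row]:
    """Model of dropping rows with a null/NaN in any of the `subset` columns only."""
    cols = list(subset)
    return [r for r in rows if not any(is_missing(r[c]) for c in cols)]


rows = [
    {"x": 1.0, "y": None},
    {"x": None, "y": 2.0},
    {"x": float("nan"), "y": 3.0},
]

kept = drop_missing_in(rows, ["x"])
# Only the first row survives: its null is in "y", which is not in the subset.
print(kept)  # [{'x': 1.0, 'y': None}]
```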

