in Big Data Hadoop & Spark by (11.5k points)

Is there any difference in semantics between df.na().drop() and df.filter(df.col("onlyColumnInOneColumnDataFrame").isNotNull() && !df.col("onlyColumnInOneColumnDataFrame").isNaN()), where df is an Apache Spark DataFrame?

1 Answer

by (31.4k points)

With df.na.drop() you drop every row that contains a null or NaN value in any column.

With df.filter(df.col("onlyColumnInOneColumnDataFrame").isNotNull()) you drop only the rows where onlyColumnInOneColumnDataFrame is null; rows where it is NaN are kept, which is why the filter in your question also adds the !isNaN() check.

 

To achieve the same thing with df.na.drop() restricted to that single column, pass the column name. In Scala:

 df.na.drop(Seq("onlyColumnInOneColumnDataFrame"))

or in PySpark (the list must go to the subset parameter, not the first positional argument):

 df.na.drop(subset=["onlyColumnInOneColumnDataFrame"])

Note that df.na.drop() removes NaN rows as well as nulls, so it matches the combined isNotNull-and-not-isNaN filter.


