
In the Spark SQL 1.6 API (Scala), DataFrame has functions for intersect and except, but not one for symmetric difference. Obviously, a combination of unionAll and except can be used to generate the difference:

df1.except(df2).unionAll(df2.except(df1))


But this seems a bit awkward. In my experience, if something seems awkward, there's a better way to do it, especially in Scala.

1 Answer


You could rewrite it as:

df1.unionAll(df2).except(df1.intersect(df2))

UNION, INTERSECT, and EXCEPT / MINUS are pretty much the standard set of SQL set operators, and I am not aware of any system that provides an XOR-like operation out of the box. It is trivial to implement using the other three operators, and there isn't much to optimize.
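
To see why the two formulations give the same rows, here is a minimal sketch that uses plain Scala sets in place of DataFrames. It assumes the set semantics of except and intersect, which return distinct rows. The object name, the method names viaExcept and viaIntersect, and the sample data are all made up for illustration and are not part of any Spark API.

```scala
// Plain Scala sets stand in for the distinct rows of df1 and df2.
// Everything here is illustrative; no Spark dependency is needed.
object SymmetricDifferenceSketch {

  // Mirrors the question's form: rows only in a, combined with rows only in b.
  def viaExcept[A](a: Set[A], b: Set[A]): Set[A] =
    (a diff b) union (b diff a)

  // Mirrors the answer's form: all rows, minus the rows present in both.
  def viaIntersect[A](a: Set[A], b: Set[A]): Set[A] =
    (a union b) diff (a intersect b)

  def main(args: Array[String]): Unit = {
    val rows1 = Set(1, 2, 3) // hypothetical contents of df1
    val rows2 = Set(3, 4)    // hypothetical contents of df2

    // Both calls yield the set containing 1, 2 and 4.
    println(viaExcept(rows1, rows2))
    println(viaIntersect(rows1, rows2))
  }
}
```

Both methods drop the shared element 3 and keep everything else, so which one to use on DataFrames is mostly a matter of taste.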
