in Big Data Hadoop & Spark by (11.4k points)

In the Spark SQL 1.6 API (Scala), DataFrame has functions for intersect and except, but not one for difference (that is, symmetric difference: the rows that appear in exactly one of the two DataFrames). Obviously, a combination of union and except can be used to generate the difference:

df1.except(df2).unionAll(df2.except(df1))


But this seems a bit awkward. In my experience, if something seems awkward, there's a better way to do it, especially in Scala.

1 Answer

0 votes
by (32.3k points)

Try to rewrite it as:

df1.unionAll(df2).except(df1.intersect(df2))
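To see why this rewrite matches the expression in the question, here is a minimal sketch on plain Scala sets rather than DataFrames. The object and method names are made up for illustration, and a `Set` ignores duplicate rows, so this only checks the set identity, not how Spark treats duplicates.

```scala
object SymmetricDifferenceCheck {
  // Formulation from the question: (A except B) union (B except A)
  def viaTwoExcepts[T](a: Set[T], b: Set[T]): Set[T] =
    a.diff(b).union(b.diff(a))

  // Formulation from the answer: (A union B) except (A intersect B)
  def viaUnionMinusIntersect[T](a: Set[T], b: Set[T]): Set[T] =
    a.union(b).diff(a.intersect(b))

  def main(args: Array[String]): Unit = {
    val a = Set(1, 2, 3)
    val b = Set(2, 3, 4)
    // Both calls should print the same set: the elements found in exactly one input.
    println(viaTwoExcepts(a, b))
    println(viaUnionMinusIntersect(a, b))
  }
}
```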

UNION, INTERSECT, and EXCEPT / MINUS are pretty much the standard set of SQL combining operators, and I am not aware of any system that provides an XOR-like operation out of the box. It is trivial to implement using the other three operators, and there isn't much to optimize.
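If the expression shows up in several places, one option is to wrap it in a small helper. The sketch below assumes both DataFrames have the same schema; `DataFrameSetOps` and `symmetricDifference` are hypothetical names, not part of Spark's API.

```scala
import org.apache.spark.sql.DataFrame

object DataFrameSetOps {
  // Hypothetical helper: rows present in exactly one of the two DataFrames.
  // Built only from unionAll, intersect, and except, as in the answer above.
  def symmetricDifference(left: DataFrame, right: DataFrame): DataFrame =
    left.unionAll(right).except(left.intersect(right))
}
```

With that in scope, the call reads `DataFrameSetOps.symmetricDifference(df1, df2)`. On newer Spark versions `union` is the preferred name for `unionAll`.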

...