In the Spark SQL 1.6 API (Scala), DataFrame has functions for intersect and except, but not one for difference. Obviously, a combination of union and except can be used to generate the difference:
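For illustration, a minimal sketch of one such combination, assuming two DataFrames with the same schema. The names `df1`, `df2` and the helper `difference` are hypothetical and not taken from the original post. In Spark 1.6 the union method on DataFrame is called `unionAll`.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: rows that appear in only one of the two DataFrames.
// Assumes df1 and df2 share the same schema (Spark 1.6 DataFrame API).
def difference(df1: DataFrame, df2: DataFrame): DataFrame =
  df1.except(df2)                 // rows in df1 that are not in df2
    .unionAll(df2.except(df1))    // plus rows in df2 that are not in df1
```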


But this seems a bit awkward. In my experience, if something seems awkward, there's a better way to do it, especially in Scala.

1 Answer

Try to rewrite it as:
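A minimal sketch of one possible rewrite, built from union, intersect and except and assuming both DataFrames share a schema. The names `df1`, `df2` and the helper `symmetricDifference` are hypothetical, and this is not necessarily the exact expression the answer had in mind.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: everything in either DataFrame, minus what both contain.
// Assumes df1 and df2 share the same schema (Spark 1.6 DataFrame API).
def symmetricDifference(df1: DataFrame, df2: DataFrame): DataFrame =
  df1.unionAll(df2)               // all rows from both sides
    .except(df1.intersect(df2))   // drop the rows common to both
```

This form calls `except` once instead of twice, though it is still a composition of the same built-in operators rather than a dedicated one.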


UNION, INTERSECT and EXCEPT / MINUS are pretty much the standard set of SQL combining operators, and I am not aware of any system that provides an XOR-like operation out of the box. It is trivial to implement using the other three operators, and there isn't much to optimize.

