in Big Data Hadoop & Spark by (11.4k points)

In Spark version 1.2.0, one could use subtract with two SchemaRDDs to end up with only the content of the first one that differs from the second:

val onlyNewData = todaySchemaRDD.subtract(yesterdaySchemaRDD)

onlyNewData contains the rows in todaySchemaRDD that do not exist in yesterdaySchemaRDD.

How can this be achieved with DataFrames in Spark version 1.3.0?

1 Answer


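In Spark 1.3.0, PySpark DataFrames provide subtract(), which returns the rows of one DataFrame that are not present in another. In the Scala API, the equivalent DataFrame method is except().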
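Since the question's snippet is Scala, here is a minimal sketch of the same operation using the Scala DataFrame method except, the counterpart of PySpark's subtract(). The object name, the sample rows, and the column names "id" and "value" are made up for illustration; the setup assumes a local Spark 1.3-style SQLContext.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical driver object; names and data are illustrative only.
object OnlyNewData {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("only-new-data").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // enables .toDF on RDDs of tuples

    // Assumed sample data standing in for yesterday's and today's tables.
    val yesterdayDF = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "value")
    val todayDF = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c"))).toDF("id", "value")

    // Rows of todayDF that do not appear in yesterdayDF.
    val onlyNewData = todayDF.except(yesterdayDF)
    onlyNewData.show()

    sc.stop()
  }
}
```

Both DataFrames need the same schema for the comparison to be meaningful. In PySpark the corresponding call would be todayDF.subtract(yesterdayDF).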


