
In Spark 1.2.0, one could call subtract on two SchemaRDDs to keep only the rows of the first one that do not appear in the second:

val onlyNewData = todaySchemaRDD.subtract(yesterdaySchemaRDD)


onlyNewData contains the rows in todaySchemaRDD that do not exist in yesterdaySchemaRDD.

How can this be achieved with DataFrames in Spark version 1.3.0?

1 Answer


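In Python, a PySpark DataFrame provides subtract(), which returns the rows of df1 that do not appear in df2:

df1.subtract(df2)

In Scala, the equivalent DataFrame method is named except(), e.g. df1.except(df2).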
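As a rough illustration of the semantics (this is not Spark code), the sketch below models subtract() in plain Python. It assumes the EXCEPT DISTINCT behaviour that recent PySpark documentation describes: rows of the left dataset that are absent from the right, with duplicates collapsed. The helper name except_distinct and the sample rows are hypothetical. Spark itself does not guarantee any output order; the list here keeps first-seen order only for readability.

```python
def except_distinct(left, right):
    """Hypothetical plain-Python model of EXCEPT DISTINCT:
    distinct rows of `left` that do not appear in `right`."""
    exclude = set(right)  # rows to drop, e.g. yesterday's data
    seen = set()          # rows already emitted, to collapse duplicates
    result = []
    for row in left:
        if row not in exclude and row not in seen:
            seen.add(row)
            result.append(row)
    return result


# Hypothetical sample rows standing in for yesterday's and today's DataFrames.
yesterday = [(1, "alice"), (2, "bob")]
today = [(1, "alice"), (2, "bob"), (3, "carol"), (3, "carol")]

only_new_data = except_distinct(today, yesterday)
print(only_new_data)  # [(3, 'carol')]
```

With real DataFrames, the same question would be answered by today_df.subtract(yesterday_df) in PySpark, provided both DataFrames share the same schema.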

