
In Spark version 1.2.0 one could use subtract with two SchemaRDDs to end up with only the content from the first one that differs from the second:

val onlyNewData = todaySchemaRDD.subtract(yesterdaySchemaRDD)


onlyNewData contains the rows in todaySchemaRDD that do not exist in yesterdaySchemaRDD.

How can this be achieved with DataFrames in Spark version 1.3.0?

1 Answer

In order to achieve this with DataFrames, use subtract(), which is available on PySpark's DataFrame (the Scala DataFrame API exposes the same set-difference operation as except()):

df1.subtract(df2)
