
I have two dataframes with the following columns:

df1.columns
//  Array(ts, id, X1, X2)


and

df2.columns
//  Array(ts, id, Y1, Y2)


After I do

val df_combined = df1.join(df2, Seq(ts,id))


I end up with the following columns: Array(ts, id, X1, X2, ts, id, Y1, Y2). I would expect the common columns to appear only once. Is there something additional that needs to be done?

1 Answer


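Pass the join keys as a sequence of column-name strings, e.g. Seq("ts", "id"). With this form of join, each join column appears only once in the result, similar to SQL's JOIN ... USING. Here is an example where the common columns are not repeated.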

import spark.implicits._  // needed for toDF; already in scope inside spark-shell

val llist = Seq(("amy", "b", "2019-01-13", 4), ("prati", "a", "2019-04-23", 10))
val first = llist.toDF("firstname", "lastname", "date", "duration")
first.show()
/*
+---------+--------+----------+--------+
|firstname|lastname|      date|duration|
+---------+--------+----------+--------+
|      amy|       b|2019-01-13|       4|
|    prati|       a|2019-04-23|      10|
+---------+--------+----------+--------+
*/

Here is the second dataframe:

val second = Seq(("prati", "a", 100), ("amy", "b", 23)).toDF("firstname", "lastname", "upload")
second.show()
/*
+---------+--------+------+
|firstname|lastname|upload|
+---------+--------+------+
|    prati|       a|   100|
|      amy|       b|    23|
+---------+--------+------+
*/

// Joining on a Seq of column names keeps a single copy of each join column.
first.join(second, Seq("firstname", "lastname")).show()
/*
+---------+--------+----------+--------+------+
|firstname|lastname|      date|duration|upload|
+---------+--------+----------+--------+------+
|      amy|       b|2019-01-13|       4|    23|
|    prati|       a|2019-04-23|      10|   100|
+---------+--------+----------+--------+------+
*/
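For comparison, here is a minimal sketch that applies the same idea to the column names from the question (ts, id, X1, X2 and ts, id, Y1, Y2). The row values and the names byExpr, byName and deduped are made up for illustration, and the local SparkSession is an assumption (inside spark-shell a session named spark already exists). It contrasts a join on a column expression, which keeps the key columns from both sides, with a join on a Seq of column names, and shows one possible way to drop the extra copies afterwards.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("join-dedup-sketch")
  .getOrCreate()
import spark.implicits._

// Made-up rows that use the column names from the question.
val df1 = Seq((1L, "a", 1.0, 2.0), (2L, "b", 3.0, 4.0)).toDF("ts", "id", "X1", "X2")
val df2 = Seq((1L, "a", 10.0, 20.0), (2L, "b", 30.0, 40.0)).toDF("ts", "id", "Y1", "Y2")

// Joining on a Column expression keeps the key columns from both sides.
val byExpr = df1.join(df2, df1("ts") === df2("ts") && df1("id") === df2("id"))
println(byExpr.columns.mkString(", "))   // ts, id, X1, X2, ts, id, Y1, Y2

// Joining on column names keeps one copy of each key column.
val byName = df1.join(df2, Seq("ts", "id"))
println(byName.columns.mkString(", "))   // ts, id, X1, X2, Y1, Y2

// If an expression join is needed, the right-hand key columns can be dropped afterwards.
val deduped = byExpr.drop(df2("ts")).drop(df2("id"))
println(deduped.columns.mkString(", "))  // ts, id, X1, X2, Y1, Y2
```

The name-based form also accepts a join type as a third argument, for example df1.join(df2, Seq("ts", "id"), "left"), so it is not limited to inner joins.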
