Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Big Data Hadoop & Spark by (11.4k points)
I have constructed two dataframes. How can we join multiple Spark dataframes ?

For Example :

PersonDf, ProfileDf with a common column as personId as (key). Now how can we have one Dataframe combining PersonDf and ProfileDf?

1 Answer

0 votes
by (33.1k points)

You can simply use a case class to prepare a sample data set. You can get DataFrame from hiveContext.sql as well.

For example:

import org.apache.spark.sql.functions.col

case class Person(name: String, age: Int, personid : Int)

case class Profile(name: String, personid  : Int , profileDescription: String)

    val df1 = sqlContext.createDataFrame(

   Person("Bindu",20,  2) 

:: Person("Raphel",25, 5) 

:: Person("Ram",40, 9):: Nil)

val df2 = sqlContext.createDataFrame(

Profile("Spark",2,  "SparkSQLMaster") 

:: Profile("Spark",5, "SparkGuru") 

:: Profile("Spark",9, "DevHunter"):: Nil

)

val df_asPerson = df1.as("dfperson")

val df_asProfile = df2.as("dfprofile")

val joined_df = df_asPerson.join(

    df_asProfile

, col("dfperson.personid") === col("dfprofile.personid")

, "inner")

joined_df.select(

  col("dfperson.name")

, col("dfperson.age")

, col("dfprofile.name")

, col("dfprofile.profileDescription"))

.show

Hope this answer helps you!

Browse Categories

...