in Big Data Hadoop & Spark by (11.4k points)
I have constructed two DataFrames. How can I join multiple Spark DataFrames?

For Example :

PersonDf and ProfileDf share a common column, personId, as the key. How can I build one DataFrame that combines PersonDf and ProfileDf?

1 Answer

by (33.1k points)

You can simply use case classes to prepare a sample data set. You can also get a DataFrame from hiveContext.sql.
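For instance, a minimal sketch of the hiveContext.sql route (the table name person here is hypothetical; it assumes such a table is already registered in the Hive metastore):

    // Hypothetical table "person"; returns a DataFrame with the selected columns.
    val personDf = hiveContext.sql("SELECT name, age, personid FROM person")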

For example:

import org.apache.spark.sql.functions.col

case class Person(name: String, age: Int, personid: Int)

case class Profile(name: String, personid: Int, profileDescription: String)

val df1 = sqlContext.createDataFrame(
   Person("Bindu", 20, 2)
:: Person("Raphel", 25, 5)
:: Person("Ram", 40, 9) :: Nil)

val df2 = sqlContext.createDataFrame(
   Profile("Spark", 2, "SparkSQLMaster")
:: Profile("Spark", 5, "SparkGuru")
:: Profile("Spark", 9, "DevHunter") :: Nil)

// Alias each DataFrame so the columns shared by both sides can be disambiguated.
val df_asPerson = df1.as("dfperson")

val df_asProfile = df2.as("dfprofile")

// Inner join on the common personid key.
val joined_df = df_asPerson.join(
  df_asProfile
, col("dfperson.personid") === col("dfprofile.personid")
, "inner")

// Select the columns you need from either side of the join.
joined_df.select(
  col("dfperson.personid")
, col("dfperson.age")
, col("")
, col("dfprofile.profileDescription"))
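As a side note, if you only need the join key once in the output, you can join on a Seq of column names instead of aliasing; Spark then keeps a single copy of the shared column. A sketch using the same df1 and df2 as above:

    // Equivalent inner join; the result contains one personid column,
    // so no aliasing is needed to disambiguate the key.
    val joinedByName = df1.join(df2, Seq("personid"), "inner")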


Hope this answer helps you!
