0 votes
in Big Data Hadoop & Spark by (11.4k points)

I'm working through an example. The schema for the dataframe looks like:

> parquetDF.printSchema
|-- department: struct (nullable = true)
|    |-- id: string (nullable = true)
|    |-- name: string (nullable = true)
|-- employees: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- firstName: string (nullable = true)
|    |    |-- lastName: string (nullable = true)
|    |    |-- email: string (nullable = true)
|    |    |-- salary: integer (nullable = true)

In the example, they show how to explode the employees column into 4 additional columns:

val explodeDF = parquetDF.explode($"employees") {
  case Row(employee: Seq[Row]) => employee.map { employee =>
    val firstName = employee(0).asInstanceOf[String]
    val lastName = employee(1).asInstanceOf[String]
    val email = employee(2).asInstanceOf[String]
    val salary = employee(3).asInstanceOf[Int]
    Employee(firstName, lastName, email, salary)
  }
}

How would I do something similar with the department column (i.e. add two additional columns to the dataframe called "id" and "name")? The methods aren't exactly the same, and I can only figure out how to create a brand new data frame using:

val explodeDF = parquetDF.select("department.id", "department.name")

If I try:

val explodeDF = parquetDF.explode($"department") {
  case Row(dept: Seq[String]) => dept.map { dept =>
    val id = dept(0)
    val name = dept(1)
  }
}

I get the warning and error:

<console>:38: warning: non-variable type argument String in type pattern Seq[String] is unchecked since it is eliminated by erasure
            case Row(dept: Seq[String]) => dept.map { dept =>
<console>:37: error: inferred type arguments [Unit] do not conform to method explode's type parameter bounds [A <: Product]
  val explodeDF = parquetDF.explode($"department") ............


1 Answer

0 votes
by (32.3k points)

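The department column is a struct rather than an array, so there is nothing to explode. You can read its fields directly with dot notation and add them as new columns, something like this:

var explodeDeptDF = explodeDF.withColumn("id", explodeDF("department.id"))

explodeDeptDF = explodeDeptDF.withColumn("name", explodeDeptDF("department.name"))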
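
For anyone who wants to try this end to end, here is a minimal sketch that can be pasted into spark-shell or run as a Scala script. It assumes Spark 2.x or later; the sample row, the app name, and the variable names withDeptCols and expanded are made up for illustration, and parquetDF here is only a stand-in with the same schema as in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for experimenting; in spark-shell this reuses the existing one.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("struct-flatten-sketch") // hypothetical app name
  .getOrCreate()

// Stand-in for parquetDF: one made-up row with the same shape as the
// schema in the question (a struct column plus an array of structs).
val parquetDF = spark.sql(
  """SELECT
    |  named_struct('id', '123456', 'name', 'Computer Science') AS department,
    |  array(named_struct(
    |    'firstName', 'michael', 'lastName', 'armbrust',
    |    'email', 'no-reply@example.com', 'salary', 100000)) AS employees
    |""".stripMargin)

// Option 1: copy the struct fields into new top-level columns,
// keeping department and employees as they are.
val withDeptCols = parquetDF
  .withColumn("id", col("department.id"))
  .withColumn("name", col("department.name"))

// Option 2: expand every field of the struct in one select.
val expanded = parquetDF.select(col("department.*"), col("employees"))

withDeptCols.printSchema()
expanded.printSchema()
```

The first option keeps the original department column next to the new id and name columns. The second replaces it with its fields, which may be handier when the struct has many fields.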
