
I am starting to use Spark DataFrames and I need to pivot the data, turning one column with multiple rows into multiple columns. There is built-in functionality for that in Scalding, and I believe in Pandas in Python, but I can't find anything for the new Spark DataFrame.

I assume I can write a custom function of some sort that will do this, but I'm not even sure how to start, especially since I am a novice with Spark. If anyone knows how to do this with built-in functionality, or has suggestions for how to write something in Scala, it would be greatly appreciated.

1 Answer


Spark has provided a pivot function since version 1.6. It is called on the grouped data returned by groupBy and is followed by an aggregation.

Let me give you an example using the nycflights13 data in CSV format.

nycflights13 is an R data package that contains information about all flights that departed from the NYC airports (EWR, JFK and LGA) in 2013: 336,776 flights in total. To help understand what causes delays, it also includes a number of other useful datasets. The package provides the following data tables: flights, weather, planes, airports and airlines.


val flights = sqlContext
  // ...
  .options(Map("inferSchema" -> "true", "header" -> "true"))
  // ...
  .groupBy($"origin", $"dest", $"carrier")
  // ...
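
The groupBy / pivot / aggregate pattern can be tried without the CSV file. The following is a minimal sketch, not the exact code from this answer: it assumes Spark 2.x or later (so it uses SparkSession rather than sqlContext), the rows are made up, and the choice to pivot on hour and average arr_delay is an assumption for illustration (both are columns of the nycflights13 flights table). It is written to be pasted into spark-shell.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

// Assumption: Spark 2.x+; in spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("pivot-sketch")
  .getOrCreate()
import spark.implicits._

// Made-up rows standing in for the flights table (not real nycflights13 data).
val flights = Seq(
  ("JFK", "LAX", "AA", 8, 10.0),
  ("JFK", "LAX", "AA", 8, 20.0),
  ("JFK", "LAX", "AA", 9, -5.0),
  ("EWR", "ORD", "UA", 8, 3.0)
).toDF("origin", "dest", "carrier", "hour", "arr_delay")

// One output row per (origin, dest, carrier) group;
// one output column per distinct value of hour, holding the average arr_delay.
val pivoted = flights
  .groupBy($"origin", $"dest", $"carrier")
  .pivot("hour")
  .agg(avg($"arr_delay"))

pivoted.show()
```

The result has the three grouping columns followed by one column per hour value; a group with no rows for a given hour gets null in that column. If the pivot values are known in advance, they can be passed as a second argument, for example pivot("hour", Seq(8, 9)), which spares Spark the extra pass needed to find the distinct values.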


