
I want to create a DataFrame with a specified schema in Scala. I have tried reading an empty JSON file, but I don't think that's the best practice.

1 Answer


Let's assume you want a DataFrame with the following schema:

root
 |-- k: string (nullable = true)
 |-- v: integer (nullable = false)

You simply define the schema for the DataFrame and pass an empty RDD[Row] to createDataFrame:

import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
import org.apache.spark.sql.Row

val schema = StructType(
    StructField("k", StringType, true) ::
    StructField("v", IntegerType, false) :: Nil)

// Spark < 2.0
// sqlContext.createDataFrame(sc.emptyRDD[Row], schema)

spark.createDataFrame(sc.emptyRDD[Row], schema)
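
As a quick sanity check (a minimal sketch, assuming the same spark session, sc, and schema as above), you can confirm that the resulting DataFrame has no rows but carries the expected schema:

val df = spark.createDataFrame(sc.emptyRDD[Row], schema)

df.printSchema()     // prints the root / k / v tree shown above
println(df.count())  // 0 - no rows, but the schema is in place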

The PySpark equivalent is almost identical:

from pyspark.sql.types import StructType, StructField, IntegerType, StringType

schema = StructType([
    StructField("k", StringType(), True),
    StructField("v", IntegerType(), False)
])

# Spark < 2.0
# sqlContext.createDataFrame([], schema)

df = spark.createDataFrame([], schema)

# or: df = sc.parallelize([]).toDF(schema)
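
Back in Scala, if you would rather not build the StructType by hand, one alternative sketch (assuming Spark 2.x; the case class KV here is a name introduced just for illustration) is to let Spark derive the schema from a case class through its encoder:

import spark.implicits._

// Hypothetical case class mirroring the schema above:
// k maps to a nullable string, v (a primitive Int) to a non-nullable integer
case class KV(k: String, v: Int)

// emptyDataset derives the schema from the implicit encoder for KV
val emptyDf = spark.emptyDataset[KV].toDF()
emptyDf.printSchema()
// root
//  |-- k: string (nullable = true)
//  |-- v: integer (nullable = false)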

