Intellipaat

in Big Data Hadoop & Spark by (11.4k points)

I'm manually creating a dataframe for some testing. The code to create it is:

case class input(id:Long, var1:Int, var2:Int, var3:Double)
val inputDF = sqlCtx
  .createDataFrame(List(input(1110,0,1001,-10.00),
    input(1111,1,1001,10.00),
    input(1111,0,1002,10.00)))


So the schema looks like this:

root
 |-- id: long (nullable = false)
 |-- var1: integer (nullable = false)
 |-- var2: integer (nullable = false)
 |-- var3: double (nullable = false)


I want to make 'nullable = true' for each one of these variables. How do I declare that from the start, or switch it in a new DataFrame after it has been created?

1 Answer

by (32.3k points)

With the imports

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{StructField, StructType}

you can use

/**
 * Set the nullable property of a column.
 * @param df source DataFrame
 * @param cn name of the column to change
 * @param nullable flag to set, i.e. whether the column is nullable or not
 */
def setNullableStateOfColumn(df: DataFrame, cn: String, nullable: Boolean): DataFrame = {
  // get schema
  val schema = df.schema
  // modify the [[StructField]] with name `cn`; all other fields are kept as they are
  val newSchema = StructType(schema.map {
    case StructField(c, t, _, m) if c.equals(cn) => StructField(c, t, nullable = nullable, m)
    case y: StructField => y
  })
  // apply the new schema by rebuilding the DataFrame from its underlying RDD[Row]
  df.sqlContext.createDataFrame(df.rdd, newSchema)
}
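If every column should become nullable, as in the question, the same idea can be applied to all fields in one pass instead of once per column name. The sketch below assumes Spark 2.x or later (it uses SparkSession rather than the question's sqlCtx). The object name NullableAllColumns and the helper setNullableForAllColumns are hypothetical, and the question's case class is renamed Input and declared at top level so that toDF() can derive an encoder for it.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.StructType

// Same fields as the question's test data; primitive fields produce nullable = false columns.
case class Input(id: Long, var1: Int, var2: Int, var3: Double)

object NullableAllColumns {
  // Hypothetical helper: returns a copy of df in which every field has the given nullable flag.
  def setNullableForAllColumns(df: DataFrame, nullable: Boolean): DataFrame = {
    // Copy each StructField, changing only the nullable flag (name, type and metadata are kept).
    val newSchema = StructType(df.schema.fields.map(f => f.copy(nullable = nullable)))
    // Rebuild the DataFrame from its rows with the modified schema.
    df.sparkSession.createDataFrame(df.rdd, newSchema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("nullable-all").getOrCreate()
    import spark.implicits._

    val inputDF = Seq(
      Input(1110, 0, 1001, -10.00),
      Input(1111, 1, 1001, 10.00),
      Input(1111, 0, 1002, 10.00)
    ).toDF()

    val nullableDF = setNullableForAllColumns(inputDF, nullable = true)
    nullableDF.printSchema()
    spark.stop()
  }
}
```

Copying the StructField with copy(nullable = ...) rather than pattern matching on the field name avoids the per-column guard when the change applies to the whole schema.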

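setNullableStateOfColumn applies the new schema directly: it rebuilds the DataFrame from the original's underlying RDD[Row] with the modified StructType, so the row values are unchanged and only the nullable flag of the named column differs. Call it once for each column you want to change.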
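For the other half of the question, declaring the columns nullable from the start, one option is to skip the case class and pass an explicit schema together with generic Rows. The sketch below assumes Spark 2.x or later; the object name NullableFromStart and the helper buildInputDF are hypothetical. Another option, not shown, is to declare the case-class fields as Option[Long], Option[Int] and so on, which Spark maps to nullable columns.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

object NullableFromStart {
  // Explicit schema: every field is declared with nullable = true up front.
  val schema: StructType = StructType(Seq(
    StructField("id", LongType, nullable = true),
    StructField("var1", IntegerType, nullable = true),
    StructField("var2", IntegerType, nullable = true),
    StructField("var3", DoubleType, nullable = true)
  ))

  // Hypothetical helper: builds the question's test rows as generic Rows matching the schema.
  def buildInputDF(spark: SparkSession): DataFrame = {
    val rows = Seq(
      Row(1110L, 0, 1001, -10.00), // id must be a Long literal to match LongType
      Row(1111L, 1, 1001, 10.00),
      Row(1111L, 0, 1002, 10.00)
    )
    spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("nullable-from-start").getOrCreate()
    buildInputDF(spark).printSchema()
    spark.stop()
  }
}
```

With this schema, printSchema() reports nullable = true for id, var1, var2 and var3. The trade-off against the case-class route is that the Row values are not type-checked at compile time: a value that does not match its declared type (for example an Int where LongType is declared) only fails at runtime.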
