I'm manually creating a dataframe for some testing. The code to create it is:

case class input(id:Long, var1:Int, var2:Int, var3:Double)
val inputDF = sqlCtx

So the schema looks like this:

 |-- id: long (nullable = false)
 |-- var1: integer (nullable = false)
 |-- var2: integer (nullable = false)
 |-- var3: double (nullable = false)

I want 'nullable = true' for each of these variables. How do I declare that from the start, or switch it in a new dataframe after it has been created?

1 Answer

With the imports

import org.apache.spark.sql.types.{StructField, StructType}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}

you can use


/**
 * Set nullable property of column.
 * @param df source DataFrame
 * @param cn is the column name to change
 * @param nullable is the flag to set, such that the column is either nullable or not
 */
def setNullableStateOfColumn(df: DataFrame, cn: String, nullable: Boolean): DataFrame = {

  // get schema
  val schema = df.schema

  // modify [[StructField]] with name `cn`, leave every other field untouched
  val newSchema = StructType(schema.map {
    case StructField(c, t, _, m) if c.equals(cn) => StructField(c, t, nullable = nullable, m)
    case y: StructField => y
  })

  // apply new schema
  df.sqlContext.createDataFrame(df.rdd, newSchema)
}
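
The function above changes one column per call, while the question asks about every column. A minimal sketch of the same idea applied to the whole schema could look like the following. The object name `NullableSchema` and the helpers `withNullability` and `setNullableStateForAllColumns` are hypothetical names made up for this illustration, not part of Spark, and only top-level fields are touched (nested structs are left as they are).

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

// Hypothetical wrapper object, introduced only for this sketch.
object NullableSchema {

  // Returns a copy of the schema in which every top-level field
  // carries the given nullable flag; names, types and metadata are kept.
  def withNullability(schema: StructType, nullable: Boolean): StructType =
    StructType(schema.fields.map(_.copy(nullable = nullable)))

  // Rebuilds the DataFrame from its underlying RDD with the adjusted schema,
  // the same way setNullableStateOfColumn does for a single column.
  def setNullableStateForAllColumns(df: DataFrame, nullable: Boolean): DataFrame =
    df.sqlContext.createDataFrame(df.rdd, withNullability(df.schema, nullable))
}
```

Assuming `inputDF` from the question, `NullableSchema.setNullableStateForAllColumns(inputDF, nullable = true).printSchema()` should then report each of `id`, `var1`, `var2` and `var3` as nullable.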



