It is possible to use lit(null):
import org.apache.spark.sql.functions.{lit, udf}
case class Record(foo: Int, bar: String)
val df = Seq(Record(1, "foo"), Record(2, "bar")).toDF
val dfWithFoobar = df.withColumn("foobar", lit(null: String))
But here you have to deal with one problem, i.e. the column type is null:
scala> dfWithFoobar.printSchema
root
|-- foo: integer (nullable = false)
|-- bar: string (nullable = true)
|-- foobar: null (nullable = true)
Also, it is not retained by the csv writer. And if it is a hard requirement you can cast column to the specific type (let’s say String), with either DataType
import org.apache.spark.sql.types.StringType
df.withColumn("foobar", lit(null).cast(StringType))
or string description
df.withColumn("foobar", lit(null).cast("string"))
or use an UDF like this:
val getNull = udf(() => None: Option[String]) // Or some other type
df.withColumn("foobar", getNull()).printSchema
root
|-- foo: integer (nullable = false)
|-- bar: string (nullable = true)
|-- foobar: string (nullable = true)