Suppose you want to add a "CASE WHEN ... ELSE ..." calculated column to an existing DataFrame.
Starting dataframe:
color
Red
Green
Blue
And your desired dataframe (SQL syntax: CASE WHEN color = 'Green' THEN 1 ELSE 0 END AS bool):
color bool
Red 0
Green 1
Blue 0
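In Spark 1.4.0+ you can use the when/otherwise syntax:
// when lives in the functions object; $"..." needs sqlContext.implicits._ (pre-imported in spark-shell)
import org.apache.spark.sql.functions.when
// Create the dataframe
val df = Seq("Red", "Green", "Blue").map(Tuple1.apply).toDF("color")
// Use when/otherwise syntax: 1 for "Green", 0 for everything else
val df1 = df.withColumn("Green_Ind", when($"color" === "Green", 1).otherwise(0))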
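A CASE expression with more than one WHEN branch can be written by chaining when calls before the final otherwise. The following is a minimal sketch, not part of the original answer: it assumes Spark 2.0+ (SparkSession, local master), and the object name MultiBranchCaseWhen, the helper withColorCode, the column name Color_Code and the code 2 for "Blue" are all made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object MultiBranchCaseWhen {
  // Hypothetical helper. Equivalent SQL:
  // CASE WHEN color = 'Green' THEN 1 WHEN color = 'Blue' THEN 2 ELSE 0 END AS Color_Code
  def withColorCode(df: DataFrame): DataFrame =
    df.withColumn(
      "Color_Code",
      when(col("color") === "Green", 1)
        .when(col("color") === "Blue", 2)
        .otherwise(0)
    )

  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.0+ running locally; on Spark 1.x use sqlContext instead
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("multi-branch-case-when")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("Red", "Green", "Blue").toDF("color")
    withColorCode(df).show()

    spark.stop()
  }
}
```

Branches are evaluated in order, as in SQL, and rows that match no branch fall through to otherwise. If otherwise is left out, unmatched rows get null.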
If you are on Spark 1.3.0, you can use a UDF instead:
// Define the UDF
import org.apache.spark.sql.functions.udf
val isGreen = udf((color: String) => {
  if (color == "Green") 1
  else 0
})
// Apply the UDF to the color column
val df2 = df.withColumn("Green_Ind", isGreen($"color"))
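In Spark 1.5.0+ you can also use the expr function with SQL syntax:
import org.apache.spark.sql.functions.expr
// String comparison is case-sensitive, so the literal must be 'Green', not 'green'
val df13 = df.withColumn("Green_Ind", expr("case when color = 'Green' then 1 else 0 end"))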
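or plain Spark SQL:
// Register the dataframe as a temporary table (deprecated since Spark 2.0 in favour of createOrReplaceTempView)
df.registerTempTable("data")
val df14 = sqlContext.sql(""" select *, case when color = 'Green' then 1 else 0 end as Green_Ind from data """)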
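The four approaches above should give the same result. As a rough check, the sketch below runs them side by side on Spark 2.0+ and compares the output. It is not part of the original answer: it assumes a local SparkSession, uses createOrReplaceTempView in place of registerTempTable, and the object name CaseWhenComparison and the helper pairs are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{expr, udf, when}

object CaseWhenComparison {
  // Hypothetical helper: collect (color, Green_Ind) pairs so results can be compared
  def pairs(d: DataFrame): Set[(String, Int)] =
    d.collect().map(r => (r.getString(0), r.getInt(1))).toSet

  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.0+ running locally
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("case-when-comparison")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("Red", "Green", "Blue").toDF("color")

    // 1. when/otherwise
    val viaWhen = df.withColumn("Green_Ind", when($"color" === "Green", 1).otherwise(0))

    // 2. UDF
    val isGreen = udf((color: String) => if (color == "Green") 1 else 0)
    val viaUdf = df.withColumn("Green_Ind", isGreen($"color"))

    // 3. expr with SQL syntax
    val viaExpr = df.withColumn("Green_Ind", expr("case when color = 'Green' then 1 else 0 end"))

    // 4. plain Spark SQL against a temporary view
    df.createOrReplaceTempView("data")
    val viaSql = spark.sql(
      "select *, case when color = 'Green' then 1 else 0 end as Green_Ind from data")

    // Every approach should flag only the Green row
    val expected = Set(("Red", 0), ("Green", 1), ("Blue", 0))
    Seq(viaWhen, viaUdf, viaExpr, viaSql).foreach(d => assert(pairs(d) == expected))

    spark.stop()
  }
}
```

When the built-in functions can express the logic, when/otherwise or expr is usually preferable to a UDF, because Spark's optimizer cannot look inside the UDF body.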