
I'm new to Spark SQL. Is there an equivalent of "CASE WHEN 'CONDITION' THEN 0 ELSE 1 END" in Spark SQL?


 

1 Answer


Let's take a scenario where you want to add a "CASE WHEN ... ELSE ..." calculated column to an existing DataFrame.

Starting DataFrame:

color
Red
Green
Blue

And your desired DataFrame (SQL syntax: CASE WHEN color = 'Green' THEN 1 ELSE 0 END AS bool):

color  bool
Red    0
Green  1
Blue   0


 

In Spark 1.4.0 and later, you can use the when/otherwise syntax:

 

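// spark-shell imports these automatically; a standalone application needs them explicitly
import org.apache.spark.sql.functions.when
import sqlContext.implicits._

// Create the DataFrame
val df = Seq("Red", "Green", "Blue").map(Tuple1.apply).toDF("color")

// Use when/otherwise syntax
val df1 = df.withColumn("Green_Ind", when($"color" === "Green", 1).otherwise(0))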
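A SQL CASE expression often has more than one WHEN branch. As a minimal sketch of how that could look with when/otherwise, the example below chains a second `when` before `otherwise`. It assumes Spark 2.0+ (it uses `SparkSession`). The object name `CaseWhenSketch`, the helper `withColorCode`, the column name `color_code` and the code assigned to Blue are all made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object CaseWhenSketch {
  // Hypothetical helper: equivalent to
  //   CASE WHEN color = 'Green' THEN 1 WHEN color = 'Blue' THEN 2 ELSE 0 END AS color_code
  def withColorCode(df: DataFrame): DataFrame =
    df.withColumn(
      "color_code",
      when(col("color") === "Green", 1)   // first WHEN branch
        .when(col("color") === "Blue", 2) // second WHEN branch, chained on the Column
        .otherwise(0)                     // ELSE branch
    )

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession (Spark 2.0+), used only for this demo
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("case-when-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("Red", "Green", "Blue").toDF("color")
    withColorCode(df).show()

    spark.stop()
  }
}
```

Branches are evaluated in order, as in SQL, so the first matching `when` wins. If `otherwise` is left out, rows that match no branch get null.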

 

If you are on Spark 1.3.0, you can use a UDF instead:

 

// Define the UDF (udf comes from org.apache.spark.sql.functions)
val isGreen = udf((color: String) => {
  if (color == "Green") 1
  else 0
})

// Apply the UDF to the color column
val df2 = df.withColumn("Green_Ind", isGreen($"color"))

 

In Spark 1.5.0 and later, you can also pass SQL syntax to the expr function:

 

// String comparison is case-sensitive, so the literal must match the data ('Green', not 'green')
val df13 = df.withColumn("Green_Ind", expr("case when color = 'Green' then 1 else 0 end"))

 

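Or use plain Spark SQL:

// registerTempTable is the Spark 1.x API; it is deprecated since Spark 2.0 in favor of createOrReplaceTempView
df.registerTempTable("data")

val df14 = sqlContext.sql("""select *, case when color = 'Green' then 1 else 0 end as Green_Ind from data""")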
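For readers on Spark 2.0 or later, here is a rough sketch of the same two SQL-based approaches using `SparkSession`, with `createOrReplaceTempView` in place of `registerTempTable`. The object name `CaseWhenSqlSketch` and the helpers `viaSql` and `viaExpr` are hypothetical names chosen for this example. The view name `data` follows the snippet above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

object CaseWhenSqlSketch {
  // Hypothetical helper: registers df as a temporary view and runs a SQL CASE WHEN over it
  def viaSql(spark: SparkSession, df: DataFrame): DataFrame = {
    df.createOrReplaceTempView("data")
    spark.sql("select *, case when color = 'Green' then 1 else 0 end as Green_Ind from data")
  }

  // Hypothetical helper: the same CASE WHEN, embedded through expr without a temporary view
  def viaExpr(df: DataFrame): DataFrame =
    df.withColumn("Green_Ind", expr("case when color = 'Green' then 1 else 0 end"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession (Spark 2.0+), used only for this demo
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("case-when-sql-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("Red", "Green", "Blue").toDF("color")
    viaSql(spark, df).show()
    viaExpr(df).show()

    spark.stop()
  }
}
```

Both helpers should yield the same rows. The `expr` variant avoids registering a view, which can be convenient when the CASE expression is the only SQL needed.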

...