How to pass whole Row to UDF - Spark DataFrame filter

Question

1 Answer

Amit Rawat · Answer 1 · 2019-07-28T06:28:23+0000

Use the struct() function to build the row when you call the UDF. Follow these steps.

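Import Row and the built-in column functions (struct and callUDF live in org.apache.spark.sql.functions; the $"..." syntax also needs sqlContext.implicits._, which spark-shell imports for you)

import org.apache.spark.sql._
import org.apache.spark.sql.functions._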

Define the UDF

def myFilterFunction(r: Row): Boolean = r.get(0) == r.get(1)

Register the UDF

sqlContext.udf.register("myFilterFunction", myFilterFunction _)

Create the dataFrame

val records = sqlContext.createDataFrame(Seq(("sachin", "sachin"), ("aggarwal", "aggarwal1"))).toDF("text", "text2")

Use the UDF

records.filter(callUDF("myFilterFunction", struct($"text", $"text2"))).show

To pass all the columns to the UDF, build the struct from records.columns:

records.filter(callUDF("myFilterFunction", struct(records.columns.map(records(_)): _*))).show

Result:

+------+------+
|  text| text2|
+------+------+
|sachin|sachin|
+------+------+
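
For Spark 2.0 and later, the same steps can be sketched as one standalone program. This is a minimal sketch under a few assumptions, not the only way to do it: it uses SparkSession instead of sqlContext, wraps the function with udf() instead of registering it by name and calling it through callUDF, and the names RowUdfFilter, firstTwoEqual and firstTwoEqualUdf are made up for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, struct, udf}

// Hypothetical object name, used only for this sketch.
object RowUdfFilter {

  // Predicate on a whole Row: true when the first two fields are equal.
  def firstTwoEqual(r: Row): Boolean = r.get(0) == r.get(1)

  def main(args: Array[String]): Unit = {
    // Assumes a local Spark 2.0+ installation on the classpath.
    val spark = SparkSession.builder()
      .appName("row-udf-filter")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as in the answer.
    val records = Seq(("sachin", "sachin"), ("aggarwal", "aggarwal1"))
      .toDF("text", "text2")

    // Wrap the Scala function as a column-level UDF.
    val firstTwoEqualUdf = udf(firstTwoEqual _)

    // Pack every column into a single struct so the UDF receives it as one Row.
    val wholeRow = struct(records.columns.map(col): _*)

    records.filter(firstTwoEqualUdf(wholeRow)).show()

    spark.stop()
  }
}
```

Run against the sample data, this should keep only the row whose two fields match, as in the result above. Because the struct is built from named columns, the Row that reaches the function carries those field names, so r.getAs[String]("text") can be used in place of positional access.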