
Is the only option to list all the arguments up to 22, as shown in the documentation?

Has anyone figured out how to do something similar to this?


sc.udf.register("func", (s: String*) => s......

(I am writing a custom concat function that skips nulls; it had 2 arguments at the time.)

1 Answer


Just note that UDFs don't support varargs, but you can pass an arbitrary number of columns wrapped using the array function:

import org.apache.spark.sql.functions.{udf, array, lit}

val myConcatFunc = (xs: Seq[Any], sep: String) =>
  xs.filter(_ != null).mkString(sep)

val myConcat = udf(myConcatFunc)
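Since the lambda itself is plain Scala, you can sanity-check the null-skipping logic outside Spark before wrapping it as a UDF (a quick sketch, reusing the same function definition as above):

```scala
// Plain-Scala check of the null-skipping concat logic; no Spark needed.
val myConcatFunc = (xs: Seq[Any], sep: String) =>
  xs.filter(_ != null).mkString(sep)

// Nulls are dropped before joining with the separator:
println(myConcatFunc(Seq(null, "a", "b", "c"), "-")) // a-b-c
println(myConcatFunc(Seq("d", null, null, "e"), "-")) // d-e
```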

An example usage:

val df = sc.parallelize(Seq(
  (null, "a", "b", "c"), ("d", null, null, "e")
)).toDF("x1", "x2", "x3", "x4")

val cols = array($"x1", $"x2", $"x3", $"x4")
val sep = lit("-")

df.select(myConcat(cols, sep).alias("concatenated")).show

// +------------+
// |concatenated|
// +------------+
// |       a-b-c|
// |         d-e|
// +------------+

With raw SQL:

df.registerTempTable("df")
sqlContext.udf.register("myConcat", myConcatFunc)

sqlContext.sql(
  "SELECT myConcat(array(x1, x2, x4), '.') AS concatenated FROM df"
).show


// +------------+
// |concatenated|
// +------------+
// |         a.c|
// |         d.e|
// +------------+

Welcome to Intellipaat Community. Get your technical queries answered by top developers!
