
I need a window function that partitions by some keys (column names), orders by another column, and returns the rows with the top x ranks.

This works fine for ascending order:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

def getTopX(df: DataFrame, top_x: String, top_key: String, top_value: String): DataFrame = {
  val top_keys: List[String] = top_key.split(",").map(_.trim).toList
  // partition by the first key, then the remaining keys
  val w = Window.partitionBy(top_keys.head, top_keys.tail: _*)
    .orderBy(top_value)
  val rankCondition = "rn <= " + top_x
  df.withColumn("rn", row_number().over(w))
    .where(rankCondition)
    .drop("rn")
}


But when I try to change the orderBy call to orderBy(desc(top_value)) or orderBy(top_value.desc), I get a syntax error. What's the correct syntax here?

1 Answer


There are two overloads of orderBy: one that takes column-name strings and one that takes Column objects (see the API docs). Your code uses the string version, which does not allow changing the sort direction. You need to switch to the Column version and call the desc method on the column, e.g., myCol.desc.

Now we get into API-design territory. Taking Column parameters buys you flexibility; for example, you can sort by arbitrary expressions, not just plain columns. If you want to keep an API that accepts a string rather than a Column, you need to convert the string to a column. The easiest way is org.apache.spark.sql.functions.col(myColName).
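To make the difference between the two overloads concrete, here is a small self-contained sketch; the DataFrame and column names are made up purely for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session and toy data, just to demonstrate the two orderBy overloads.
val spark = SparkSession.builder().master("local[1]").appName("orderByDemo").getOrCreate()
import spark.implicits._

val df = Seq(("a", 3), ("b", 1), ("c", 2)).toDF("key", "value")

// String overload: takes column names only, and always sorts ascending.
val ascValues = df.orderBy("value").collect().map(_.getInt(1))

// Column overload: build a Column from the name with col(), then flip direction with .desc.
val descValues = df.orderBy(col("value").desc).collect().map(_.getInt(1))
```

Note that orderBy("value".desc) cannot compile: "value" is a plain String, and desc is a method on Column, which is why the conversion via col() (or an import of spark.implicits._ and the $"value" syntax) is needed.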

Putting it all together, we get

.orderBy(org.apache.spark.sql.functions.col(top_value).desc)
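Applied to the function from the question, the whole thing could look like the sketch below. It keeps the string-based signature from the question; the Int rank parameter and the helper's name are my own choices, not from the original:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Sketch of the question's getTopX with a descending sort.
// topX is taken as an Int here rather than a String for simplicity.
def getTopXDesc(df: DataFrame, topX: Int, topKey: String, topValue: String): DataFrame = {
  val keys = topKey.split(",").map(_.trim)
  val w = Window
    .partitionBy(keys.head, keys.tail: _*)  // first key, then the rest
    .orderBy(col(topValue).desc)            // Column overload, so .desc is available
  df.withColumn("rn", row_number().over(w))
    .where(col("rn") <= topX)               // keep the top x rows per partition
    .drop("rn")
}
```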

...