in Big Data Hadoop & Spark by (11.5k points)

I'm using PySpark (Python 2.7.9 / Spark 1.3.1) and have a GroupedData object (the result of a DataFrame groupBy) which I need to filter and sort in descending order. I'm trying to achieve it with this piece of code:

group_by_dataframe.count().filter("`count` >= 10").sort('count', ascending=False)


But it throws the following error.

sort() got an unexpected keyword argument 'ascending'

3 Answers

+5 votes
by (31.4k points)

In PySpark 1.3, the sort method does not accept an ascending parameter. You can use the desc method of a column instead:

from pyspark.sql.functions import col

(group_by_dataframe
    .count()
    .filter("`count` >= 10")
    .sort(col("count").desc()))

or the desc function:

from pyspark.sql.functions import desc

(group_by_dataframe
    .count()
    .filter("`count` >= 10")
    .sort(desc("count")))

Both of the above methods work with Spark 1.3 and later, including Spark 2.x.
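To check what the count/filter/descending-sort pipeline should return without starting Spark, here is a minimal plain-Python sketch of the same logic using collections.Counter. It is only an analogue of the DataFrame query, not PySpark code; the function name count_filter_sort_desc and the sample names are hypothetical, chosen for illustration.

```python
from collections import Counter


def count_filter_sort_desc(values, min_count=10):
    """Plain-Python analogue (hypothetical helper) of
    df.groupBy(key).count().filter("`count` >= 10").sort(desc("count")).

    Returns (key, count) pairs with count >= min_count, largest count first.
    """
    counts = Counter(values)  # plays the role of groupBy(...).count()
    # plays the role of filter("`count` >= 10")
    kept = [(key, n) for key, n in counts.items() if n >= min_count]
    # plays the role of sort(desc("count")); reverse=True keeps the sort stable
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


names = ["ann"] * 12 + ["bob"] * 3 + ["cy"] * 15
print(count_filter_sort_desc(names))  # [('cy', 15), ('ann', 12)]
```

Ties come out in first-seen order here only because Python's sort is stable; Spark does not promise any particular order among rows with equal counts.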

by (19.8k points)
It worked for me!
by (47.2k points)
Thanks a lot, buddy, it worked like a charm!
by (107k points)
Understood properly, nicely explained.
by (44.6k points)
This worked for me too.
by (31.6k points)
Yes, it worked for me as well!
+3 votes
by (28.1k points)

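You can use orderBy, which is an alias of sort. Note that the ascending keyword argument is not available in Spark 1.3; it works in later versions such as Spark 2.0: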

group_by_dataframe.count().filter("`count` >= 10").orderBy('count', ascending=False)

For more information, refer to: http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html
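In Spark versions where ascending is accepted, it can also be a list of booleans, one per sort column, e.g. orderBy(["count", "name"], ascending=[False, True]), where "name" is a hypothetical second column. The plain-Python sketch below mimics that per-column behaviour on a list of dict rows; the order_by function and the row format are illustrative assumptions, not part of the PySpark API.

```python
def order_by(rows, cols, ascending=True):
    """Hypothetical helper: sort dict rows by several columns, each with its
    own direction, mimicking orderBy(cols, ascending=[...]) on a tiny scale.
    """
    if not isinstance(ascending, (list, tuple)):
        ascending = [ascending] * len(cols)  # one flag applies to every column
    if len(ascending) != len(cols):
        raise ValueError("ascending must have one entry per column")
    result = list(rows)  # leave the caller's list untouched
    # Sort by the least significant column first; Python's sort is stable,
    # so each later pass keeps the earlier ordering among equal keys.
    for col, asc in reversed(list(zip(cols, ascending))):
        result.sort(key=lambda row: row[col], reverse=not asc)
    return result


rows = [
    {"name": "bob", "count": 12},
    {"name": "ann", "count": 12},
    {"name": "cy", "count": 15},
]
ordered = order_by(rows, ["count", "name"], ascending=[False, True])
print([row["name"] for row in ordered])  # ['cy', 'ann', 'bob']
```

Sorting column by column from the last key to the first is a standard way to get mixed directions out of a stable sort; it stands in here for whatever Spark does internally.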

by (29.8k points)
Hi, thanks for the answer, it solved my issue!
by (33.2k points)
Thanks, using the orderBy() method is simple and worked well.
by (15.9k points)
This worked for me.
+1 vote
by (92.1k points)

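As @chandra answered, you can combine groupBy with sort (orderBy works the same way). Note that after withColumnRenamed, the sort has to refer to the new column name, because the "count" column no longer exists: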

from pyspark.sql.functions import desc
dataFrameWay = df.groupBy("firstName").count().withColumnRenamed("count", "distinct_name").sort(desc("distinct_name"))
