in Big Data Hadoop & Spark by (11.5k points)

When I use groupBy on a DataFrame like this:

df.groupBy(df("age")).agg(Map("id"->"count"))


I only get a DataFrame with the columns "age" and "count(id)", but df has many other columns, such as "name".

In short, I want the same result as this MySQL query:

"select name,age,count(id) from df group by age"

What should I do when using groupBy in Spark?

1 Answer


Suppose df includes the columns "name" and "age", and you want to group by both of them.

To keep the other columns after a groupBy, compute the counts separately and then join them back to the original DataFrame on the grouping columns.

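chose_group = ['name', 'age']

# count() names its output column "count"; rename it to "counts"
data_counts = df.groupBy(chose_group).count().withColumnRenamed("count", "counts")

# join the per-group counts back onto the original rows
data_joined = df.join(data_counts, chose_group).dropDuplicates()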

Now, data_joined will have all columns including the count values.
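To make the effect of the count-then-join approach concrete, here is a minimal plain-Python sketch of the same logic that runs without Spark. The sample rows and the helper name with_group_counts are made up for illustration and are not part of any Spark API. Like count() in the answer, the helper counts rows per group.

```python
from collections import Counter

# Hypothetical sample rows standing in for df; the values are made up.
rows = [
    {"id": 1, "name": "Ann", "age": 30},
    {"id": 2, "name": "Bob", "age": 30},
    {"id": 3, "name": "Cid", "age": 41},
]


def with_group_counts(rows, keys):
    """Attach to every row the number of rows that share its grouping key."""
    # Step 1 (the groupBy + count part): one count per distinct key.
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    # Step 2 (the join part): each original row looks up its group's count.
    return [{**row, "counts": counts[tuple(row[k] for k in keys)]} for row in rows]


by_age = with_group_counts(rows, ["age"])               # grouping from the question
by_name_age = with_group_counts(rows, ["name", "age"])  # grouping from the answer
```

In this sketch, grouping by age alone gives Ann and Bob a count of 2 and Cid a count of 1, while grouping by name and age gives every row a count of 1. So if the goal is the per-age count from the question, the grouping list should probably contain only 'age'.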
