
I have a DataFrame in which each id appears in several rows, and every row has a category. For each id, the result should contain the category that occurs most often.

Example:

id  categorie
1   aaa
1   aaa
2   bbb
2   bbb
2   aaa
3   aaa
3   ccc
3   ccc

Result:

id  categorie
1   aaa
2   bbb
3   ccc

I tried several .groupby() approaches, but none has worked so far.

1 Answer


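Group by id and take the first entry of value_counts(), which sorts the values by frequency in descending order:

df = df.groupby(by=['id'], as_index=False).agg(lambda x: x.value_counts().index[0])

Do not use .max() for this:

df = df.groupby(by=['id'], as_index=False)['categorie'].max()

For strings, .max() returns the alphabetically last category of each id, not the most frequent one. It matches the expected result for this sample data only by coincidence.

With the value_counts() version: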

print(df)

   id categorie
0   1       aaa
1   2       bbb
2   3       ccc
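
To make the difference between counting and .max() concrete, here is a minimal self-contained sketch. The first part rebuilds the DataFrame from the question; the second DataFrame (`skewed`) is a hypothetical counterexample that is not part of the original question, added only to show where .max() gives a different answer. How value_counts() orders tied categories may depend on the pandas version, so treat ties as unspecified here.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 3, 3],
    "categorie": ["aaa", "aaa", "bbb", "bbb", "aaa", "aaa", "ccc", "ccc"],
})

# value_counts() sorts by frequency (descending), so index[0] is the
# most frequent category within each id group.
most_common = (
    df.groupby("id")["categorie"]
      .agg(lambda s: s.value_counts().index[0])
      .reset_index()
)
print(most_common)

# Hypothetical counterexample (not from the question): "aaa" is the most
# frequent value, but "zzz" sorts last alphabetically.
skewed = pd.DataFrame({
    "id": [4, 4, 4],
    "categorie": ["zzz", "aaa", "aaa"],
})

# .max() picks the alphabetically last string, not the most frequent one.
by_max = skewed.groupby("id", as_index=False)["categorie"].max()

# Counting picks the category that actually occurs most often.
by_count = (
    skewed.groupby("id")["categorie"]
          .agg(lambda s: s.value_counts().index[0])
          .reset_index()
)
print(by_max)
print(by_count)
```

If ties need a deterministic rule, one option is `s.mode().iloc[0]` in place of the lambda body: Series.mode() returns all tied values in sorted order, so this picks the alphabetically first of them.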
