Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (18.4k points)

How can I keep each element in the Customer_ID only the Col2_ID and the Qta with a maximum value of the Qta column and discard all these rest?

I am stuck here:

df1 = df.groupby(["Customer_ID", "Col2_ID"]).Qta.sum()

Customer_ID  Col2_ID       Qta

0           536544      600

            536546        1

            536550        1

            536553        3

            536555        1

                       ... 

18283       579673      134

            580872      142

18287       554065      488

            570715      990

            573167      108

After grouping, I have to each customer multiple (Col2_ID, Qta), but to each Customer, I only want specific (Col2_ID, Qta) with the maximum value.

For example, instead of one output given by my program, I want my output to be:

Customer_ID  Col2_ID       Qta

0           536544      600

                       ... 

18283       580872      142

18287       570715      990

1 Answer

0 votes
by (36.8k points)
edited by

You can chain it with the df.max which takes adv level as parameter. At the level 0 it takes max of every Customer_ID.

df.groupby(["Customer_ID", "Col2_ID"]).Qta.sum().max(level=0)

Here, you will index the Customer_ID only to get both the Customer_id and Col2_ID as index try this.

out = df.groupby(["Customer_ID", "Col2_ID"]).Qta.sum().reset_index(level=1)

idx = out['Qta'].max(level=0).index

out.loc[idx].set_index('Col2_ID', append=True)

 Want to be a master in Data Science? Enroll in this Data Science Courses

Welcome to Intellipaat Community. Get your technical queries answered by top developers!

29.3k questions

30.6k answers

501 comments

104k users

Browse Categories

...