Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (18.4k points)

How can I keep each element in the Customer_ID only the Col2_ID and the Qta with a maximum value of the Qta column and discard all these rest?

I am stuck here:

df1 = df.groupby(["Customer_ID", "Col2_ID"]).Qta.sum()

Customer_ID  Col2_ID       Qta

0           536544      600

            536546        1

            536550        1

            536553        3

            536555        1

                       ... 

18283       579673      134

            580872      142

18287       554065      488

            570715      990

            573167      108

After grouping, I have to each customer multiple (Col2_ID, Qta), but to each Customer, I only want specific (Col2_ID, Qta) with the maximum value.

For example, instead of one output given by my program, I want my output to be:

Customer_ID  Col2_ID       Qta

0           536544      600

                       ... 

18283       580872      142

18287       570715      990

1 Answer

0 votes
by (36.8k points)
edited by

You can chain it with the df.max which takes adv level as parameter. At the level 0 it takes max of every Customer_ID.

df.groupby(["Customer_ID", "Col2_ID"]).Qta.sum().max(level=0)

Here, you will index the Customer_ID only to get both the Customer_id and Col2_ID as index try this.

out = df.groupby(["Customer_ID", "Col2_ID"]).Qta.sum().reset_index(level=1)

idx = out['Qta'].max(level=0).index

out.loc[idx].set_index('Col2_ID', append=True)

 Want to be a master in Data Science? Enroll in this Data Science Courses

Browse Categories

...