Pandas for each primary key keep only the row having the max value into another column

Question

asked Oct 31, 2020 in Data Science by blackindya (18.4k points)

How can I keep each element in the Customer_ID only the Col2_ID and the Qta with a maximum value of the Qta column and discard all these rest?

I am stuck here:

df1 = df.groupby(["Customer_ID", "Col2_ID"]).Qta.sum()
Customer_ID Col2_ID Qta
0 536544 600
536546 1
536550 1
536553 3
536555 1
...
18283 579673 134
580872 142
18287 554065 488
570715 990
573167 108

After grouping, I have to each customer multiple (Col2_ID, Qta), but to each Customer, I only want specific (Col2_ID, Qta) with the maximum value.

For example, instead of one output given by my program, I want my output to be:

Customer_ID Col2_ID Qta
0 536544 600
...
18283 580872 142
18287 570715 990

1 Answer

supriya · Answer 1 · 2020-10-31T04:44:52+0000

You can chain it with the df.max which takes adv level as parameter. At the level 0 it takes max of every Customer_ID.

df.groupby(["Customer_ID", "Col2_ID"]).Qta.sum().max(level=0)

Here, you will index the Customer_ID only to get both the Customer_id and Col2_ID as index try this.

out = df.groupby(["Customer_ID", "Col2_ID"]).Qta.sum().reset_index(level=1)
idx = out['Qta'].max(level=0).index
out.loc[idx].set_index('Col2_ID', append=True)

Want to be a master in Data Science? Enroll in this Data Scientist

Pandas for each primary key keep only the row having the max value into another column

1 Answer

Related questions

Browse Categories

Browse By Domains

Popular Courses

Popular Tutorials

Popular Resources