How to compare and drop rows within groupby in pandas?

Question

asked Jul 1, 2020 in Data Science by blackindya (18.4k points)

I have a data frame that looks like this:

   datetime                       policyid                score
0  1970-01-01 00:00:01.593560812  9876policyID1234567890  0
1  1970-01-01 00:00:01.593560814  9876policyID1234567890  0
2  1970-01-01 00:00:01.593560958  9876policyID1234567890  1
3  1970-01-01 00:00:01.593560964  9876policyID1234567890  1

I want to group by policyid and score, but keep only the row with the latest timestamp for each (policyid, score) pair.

I am doing the groupby like so:

df.groupby(['policyid','score'])

At this point, I am not sure how to compare the timestamps between rows and keep the row with the later timestamp.

The new data frame should look like this:

   datetime                       policyid                score
1  1970-01-01 00:00:01.593560814  9876policyID1234567890  0
3  1970-01-01 00:00:01.593560964  9876policyID1234567890  1

Thank you in advance.
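Since the question asks how to do the comparison inside the groupby itself, one way is to ask each (policyid, score) group for the index label of its latest timestamp with idxmax, then select those rows with .loc. This is a minimal sketch, assuming the datetime column has a real datetime64 dtype. The sample rows below are copied from the question, and the variable names are illustrative.

```python
import pandas as pd

# Sample data copied from the question; the strings are parsed to datetime64.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "1970-01-01 00:00:01.593560812",
        "1970-01-01 00:00:01.593560814",
        "1970-01-01 00:00:01.593560958",
        "1970-01-01 00:00:01.593560964",
    ]),
    "policyid": ["9876policyID1234567890"] * 4,
    "score": [0, 0, 1, 1],
})

# For each (policyid, score) group, idxmax returns the index label of the
# row with the latest datetime.
latest_idx = df.groupby(["policyid", "score"])["datetime"].idxmax()

# .loc with those labels keeps exactly one row per group.
result = df.loc[latest_idx]
print(result)
```

On the sample data this keeps the rows labelled 1 and 3, matching the expected output. Unlike a sort-based approach, it leaves the frame unsorted and only looks up one label per group.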

1 Answer

supriya · Answer 1 · 2020-07-01T06:27:09+0000

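You can sort by datetime with sort_values, then call drop_duplicates with keep='last', so that the latest row for each (policyid, score) pair is the one that is kept: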

df = df.sort_values('datetime').drop_duplicates(['policyid', 'score'], keep='last')
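To check the one-liner above, here is a self-contained run on the sample rows from the question. The second expression, groupby(...).tail(1) after the same sort, is an equivalent alternative (not part of the original answer) that stays closer to the groupby the question started from.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "1970-01-01 00:00:01.593560812",
        "1970-01-01 00:00:01.593560814",
        "1970-01-01 00:00:01.593560958",
        "1970-01-01 00:00:01.593560964",
    ]),
    "policyid": ["9876policyID1234567890"] * 4,
    "score": [0, 0, 1, 1],
})

# Sort ascending by time so the latest row of each (policyid, score) pair
# comes last, then keep only that last occurrence.
result = (
    df.sort_values("datetime")
      .drop_duplicates(["policyid", "score"], keep="last")
)

# Alternative: after the same sort, take the last row of each group.
alt = df.sort_values("datetime").groupby(["policyid", "score"]).tail(1)

print(result)
print(alt)
```

Both keep the rows labelled 1 and 3. Because sort_values reorders the frame, you can append .sort_index() if the original row order matters.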
