
I have a data frame that looks like this:

                        datetime                policyid  score
0  1970-01-01 00:00:01.593560812  9876policyID1234567890      0
1  1970-01-01 00:00:01.593560814  9876policyID1234567890      0
2  1970-01-01 00:00:01.593560958  9876policyID1234567890      1
3  1970-01-01 00:00:01.593560964  9876policyID1234567890      1

I want to group by policyid and score, but keep only the row with the latest timestamp for each policyid and score combination.

I am doing the groupby like so:

df.groupby(['policyid','score'])

At this point, I am not sure how to compare the timestamps between rows in each group and keep the row with the later timestamp.

New data frame should look like this:

                        datetime                policyid  score
1  1970-01-01 00:00:01.593560814  9876policyID1234567890      0
3  1970-01-01 00:00:01.593560964  9876policyID1234567890      1

Thank you in advance.

1 Answer


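You can sort by datetime with sort_values, then call drop_duplicates on policyid and score with keep='last'. Because sort_values sorts in ascending order by default, the last row kept for each policyid and score combination is the one with the latest timestamp:

df = df.sort_values('datetime').drop_duplicates(['policyid', 'score'], keep='last')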
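To check this against the data in the question, here is a small self-contained sketch. It rebuilds the sample frame by hand (assuming the datetime column is a real datetime64 column, not strings) and also shows a groupby-based alternative using idxmax, since the question started from groupby. The variable names latest and latest_via_groupby are just illustrative.

```python
import pandas as pd

# Rebuild the sample frame from the question (assumed dtypes: datetime64, str, int).
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "1970-01-01 00:00:01.593560812",
        "1970-01-01 00:00:01.593560814",
        "1970-01-01 00:00:01.593560958",
        "1970-01-01 00:00:01.593560964",
    ]),
    "policyid": ["9876policyID1234567890"] * 4,
    "score": [0, 0, 1, 1],
})

# Approach from the answer: sort ascending by time, then keep the last
# row for each (policyid, score) pair.
latest = df.sort_values("datetime").drop_duplicates(["policyid", "score"], keep="last")

# Alternative: for each group, find the index label of the maximum
# datetime and select those rows from the original frame.
latest_via_groupby = df.loc[df.groupby(["policyid", "score"])["datetime"].idxmax()]

print(latest)
print(latest_via_groupby)
```

Both approaches keep rows 1 and 3 here. If two rows in the same group share the exact same timestamp, each approach still returns a single row for that group, though they are not guaranteed to pick the same one of the tied rows.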
