I have a dataframe that looks like the following:

ip_address    malware_type
ip_1          malware_1
ip_2          malware_2
ip_1          malware_1
ip_1          malware_1
ip_1          malware_2
ip_2          malware_2
ip_2          malware_3
...

I want to drop duplicate rows based on the 'ip_address' column. However, when the duplicates are dropped, I want to keep only the 'malware_type' value that is most frequent for each IP. So the resulting dataframe should look like:

ip_address    malware_type
ip_1          malware_1
ip_2          malware_2
...

1 Answer


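Try mode inside a groupby: for each ip_address, take the most frequent malware_type.

# Most frequent malware_type per IP; append .reset_index() to get a DataFrame instead of a Series
s = df.groupby('ip_address')['malware_type'].agg(lambda x: x.mode()[0])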

Out[56]: 
ip_address
ip_1    malware_1
ip_2    malware_2
Name: malware_type, dtype: object
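
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The dataframe is rebuilt from the seven sample rows in the question, and the helper name most_frequent is my own, not part of pandas. It uses .iloc[0] rather than [0] only to make the positional lookup explicit.

```python
import pandas as pd

# Sample data copied from the rows shown in the question.
df = pd.DataFrame({
    "ip_address": ["ip_1", "ip_2", "ip_1", "ip_1", "ip_1", "ip_2", "ip_2"],
    "malware_type": ["malware_1", "malware_2", "malware_1", "malware_1",
                     "malware_2", "malware_2", "malware_3"],
})


def most_frequent(values: pd.Series):
    """Return the most frequent value in a group (hypothetical helper name)."""
    # Series.mode() always returns a Series, even for a single mode,
    # so take its first element by position.
    return values.mode().iloc[0]


# One row per ip_address, keeping only its most frequent malware_type.
result = (
    df.groupby("ip_address")["malware_type"]
      .agg(most_frequent)
      .reset_index()
)
print(result)
```

One thing to keep in mind: if two malware types are tied for an IP, mode() returns all of them, and as far as I know in sorted order, so this keeps the first one in sort order rather than the first one seen in the data.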
