in Data Science by (17.6k points)

The pandas drop_duplicates function is great for "uniquifying" a dataframe. However, its keyword argument take_last=True or take_last=False always keeps one row from each group of duplicates, while I would like to drop all rows which are duplicates across a subset of columns. Is this possible?

    A   B   C
0   foo 0   A
1   foo 1   A
2   foo 1   B
3   bar 1   A

As an example, I would like to drop rows which match on columns A and C so this should drop rows 0 and 1.

1 Answer


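Use drop_duplicates with keep=False, which drops every row whose values in the subset columns occur more than once. In current pandas the keep argument replaces the older take_last: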

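import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"], "B": [0, 1, 1, 1], "C": ["A", "A", "B", "A"]})

# keep=False drops every row whose (A, C) pair occurs more than once (rows 0 and 1 here)
result = df.drop_duplicates(subset=["A", "C"], keep=False)
print(result)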
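To make the difference between the keep options concrete, here is a minimal sketch on the same example frame. It also shows what should be an equivalent boolean-mask form using duplicated. The variable names (first_kept, none_kept, mask, masked) are illustrative only and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar"],
    "B": [0, 1, 1, 1],
    "C": ["A", "A", "B", "A"],
})

# keep="first" (the default) keeps the first row of each duplicated (A, C) group
first_kept = df.drop_duplicates(subset=["A", "C"], keep="first")

# keep=False drops every member of a duplicated (A, C) group
none_kept = df.drop_duplicates(subset=["A", "C"], keep=False)

# Boolean-mask form: mark all rows in duplicated groups, then invert the mask
mask = df.duplicated(subset=["A", "C"], keep=False)
masked = df[~mask]

print(list(first_kept.index))  # [0, 2, 3]
print(list(none_kept.index))   # [2, 3]
```

The mask form can be handy when you also want to inspect the dropped rows, since df[mask] returns exactly the rows that keep=False removes.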
