I'm looking for a way to do the equivalent of the SQL

"SELECT DISTINCT col1, col2 FROM dataframe_table"

The pandas SQL comparison doesn't have anything about "distinct".

.unique() only works on a single column. I suppose I could concatenate the columns, or put them in a list/tuple and compare that way, but this seems like something pandas should do in a more native way.

Am I missing something obvious, or is there no way to do this?

1 Answer


Use drop_duplicates(). This method returns the unique rows of a DataFrame, keeping the first occurrence of each duplicated row:

In [29]: df = pd.DataFrame({'a':[1,2,1,2], 'b':[3,4,3,5]})

In [30]: df
Out[30]:
   a  b
0  1  3
1  2  4
2  1  3
3  2  5

In [32]: df.drop_duplicates()
Out[32]:
   a  b
0  1  3
1  2  4
3  2  5
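
When the DataFrame has more columns than the ones you want distinct values for, one way to mirror "SELECT DISTINCT col1, col2" is to select those columns first and then drop duplicates. The sketch below is only an illustration: col1 and col2 come from the question, while col3 and the sample values are made up for the example.

```python
import pandas as pd

# Hypothetical data: col3 is an extra column that should not affect the result
df = pd.DataFrame({
    "col1": [1, 2, 1, 2],
    "col2": [3, 4, 3, 5],
    "col3": ["w", "x", "y", "z"],
})

# Rough equivalent of SELECT DISTINCT col1, col2:
# project onto the two columns, then remove duplicate rows
distinct_pairs = df[["col1", "col2"]].drop_duplicates()

# Alternative: keep every column, but compare rows on col1 and col2 only
# (the first row of each distinct pair is kept)
first_rows = df.drop_duplicates(subset=["col1", "col2"])

print(distinct_pairs)
print(first_rows)
```

Both results keep the original index labels of the surviving rows. If you would rather have a fresh 0..n-1 index, chaining .reset_index(drop=True) should do it.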
