Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (17.6k points)

How can I achieve the equivalents of SQL's IN and NOT IN?

I have a list with the required values. Here's the scenario:

df = pd.DataFrame({'countries':['US','UK','Germany','China']})

countries = ['UK','China']

# pseudo-code:

df[df['countries'] not in countries]

My current way of doing this is as follows:

df = pd.DataFrame({'countries':['US','UK','Germany','China']})

countries = pd.DataFrame({'countries':['UK','China'], 'matched':True})

# IN

df.merge(countries,how='inner',on='countries')

# NOT IN

not_in = df.merge(countries,how='left',on='countries')

not_in = not_in[pd.isnull(not_in['matched'])]

But this seems like a horrible kludge. Can anyone improve on it?

1 Answer

0 votes
by (41.4k points)

You can use pd.Series.isin where it checks whether the values are contained in Series and returns a boolean Series showing whether each element in the Series matches an element in the passed sequence of values exactly.

For "IN" : data.isin(given_data)

Or for "NOT IN": ~data.isin(given_data)

Example:

>>> df

  countries

0        India

1        France

2        Austria

3     China

>>> countries

['France', 'China']

>>> df.countries.isin(countries)

0    False

1     True

2    False

3     True

Name: countries, dtype: bool

>>> df[df.countries.isin(countries)]

  countries

1        France

3     China

>>> df[~df.countries.isin(countries)]

  countries

0        India

2       Austria

Browse Categories

...