in Data Science by (17.6k points)

I would like to cleanly filter a dataframe using regex on one of the columns.

For a contrived example:

In [210]: foo = pd.DataFrame({'a' : [1,2,3,4], 'b' : ['hi', 'foo', 'fat', 'cat']})

In [211]: foo
Out[211]:
   a    b
0  1   hi
1  2  foo
2  3  fat
3  4  cat

I want to keep only the rows whose b value starts with f, using a regex. First go:

In [213]: foo.b.str.match('f.*')
Out[213]:
0    []
1    ()
2    ()
3    []

That's not too terribly useful. However this will get me my boolean index:

In [226]: foo.b.str.match('(f.*)').str.len() > 0
Out[226]:
0    False
1     True
2     True
3    False
Name: b

So I could then do my restriction by:

In [229]: foo[foo.b.str.match('(f.*)').str.len() > 0]
Out[229]:
   a    b
1  2  foo
2  3  fat

That makes me artificially put a group into the regex though, and seems like maybe not the clean way to go. Is there a better way to do this?

1 Answer

0 votes
by (41.4k points)

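Use str.contains with a pattern anchored at the start of the string. It returns the boolean mask directly, so no capture group is needed:

In [10]: foo.b.str.contains('^f')
Out[10]:
0    False
1     True
2     True
3    False
Name: b, dtype: bool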
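
To go from the mask to the filtered rows, index the dataframe with it. The following is a minimal sketch, assuming a reasonably recent pandas in which str.match returns booleans rather than the tuples and lists shown in the question. The variable names (mask, filtered, and so on) are illustrative only, and na=False is an optional extra that makes missing values count as non-matches.

```python
import pandas as pd

# Same contrived frame as in the question.
foo = pd.DataFrame({'a': [1, 2, 3, 4], 'b': ['hi', 'foo', 'fat', 'cat']})

# str.contains performs a regex search; '^f' anchors it to the start of the string.
# na=False treats missing values as non-matches so the mask stays purely boolean.
mask = foo['b'].str.contains(r'^f', regex=True, na=False)
filtered = foo[mask]
print(filtered)

# str.match only matches at the start of each string, so no '^' anchor is needed.
filtered_match = foo[foo['b'].str.match(r'f', na=False)]

# For a literal prefix, no regex is required at all.
filtered_prefix = foo[foo['b'].str.startswith('f', na=False)]
```

All three variants select the rows at index 1 and 2 (foo and fat). The regex forms are mainly worth using when the condition is more involved than a fixed prefix.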
