How to filter rows in pandas by regex

Question

asked Sep 21, 2019 in Data Science by sourav (17.6k points)

I would like to cleanly filter a dataframe using regex on one of the columns.

For a contrived example:

In [210]: foo = pd.DataFrame({'a' : [1,2,3,4], 'b' : ['hi', 'foo', 'fat', 'cat']})
In [211]: foo
Out[211]:
a b
0 1 hi
1 2 foo
2 3 fat
3 4 cat

I want to filter the rows to those that start with f using a regex. First go:

In [213]: foo.b.str.match('f.*')
Out[213]:
0 []
1 ()
2 ()
3 []

That's not too terribly useful. However this will get me my boolean index:

In [226]: foo.b.str.match('(f.*)').str.len() > 0
Out[226]:
0 False
1 True
2 True
3 False
Name: b

So I could then do my restriction by:

In [229]: foo[foo.b.str.match('(f.*)').str.len() > 0]
Out[229]:
a b
1 2 foo
2 3 fat

That makes me artificially put a group into the regex though, and seems like maybe not the clean way to go. Is there a better way to do this?

1 Answer

Shlok Pandey · Answer 1 · 2019-09-21T10:35:07+0000

Use contains:

In [10]: df.b.str.contains('^f')
Out[10]:
0 False
1 True
2 True
3 False
Name: b, dtype: bool