I need to filter rows in a pandas dataframe so that a specific string column contains at least one of a list of provided substrings. The substrings may have unusual / regex characters. The comparison should not involve regex and is case insensitive.
For example:
lst = ['kdSj;af-!?', 'aBC+dsfa?\-', 'sdKaJg|dksaf-*']
I currently apply the mask like this:
mask = np.logical_or.reduce([df[col].str.contains(i, regex=False, case=False) for i in lst])
df = df[mask]
My dataframe is large (~1mio rows) and lst has length 100. Is there a more efficient way? For example, if the first item in lst is found, we should not have to test any subsequent strings for that row.