Best way to search dataframe by list

Question

asked Jul 29, 2019 in Python by Rajesh Malhotra (19.9k points)

I have a dataframe with these values in one of its columns:

df.Sample
POLSD233123
POLRR419910
POLAG002144
DEUOD002139
MEDOW203919
...

Then I create a list from another dataframe containing only the numeric part of each number (from a different base), for example:

more = [419910, 983129, 9128412, 5353463, 203919]

So two of the numbers in the list exist in the dataframe. I need to build a list of common values and a list of uncommon values.

Once I have the common values, I can derive the uncommon ones too. I wrote a simple loop in Python:

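listOfRepetitionBase_SNPS = []
for number in more:
    # str.contains needs a string pattern; regex=False makes it a plain substring test
    found = baza['Sample'].str.contains(str(number), regex=False).any()
    if found:
        listOfRepetitionBase_SNPS.append(number)  # keep the number, as in the expected result
    else:
        print("no such record in base:", number)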

It works, but the dataframe has about 90,000 samples, and a single run takes 5-10 minutes. How can I make this faster, perhaps with pandas?

The result in this case should be:

listOfRepetitionBase_SNPS = 419910, 203919
uncommon = 983129, 9128412, 5353463

1 Answer

Anirudh Singh · Answer 1 · 2019-07-29T10:31:14+0000

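Convert the items in the list to strings and put them in a set, use any() to keep the members that occur inside at least one sample ID, and then use the set's difference() method to get the uncommon members. Note that both results contain strings, in arbitrary set order.

# Strings, so they can be tested as substrings of the sample IDs
my_set = set(map(str, more))
# Numbers that occur inside at least one sample ID
common_items = [i for i in my_set if any(i in row for row in df.Sample.values)]
# Everything in the set that was not found
uncommon = list(my_set.difference(common_items))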
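A self-contained sketch that runs on the sample values from the question may help compare approaches. `split_by_substring` uses the same membership test as the answer (a number counts as common if its string form appears anywhere in a sample ID) but keeps the order of `more` and returns ints. `split_by_extracted_number` is a different, vectorized technique swapped in for illustration: it assumes the numeric part is the run of trailing digits in each ID and that a match means exact equality rather than a substring. Both function names are hypothetical.

```python
import pandas as pd

# Sample rows from the question (the "..." rows are omitted).
df = pd.DataFrame({"Sample": ["POLSD233123", "POLRR419910", "POLAG002144",
                              "DEUOD002139", "MEDOW203919"]})
more = [419910, 983129, 9128412, 5353463, 203919]


def split_by_substring(samples, numbers):
    """Answer's test: common if str(n) occurs inside any sample ID."""
    sample_values = list(samples)
    common, uncommon = [], []
    for n in numbers:  # iterate the list, so output keeps its order
        if any(str(n) in s for s in sample_values):
            common.append(n)
        else:
            uncommon.append(n)
    return common, uncommon


def split_by_extracted_number(samples, numbers):
    """Assumption: the numeric part is the trailing digits of each ID,
    and a match means exact equality (not a substring match)."""
    # One regex pass over the column; expand=False returns a Series
    digits = samples.str.extract(r"(\d+)$", expand=False).dropna().astype(int)
    nums = pd.Series(numbers)
    mask = nums.isin(digits)  # hash-based membership test
    return nums[mask].tolist(), nums[~mask].tolist()


print(split_by_substring(df["Sample"], more))
print(split_by_extracted_number(df["Sample"], more))
```

On this sample both calls print `([419910, 203919], [983129, 9128412, 5353463])`. The substring version still scans every row for every number, though it avoids building a filtered dataframe per item. The extract-and-isin version does a single pass plus hash lookups, but it only fits if the exact-match assumption holds for your IDs (for example, `002144` becomes `2144` after the int conversion).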
