I have a dataframe with values like these in one of its columns:
df.Sample
POLSD233123
POLRR419910
POLAG002144
DEUOD002139
MEDOW203919
...
And I create a list from another df that holds only the numeric part of the sample number (it comes from a different database), for example:
more = [419910, 983129, 9128412, 5353463, 203919]
So the list contains two numbers that exist in the dataframe. I need to create one list of the common values and one of the uncommon values.
Once I have the common values, I can build the uncommon ones too. I just wrote a simple loop in Python:
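# baza is the dataframe shown above
listOfRepetitionBase_SNPS = []
for i in range(len(more)):
    # str.contains expects a string pattern, so convert the number first
    temp = baza[baza['Sample'].str.contains(str(more[i]))]
    if len(temp) > 0:
        listOfRepetitionBase_SNPS.append(temp)
    else:
        print("no such record in base,", more[i])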
And it works... but the dataframe has 90xxx samples and a single run takes 5-10 minutes. Can someone advise me on how to make this faster, maybe with pandas?
The result in this case should be:
listOfRepetitionBase_SNPS = 419910, 203919
uncommon = 983129,9128412,5353463
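
For reference, here is a minimal sketch of the kind of vectorized approach I am hoping for. It is not tested on the real data. It assumes that the numeric part is always the trailing run of digits in Sample and that an exact match on that number is what I want, which is stricter than the substring match in my loop. The names digits, numeric, present, common and matched_rows are only for illustration.

```python
import pandas as pd

# Example data copied from the question; baza stands in for the real dataframe.
baza = pd.DataFrame({"Sample": ["POLSD233123", "POLRR419910", "POLAG002144",
                                "DEUOD002139", "MEDOW203919"]})
more = [419910, 983129, 9128412, 5353463, 203919]

# Assumption: the numeric part is the trailing run of digits in each Sample.
digits = baza["Sample"].str.extract(r"(\d+)$", expand=False)

# Convert to numbers so leading zeros (e.g. "002144") do not matter.
numeric = pd.to_numeric(digits)

# A set gives constant-time membership checks instead of scanning the column
# once per element of more.
present = set(numeric.dropna().astype(int))

common = [n for n in more if n in present]
uncommon = [n for n in more if n not in present]

# Rows of the dataframe whose numeric part appears in more.
matched_rows = baza[numeric.isin(more)]

print(common)
print(uncommon)
```

The idea is that the column is parsed once, so the cost no longer grows with len(more) times the number of rows as it does in the loop with str.contains.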