in Python by (19.9k points)

I have a dataframe with the following values in one of its columns:

df.Sample

    POLSD233123

    POLRR419910

    POLAG002144

    DEUOD002139

    MEDOW203919

    ...

I also build a list from another dataframe (a different database) that contains only the numeric part of each sample ID, for example:

more = [419910, 983129,9128412,5353463,203919]

So two of the numbers in the list exist in the dataframe. I need to create a list of the common values and a list of the uncommon ones.

Once I have the common values, I can derive the uncommon ones too. I wrote a simple loop in Python:

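listOfRepetitionBase_SNPS = []
for i in range(len(more)):
    # str.contains expects a string pattern, so convert the number first
    temp = baza[baza['Sample'].str.contains(str(more[i]))]
    if len(temp) > 0:
        # keep the matching rows for this number
        listOfRepetitionBase_SNPS.append(temp)
    else:
        print("no that record in base,", more[i])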

It works, but the dataframe has 90xxx samples and a single run takes 5-10 minutes to process. Can someone advise me on how to make this faster, perhaps with pandas?

The result in this case should be:

listOfRepetitionBase_SNPS =  419910, 203919

uncommon =  983129,9128412,5353463

1 Answer

by (25.1k points)

You can convert the items in the list to strings and put them in a set, then use any() to check which of them appear in the Sample column, and finally use the set's difference method to find the uncommon members.

my_set = set(map(str, more))

common_items = [i for i in my_set if any(i in row for row in df.Sample.values)]

uncommon = list(my_set.difference(common_items))
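Note that the any() check still scans the Sample column once per number, so it may not be much faster on a large dataframe. A possible alternative, sketched below, extracts the numeric part of every sample once and then uses set membership. It rests on two assumptions that are not stated in the question: every Sample value ends with its numeric ID, and you want an exact match on that ID rather than a substring match. The sample data is copied from the question; the names digits, numeric_id, present and matching_rows are only illustrative.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({"Sample": ["POLSD233123", "POLRR419910", "POLAG002144",
                              "DEUOD002139", "MEDOW203919"]})
more = [419910, 983129, 9128412, 5353463, 203919]

# Assumption: each Sample ends with its numeric ID (letters first, digits last).
# Extract the trailing digits once for the whole column.
digits = df["Sample"].str.extract(r"(\d+)$", expand=False)

# Convert to numbers so that "002144" compares equal to 2144.
numeric_id = pd.to_numeric(digits)

# Set lookups are constant time, so this is a single pass over `more`.
present = set(numeric_id.dropna().astype(int))
common = [n for n in more if n in present]
uncommon = [n for n in more if n not in present]

# Rows of the dataframe whose numeric ID appears in `more`
matching_rows = df[numeric_id.isin(more)]

print(common)
print(uncommon)
```

If you really need substring matching, as str.contains does, this sketch does not apply as written, because it compares whole IDs only.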
