
I have code that uses ProcessPoolExecutor, which can't pickle lambdas or locally defined functions. Some of the code I want to run in parallel uses a defaultdict with a default value of None.

How would you proceed? If at all possible, I would not like to touch my parallelizing code.

What I have:

import itertools
import os
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor

class SomeClass:
    def __init__(self):
        # The lambda default factory is what prevents pickling.
        self.some_dict = defaultdict(lambda: None)

    def generate(self):
        <some code>

def some_method_to_parallelize(x: SomeClass):
    <some code>

def some_method():
    max_workers = round(os.cpu_count() // 1.5)
    invocations_per_process = 100
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        data = [executor.submit(some_method_to_parallelize, SomeClass()) for _ in range(invocations_per_process)]
        data = list(itertools.chain.from_iterable([r.result() for r in data]))

1 Answer


Try:

collections.defaultdict(type(None))

type(None) is the NoneType class itself, which serves here as the defaultdict's default factory. Calling it with no arguments returns None, and unlike the lambda, it can be pickled.
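
A quick way to check this is a pickle round trip, since ProcessPoolExecutor pickles arguments before sending them to worker processes. The sketch below is only an illustration: the variable names (lambda_dict, none_dict, restored) are made up for this example and are not from the question's code.

```python
import pickle
from collections import defaultdict

# The lambda-based factory from the question cannot be pickled.
lambda_dict = defaultdict(lambda: None)
try:
    pickle.dumps(lambda_dict)
    lambda_ok = True
except (pickle.PicklingError, AttributeError):
    lambda_ok = False

# type(None) is the NoneType class; calling it with no arguments returns None.
none_dict = defaultdict(type(None))
none_dict["present"] = 1

# Round trip through pickle, as happens when an object is sent to a worker.
restored = pickle.loads(pickle.dumps(none_dict))
missing_value = restored["missing"]  # triggers the default factory

print("lambda factory picklable:", lambda_ok)
print("restored value:", restored["present"], "missing key gives:", missing_value)
```

If the round trip succeeds on your Python version, replacing the lambda in SomeClass.__init__ with type(None) should be enough, and the parallelizing code can stay unchanged.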
