How do you parallelize apply() on Pandas Dataframes making use of all cores on one machine?

Question

2 Answers

Shlok Pandey · Answer 1 · 2019-08-31T12:51:54+0000

The code below applies the function f in parallel to column col of DataFrame df:

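import multiprocessing as mp

# f must be a top-level (picklable) function. On Windows and macOS, run this
# under an `if __name__ == "__main__":` guard.
with mp.Pool(mp.cpu_count()) as pool:
    # map() blocks until every result is ready, so the pool can be shut down on exit
    df['newcol'] = pool.map(f, df['col'])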
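The same idea extends to a row-wise apply(): split the DataFrame into chunks, run the ordinary apply on each chunk in a worker process, and concatenate the results. This is only a minimal sketch; the names row_func, _apply_chunk and parallel_apply and the sample columns a and b are hypothetical, and it assumes a non-empty DataFrame.

```python
import multiprocessing as mp
from functools import partial

import numpy as np
import pandas as pd


def row_func(row):
    # Hypothetical row-wise function; replace with your own logic.
    return row["a"] + row["b"]


def _apply_chunk(chunk, func):
    # Runs the ordinary, serial apply on one slice of the DataFrame.
    return chunk.apply(func, axis=1)


def parallel_apply(df, func, n_workers=None):
    # Assumes df is non-empty and func is defined at module top level.
    n_workers = n_workers or mp.cpu_count()
    # Boundaries of roughly equal, contiguous row ranges.
    bounds = np.linspace(0, len(df), n_workers + 1, dtype=int)
    chunks = [
        df.iloc[start:stop]
        for start, stop in zip(bounds[:-1], bounds[1:])
        if stop > start  # skip empty chunks when there are few rows
    ]
    with mp.Pool(n_workers) as pool:
        parts = pool.map(partial(_apply_chunk, func=func), chunks)
    # The chunks keep their original index, so concat restores row order.
    return pd.concat(parts)


if __name__ == "__main__":
    df = pd.DataFrame({"a": range(8), "b": range(8, 16)})
    df["sum"] = parallel_apply(df, row_func, n_workers=2)
    print(df)
```

Sending a few large chunks to the workers instead of one element at a time keeps the pickling overhead low, which usually matters more than the number of cores when the function itself is cheap.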


Aparna · Answer 2 · 2025-01-06T17:24:09+0000

It is possible to run apply in parallel so that operations on a DataFrame use every available CPU core. By default, apply runs in a single thread on one core, but other libraries can spread the workload across the remaining cores.

Parallelization using joblib

Joblib is designed mainly for parallelizing CPU-intensive programs. You can combine its Parallel and delayed helpers to run, in parallel, the function you would otherwise pass to apply.

To install: pip install joblib

Code Implementation

import pandas as pd
from joblib import Parallel, delayed

# Sample DataFrame
df = pd.DataFrame({'A': range(1, 1000001)})

def func(x):
    return x * 2

# Using joblib to parallelize the apply
df['B'] = Parallel(n_jobs=-1)(delayed(func)(x) for x in df['A'])

In this example, n_jobs=-1 tells joblib to use all of the available cores.
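Dispatching one task per element adds overhead that can outweigh a function as cheap as func. A possible variation, shown here only as a sketch, is to hand each worker a whole slice of the column and run the normal apply on it. The helper apply_to_chunk and the choice of one chunk per core are assumptions for illustration, not something joblib requires.

```python
import numpy as np
import pandas as pd
from joblib import Parallel, delayed, cpu_count

# Sample DataFrame
df = pd.DataFrame({'A': range(1, 1000001)})

def func(x):
    return x * 2

def apply_to_chunk(chunk):
    # Hypothetical helper: runs the ordinary, serial apply on one slice
    return chunk.apply(func)

# Assumption: one contiguous chunk per available core
n_chunks = cpu_count()
bounds = np.linspace(0, len(df), n_chunks + 1, dtype=int)
chunks = [df['A'].iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]

# Each task now processes a whole slice instead of a single value
parts = Parallel(n_jobs=-1)(delayed(apply_to_chunk)(c) for c in chunks)

# The slices keep their original index, so the result aligns with df
df['B'] = pd.concat(parts)
```

For simple arithmetic like x * 2, the vectorized expression df['A'] * 2 is faster than any parallel apply; chunked parallelism pays off when func is expensive and cannot be vectorized.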

In summary, joblib is a powerful and flexible option for smaller data sizes, and it can replace apply without much hassle.
