I need to update a column of a pandas DataFrame based on processing a list of selected values (df0['parcels'].values in the code below). The code works, but it is slow because the list is long (about 45,000 values): it takes 5 hours to complete.
Since the processing of each selected value is independent of the others, I would like to parallelize it to improve the speed.
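Because each ID is handled independently, one option is to split the IDs into chunks, have each worker return an {id: mode} dictionary instead of modifying the DataFrame, and merge the results. The sketch below assumes that `parcels` is an integer NumPy raster, that -9999 marks nodata, and that the 5-cell padding and 20-unit radius from the loop are kept. `modes_for_chunk` and `parallel_modes` are hypothetical helper names, not library functions. The standard-library `ProcessPoolExecutor` is used by default, and `executor_cls` lets you swap in `ThreadPoolExecutor` for a quick in-process check.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor  # threads: optional in-process check
import numpy as np
import scipy.stats as ss
from scipy.ndimage import distance_transform_edt as edt


def modes_for_chunk(parcels, ids, sampling, pad=5, radius=20, nodata=-9999):
    """Hypothetical helper: same per-ID work as the loop, returning {id: mode}."""
    out = {}
    for i in ids:
        y, x = np.where(parcels == i)
        if y.size == 0:
            continue  # ID not present in the raster
        # Padded bounding box, clipped so negative starts do not wrap around
        tmp = parcels[max(y.min() - pad, 0):y.max() + pad + 1,
                      max(x.min() - pad, 0):x.max() + pad + 1]
        dst = edt(tmp, sampling=sampling)
        par = tmp[dst <= radius]
        par = par[par != nodata]
        if par.size:
            out[i] = ss.mode(par, keepdims=False).mode
    return out


def parallel_modes(parcels, ids, sampling, workers=4, executor_cls=ProcessPoolExecutor):
    """Hypothetical helper: run modes_for_chunk on `workers` chunks and merge."""
    chunks = np.array_split(np.asarray(ids), workers)
    result = {}
    with executor_cls(max_workers=workers) as pool:
        # With processes, the raster is pickled once per chunk
        futures = [pool.submit(modes_for_chunk, parcels, c, sampling) for c in chunks]
        for f in futures:
            result.update(f.result())
    return result
```

The merged dictionary can then be applied to the column in a single pass, for example `df['parcels'] = df['parcels'].map(lambda v: modes.get(v, v))`. With the "spawn" start method (the default on Windows and macOS), the call to `parallel_modes` has to sit under `if __name__ == "__main__":`. The speedup is bounded by the number of cores, and each worker still scans the full raster once per ID with `np.where`.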
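```python
import numpy as np
import pandas as pd
import scipy.stats as ss
from scipy.ndimage import distance_transform_edt as edt

# parcels: 2-D integer raster of parcel IDs (-9999 = nodata); r_parcels: cell spacing
for i in df0['parcels'].values:
    # Bounding box of parcel i, padded by 5 cells (clipped at the raster edge)
    y, x = np.where(parcels == i)
    tmp = parcels[max(np.min(y) - 5, 0):np.max(y) + 6,
                  max(np.min(x) - 5, 0):np.max(x) + 6]
    # Distance of each non-zero cell to the nearest zero-valued cell in the window
    dst = edt(tmp, sampling=r_parcels)
    par = tmp[dst <= 20]
    par = par[par != -9999]
    # Most frequent remaining value (keepdims needs SciPy >= 1.9)
    mod = ss.mode(par, keepdims=False).mode
    df['parcels'] = df['parcels'].replace(i, mod)
```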
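A large part of the run time may come from `np.where(parcels == i)`, which scans the whole raster once per ID (45,000 full scans), and from rewriting the whole column on every iteration. A minimal sketch of a different technique, assuming positive integer IDs and -9999 as nodata: `scipy.ndimage.find_objects` returns every label's bounding box in one pass, and the results are collected into a dictionary that is applied once at the end. `neighbour_modes` is a hypothetical name.

```python
import numpy as np
import pandas as pd
import scipy.stats as ss
from scipy.ndimage import distance_transform_edt as edt, find_objects


def neighbour_modes(parcels, ids, sampling, pad=5, radius=20, nodata=-9999):
    """Hypothetical helper: {id: most common value near that parcel}."""
    # Labels <= 0 (including nodata) are clipped to 0, which find_objects ignores.
    # boxes[k - 1] is a (row_slice, col_slice) tuple for label k, or None.
    boxes = find_objects(np.clip(parcels, 0, None))
    result = {}
    for i in ids:
        box = boxes[i - 1] if 0 < i <= len(boxes) else None
        if box is None:
            continue  # ID not present in the raster
        ys, xs = box
        # Same window as the loop: min - pad to max + pad (stop is already max + 1)
        tmp = parcels[max(ys.start - pad, 0):ys.stop + pad,
                      max(xs.start - pad, 0):xs.stop + pad]
        dst = edt(tmp, sampling=sampling)
        par = tmp[dst <= radius]
        par = par[par != nodata]
        if par.size:
            result[i] = ss.mode(par, keepdims=False).mode
    return result
```

The column can then be updated once with `df['parcels'] = df['parcels'].map(lambda v: modes.get(v, v))`. One design difference to be aware of: the original loop applies replacements one after another, so a value written by an earlier `replace` can be replaced again by a later ID. The dictionary approach maps each original value exactly once. The per-ID loop is independent, so it can also be chunked across processes.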