Parallelize apply after pandas groupby


asked Sep 12, 2019 in Data Science by ashely (50.2k points)

I have used rosetta.parallel.pandas_easy to parallelize apply after a groupby, for example:

import numpy as np
import pandas as pd
from rosetta.parallel.pandas_easy import groupby_to_series_to_frame

df = pd.DataFrame({'a': [6, 2, 2], 'b': [4, 5, 6]}, index=['g1', 'g1', 'g2'])
groupby_to_series_to_frame(df, np.mean, n_jobs=8, use_apply=True, by=df.index)

However, has anyone figured out how to parallelize a function that returns a DataFrame? As expected, the following code fails with rosetta:

def tmpFunc(df):
    df['c'] = df.a + df.b
    return df

df.groupby(df.index).apply(tmpFunc)  # plain pandas version
groupby_to_series_to_frame(df, tmpFunc, n_jobs=1, use_apply=True, by=df.index)  # fails with rosetta

pandas
python
dataframe


1 Answer


answered Sep 12, 2019 by vinita (107k points)

You can try the following code; it does not depend on joblib, and it works for me:

from multiprocessing import Pool, cpu_count

import pandas

def applyParallel(dfGrouped, func):
    # Run func on each group in a worker process, then concatenate the results.
    with Pool(cpu_count()) as p:
        ret_list = p.map(func, [group for name, group in dfGrouped])
    return pandas.concat(ret_list)
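A minimal sketch of how the helper might be applied to the question's example. The names `apply_parallel` and `tmp_func`, and the `processes` argument with its single-process shortcut, are illustrative assumptions and not part of the original answer. The `__main__` guard is included because platforms that spawn worker processes re-import the main module.

```python
from multiprocessing import Pool, cpu_count

import pandas as pd


def tmp_func(group):
    # Return a new frame with the extra column instead of mutating the group.
    return group.assign(c=group["a"] + group["b"])


def apply_parallel(df_grouped, func, processes=None):
    """Apply func to every group and concatenate the per-group results."""
    groups = [group for _, group in df_grouped]
    if processes == 1:
        # Hypothetical shortcut: run sequentially and skip the pool.
        results = [func(group) for group in groups]
    else:
        with Pool(processes or cpu_count()) as pool:
            results = pool.map(func, groups)
    return pd.concat(results)


if __name__ == "__main__":
    df = pd.DataFrame({"a": [6, 2, 2], "b": [4, 5, 6]}, index=["g1", "g1", "g2"])
    print(apply_parallel(df.groupby(df.index), tmp_func, processes=2))
```

Each worker receives a pickled copy of its group, so `func` has to be a picklable top-level function, and in-place changes made inside it do not reach the original DataFrame.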

This helper cannot replace every groupby.apply() call, but it covers the typical cases: for example, it should cover cases 2 and 3 described in the GroupBy.apply documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.apply.html
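As one illustration of a case the helper does not cover (chosen here as an example, not taken from the answer): when the function returns a scalar per group, pandas.concat() refuses the inputs, because it only accepts Series and DataFrame objects. The sketch below uses a plain list comprehension as a stand-in for the Pool.map step, since only the combining step matters, and the workaround shown is just one possible approach.

```python
import pandas as pd

df = pd.DataFrame({"a": [6, 2, 2], "b": [4, 5, 6]}, index=["g1", "g1", "g2"])
grouped = df.groupby(df.index)

# Stand-in for what Pool.map would return: one scalar per group.
names = [name for name, _ in grouped]
results = [group["a"].sum() for _, group in grouped]

# pandas.concat only accepts Series/DataFrame objects, so scalars are rejected.
try:
    pd.concat(results)
    concat_failed = False
except TypeError:
    concat_failed = True

# Possible workaround: build the result Series from the group names directly.
combined = pd.Series(results, index=names)
print(concat_failed)
print(combined)
```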

You can also obtain the behavior of case 1 by passing axis=1 to the final pandas.concat() call.
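A small sketch of what axis=1 changes in the final pandas.concat() call, assuming the function returns one Series per group. The list comprehension again stands in for the Pool.map step, and the `keys=` argument and the transpose are additions for readability, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"a": [6, 2, 2], "b": [4, 5, 6]}, index=["g1", "g1", "g2"])
grouped = df.groupby(df.index)

# Stand-in for what Pool.map would return: one Series of column sums per group.
names = [name for name, _ in grouped]
results = [group.sum() for _, group in grouped]

# Default axis=0 stacks the Series end to end into one long Series.
stacked = pd.concat(results)

# axis=1 places each group's Series side by side as a column;
# keys= labels each column with its group name.
side_by_side = pd.concat(results, axis=1, keys=names)

# Transposing gives one row per group.
per_group_rows = side_by_side.T
print(per_group_rows)
```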


© COPYRIGHT 2011-2024 INTELLIPAAT.COM. ALL RIGHTS RESERVED.

...