0 votes
in Data Science by (17.6k points)

Given the following (totally overkill) data frame example

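import datetime as dt

import numpy as np
import pandas as pd

df = pd.DataFrame({
    # The year was lost from the original post; 2012 is only a placeholder.
    "date"    : [dt.date(2012, x, 1) for x in range(1, 11)],
    "returns" : 0.05 * np.random.randn(10),
    "dummy"   : np.repeat(1, 10),
})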

is there an existing built-in way to apply two different aggregating functions to the same column, without having to call agg multiple times?

The way that doesn't work, but feels intuitively right, would be:

# Assume `function1` and `function2` are defined for aggregating.

df.groupby("dummy").agg({"returns":function1, "returns":function2})

Obviously, a Python dictionary can't hold duplicate keys (the second "returns" entry silently overwrites the first). Is there another way to express the input to agg? Perhaps a list of tuples [(column, function)] would work better, so that multiple functions can be applied to the same column? But agg seems to accept only a dictionary.

Is there a workaround for this besides defining an auxiliary function that just applies both of the functions inside of it? (How would this work with aggregation anyway?)

1 Answer

0 votes
by (41.4k points)

Pass the functions as a list:

In [20]: df.groupby("dummy").agg({"returns": [np.mean, np.sum]})

        returns
            sum      mean
dummy
1      0.285833  0.028583
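The list form should also accept user-defined callables such as the `function1` and `function2` mentioned in the question. Here is a minimal sketch, assuming two hypothetical aggregators (their bodies are made up for illustration) and a seeded random generator so the result is reproducible:

```python
import numpy as np
import pandas as pd


# Hypothetical stand-ins for the question's function1 / function2.
def function1(s):
    """Range (max minus min) of the group's values."""
    return s.max() - s.min()


def function2(s):
    """Mean absolute value of the group's values."""
    return s.abs().mean()


rng = np.random.default_rng(0)
df = pd.DataFrame({
    "returns": 0.05 * rng.standard_normal(10),
    "dummy": np.repeat(1, 10),
})

# Selecting the column first keeps the result columns flat; each column
# is named after the function's __name__.
result = df.groupby("dummy")["returns"].agg([function1, function2])
print(result)
```

Because the column names come from `__name__`, named functions give more readable output here than lambdas would.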


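or, in pandas versions before 1.0, as a nested dictionary that also renames the output columns (this form was deprecated in pandas 0.20 and removed in 1.0, where it raises a SpecificationError):

In [21]: df.groupby('dummy').agg({'returns':
                                  {'Mean': np.mean, 'Sum': np.sum}})

        returns
            Sum      Mean
dummy
1      0.285833  0.028583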
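On pandas 0.25 and later, custom output names can be given with named aggregation. This is not part of the original answer; it is a minimal sketch that reuses the `Mean` and `Sum` labels from above and assumes a seeded random generator for reproducibility:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "returns": 0.05 * rng.standard_normal(10),
    "dummy": np.repeat(1, 10),
})

# Named aggregation: each keyword is an output column name, and each
# value is a (source column, aggregation) pair.
result = df.groupby("dummy").agg(
    Mean=("returns", "mean"),
    Sum=("returns", "sum"),
)
print(result)
```

The result has flat columns `Mean` and `Sum`, and the same source column can appear in as many keyword arguments as needed, which sidesteps the duplicate-key problem raised in the question.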
