
Given the following (totally overkill) data frame example:

import datetime as dt

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date":    [dt.date(2012, x, 1) for x in range(1, 11)],
    "returns": 0.05 * np.random.randn(10),
    "dummy":   np.repeat(1, 10),
})

Is there an existing built-in way to apply two different aggregating functions to the same column, without having to call agg multiple times?

The syntactically wrong, but intuitively right, way to do it would be:

# Assume `function1` and `function2` are defined for aggregating.

df.groupby("dummy").agg({"returns":function1, "returns":function2})

Obviously, Python doesn't allow duplicate keys in a dictionary. Is there another way to express the input to agg? Perhaps a list of tuples [(column, function)] would work better, so that multiple functions can be applied to the same column? But it seems to accept only a dictionary.

Is there a workaround for this besides defining an auxiliary function that just applies both of the functions inside of it? (How would this work with aggregation anyway?)

1 Answer


Pass the functions as a list:

In [20]: df.groupby("dummy").agg({"returns": [np.mean, np.sum]})
Out[20]: 
        returns          
           mean       sum
dummy                    
1      0.028583  0.285833
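For reference, here is a minimal self-contained sketch of the list form. It assumes a reasonably recent pandas and passes the built-in reductions as strings ("mean", "sum"), which newer releases prefer over the NumPy callables. The seed, the `value_range` helper, and the flattened column names are illustrative assumptions, not part of the original question.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)

df = pd.DataFrame({
    "date": [dt.date(2012, x, 1) for x in range(1, 11)],
    "returns": 0.05 * rng.standard_normal(10),
    "dummy": np.repeat(1, 10),
})


def value_range(s):
    """Hypothetical custom aggregator: spread between max and min."""
    return s.max() - s.min()


# A list of aggregators for one column. Strings name built-in reductions;
# plain callables are labelled by their __name__.
result = df.groupby("dummy").agg({"returns": ["mean", "sum", value_range]})

# The result has two-level columns: ("returns", "mean"), ("returns", "sum"), ...
print(result.columns.tolist())

# Optionally flatten to single-level names such as "returns_mean".
flat = result.copy()
flat.columns = ["_".join(col) for col in result.columns]
print(flat)
```

The column order follows the order of the list, so swapping "mean" and "sum" swaps the output columns as well.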

 

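or, in pandas versions before 1.0, as a nested dictionary that also renames the output columns. This form was deprecated in pandas 0.20 and raises a SpecificationError from 1.0 onward:

In [21]: df.groupby('dummy').agg({'returns':
                                  {'Mean': np.mean, 'Sum': np.sum}})
Out[21]: 
        returns          
            Sum      Mean
dummy                    
1      0.285833  0.028583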
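On current pandas, custom output names for several aggregations of one column can be had with named aggregation (available since pandas 0.25). The sketch below assumes that version or later; the seed and the trimmed-down frame are illustrative only.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)

df = pd.DataFrame({
    "returns": 0.05 * rng.standard_normal(10),
    "dummy": np.repeat(1, 10),
})

# Named aggregation: each keyword is an output column name, and each value
# is a (source column, aggregator) pair. The same source column may appear
# as many times as needed.
named = df.groupby("dummy").agg(
    Mean=("returns", "mean"),
    Sum=("returns", "sum"),
)
print(named)
```

Unlike the list form, this returns flat, single-level columns ("Mean" and "Sum"), so no flattening step is needed afterwards.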


...