
Using pandas in Python, I would like to sum (marginalize) over one level of a Series with a 3-level MultiIndex, producing a Series with a 2-level MultiIndex. For example, given the following:

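import pandas as pd

# Each 3-character string becomes a 3-level key, e.g. 'ABC' -> ('A', 'B', 'C')
ind = [tuple(x) for x in ['ABC', 'ABc', 'AbC', 'Abc', 'aBC', 'aBc', 'abC', 'abc']]
mi = pd.MultiIndex.from_tuples(ind)
data = pd.Series([264, 13, 29, 8, 152, 7, 15, 1], index=mi)
print(data)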

A  B  C    264
      c     13
   b  C     29
      c      8
a  B  C    152
      c      7
   b  C     15
      c      1

I would like to sum over the third level (the C/c labels) to produce the following output:

A  B    277
   b     37
a  B    159
   b     16
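In symbols (x and s are labels introduced here only for illustration: x is the input Series, s the desired output), each desired value is the sum over the third level for a fixed pair of outer labels:

```latex
\begin{aligned}
s(i, j) &= \sum_{k \in \{\mathrm{C},\,\mathrm{c}\}} x(i, j, k) \\
s(\mathrm{A}, \mathrm{B}) &= 264 + 13 = 277 \\
s(\mathrm{A}, \mathrm{b}) &= 29 + 8 = 37 \\
s(\mathrm{a}, \mathrm{B}) &= 152 + 7 = 159 \\
s(\mathrm{a}, \mathrm{b}) &= 15 + 1 = 16
\end{aligned}
```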

What is the best way in Pandas to do this?

1 Answer


If you always want to keep the first two levels and sum over the remaining one, this is easy: group by those two levels and sum.

In [27]: data.groupby(level=[0, 1]).sum()
Out[27]:
A  B    277
   b     37
a  B    159
   b     16
dtype: int64
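The call above hard-codes which levels to keep. A small helper can sum out any single level, given by position or by name, by grouping on all the other levels. This is a sketch, not a pandas API: `sum_out_level` is a hypothetical name, and it assumes the Series has a MultiIndex with at least two levels.

```python
import pandas as pd


def sum_out_level(s: pd.Series, level) -> pd.Series:
    """Sum a MultiIndex Series over one level, keeping the remaining levels.

    Hypothetical helper. `level` is either an integer position
    (negative values count from the innermost level) or a level name.
    """
    nlevels = s.index.nlevels
    if isinstance(level, int):
        if not -nlevels <= level < nlevels:
            raise IndexError(f"level {level} out of range for {nlevels} levels")
        pos = level % nlevels  # e.g. -1 -> innermost level
    else:
        pos = list(s.index.names).index(level)  # ValueError if the name is unknown
    keep = [i for i in range(nlevels) if i != pos]
    # Group on every level except the one being summed out.
    return s.groupby(level=keep).sum()


ind = [tuple(x) for x in ['ABC', 'ABc', 'AbC', 'Abc', 'aBC', 'aBc', 'abC', 'abc']]
mi = pd.MultiIndex.from_tuples(ind)
data = pd.Series([264, 13, 29, 8, 152, 7, 15, 1], index=mi)

print(sum_out_level(data, 2))   # same result as data.groupby(level=[0, 1]).sum()
print(sum_out_level(data, 1))   # sums over the middle level instead
```

Position -1 always refers to the innermost level, so `sum_out_level(data, -1)` matches the answer above. `data.unstack(level=-1).sum(axis=1)` gives the same numbers for this data, but `unstack` fills missing label combinations with NaN (turning integers into floats) when the index is not a full product, so `groupby` is the safer default. Older answers sometimes write `data.sum(level=[0, 1])`; that `level` keyword was deprecated in pandas 1.3 and removed in 2.0.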

