Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (17.6k points)

I have a dataframe, grouped, with multiindex columns as below:

import pandas as pd

codes = ["one","two","three"];

colours = ["black", "white"];

textures = ["soft", "hard"];

N= 100 # length of the dataframe

df = pd.DataFrame({ 'id' : range(1,N+1),

                    'weeks_elapsed' : [random.choice(range(1,25)) for i in range(1,N+1)],

                    'code' : [random.choice(codes) for i in range(1,N+1)],

                    'colour': [random.choice(colours) for i in range(1,N+1)],

                    'texture': [random.choice(textures) for i in range(1,N+1)],

                    'size': [random.randint(1,100) for i in range(1,N+1)],

                    'scaled_size': [random.randint(100,1000) for i in range(1,N+1)]

                   },  columns= ['id', 'weeks_elapsed', 'code','colour', 'texture', 'size', 'scaled_size'])

grouped = df.groupby(['code', 'colour']).agg( {'size': [np.sum, np.average, np.size, pd.Series.idxmax],'scaled_size': [np.sum, np.average, np.size, pd.Series.idxmax]}).reset_index()

>> grouped

    code colour     size                           scaled_size                         

                    sum    average  size  idxmax            sum    average  size  idxmax

0    one  black    1031  60.647059    17      81     185.153944  10.891408    17      47

1    one  white     481  37.000000    13      53     204.139249  15.703019    13      53

2  three  black     822  48.352941    17       6     123.269405   7.251141    17      31

3  three  white    1614  57.642857    28      50     285.638337  10.201369    28      37

4    two  black     523  58.111111     9      85      80.908912   8.989879     9      88

5    two  white     669  41.812500    16      78      82.098870   5.131179    16      78

[6 rows x 10 columns]

How can I flatten/merge the column index levels as: "Level1|Level2", e.g. size|sum, scaled_size|sum. etc? If this is not possible, is there a way to groupby() as I did above without creating multi-index columns?

1 Answer

0 votes
by (41.4k points)

You could always change the columns:

grouped.columns = ['%s%s' % (a, '|%s' % b if b else '') for a, b in grouped.columns]

If you wish to learn more about Pandas visit this Pandas Tutorial.

Related questions

Browse Categories

...