0 votes
1 view
in Data Science by (15.5k points)

I am trying to groupby a column and compute value counts on another column.

import pandas as pd

dftest = pd.DataFrame({'A':[1,1,1,1,1,1,1,1,1,2,2,2,2,2], 

               'Amt':[20,20,20,30,30,30,30,40, 40,10, 10, 40,40,40]})

print(dftest)

dftest looks like

    A  Amt

0   1 20

1   1 20

2   1 20

3   1 30

4   1 30

5   1 30

6   1 30

7   1 40

8   1 40

9   2 10

10  2 10

11  2 40

12  2 40

13  2 40

perform grouping

grouper = dftest.groupby('A')

df_grouped = grouper['Amt'].value_counts()

which gives

   A  Amt

1  30     4

    20     3

    40     2

2  40     3

    10     2

Name: Amt, dtype: int64

what I want is to keep the top two rows of each group

Also, I was perplexed by an error when I tried to reset_index

df_grouped.reset_index()

which gives following error

df_grouped.reset_index()

ValueError: cannot insert Amt, already exists

1 Answer

0 votes
by (33.2k points)

Pandas.reset_index() function generates a new DataFrame or Series with the index reset. This is used where the index is needed to be used as a column. A parameter name in reset_index is needed because Series name is the same as the name of one of the levels of MultiIndex:

df_grouped.reset_index(name='count')

Another solution is to rename Series name:

print (df_grouped.rename('count').reset_index())

   A  Amt  count

0  1 30      4

1  1 20      3

2  1 40      2

3  2 40      3

4  2 10      2

The more common solution instead of value_counts is aggregate size:

df_grouped1 =  dftest.groupby(['A','Amt']).size().rename('count').reset_index()

print (df_grouped1)

   A  Amt  count

0  1 20      3

1  1 30      4

2  1 40      2

3  2 10      2

4  2 40      3

If you wish to learn Python, then check out this Python tutorial and  Python Course by Intellipaat.

Welcome to Intellipaat Community. Get your technical queries answered by top developers !


Categories

...