Given the following data frame:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'B': ['a', 'a', 'b', 'a', 'a', 'a'],
                   })
df

    A   B
0   A   a
1   A   a
2   A   b
3   B   a
4   B   a
5   B   a

I'd like to create column 'C', which numbers the rows within each group defined by columns A and B, like this:

    A   B   C
0   A   a   1
1   A   a   2
2   A   b   1
3   B   a   1
4   B   a   2
5   B   a   3

I've tried this so far:

df['C']=df.groupby(['A','B'])['B'].transform('rank')

...but no dice! Thanks in advance!

1 Answer

Use groupby with cumcount. cumcount numbers the rows of each group starting from 0, in order of appearance, so add 1 to start the numbering at 1:

In [25]: df['C'] = df.groupby(['A','B']).cumcount()+1; df
Out[25]:
   A  B  C
0  A  a  1
1  A  a  2
2  A  b  1
3  B  a  1
4  B  a  2
5  B  a  3
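
For reference, here is the same idea as a small standalone script rather than an IPython session. It is only a sketch built on the sample frame from the question; the column name 'C_desc' is made up for illustration, and it uses cumcount's ascending=False option to number each group in reverse.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({'A': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'B': ['a', 'a', 'b', 'a', 'a', 'a']})

# Number the rows within each (A, B) group, starting at 1,
# in the order the rows appear in the frame
df['C'] = df.groupby(['A', 'B']).cumcount() + 1

# Hypothetical extra column: the same numbering, counting down within each group
df['C_desc'] = df.groupby(['A', 'B']).cumcount(ascending=False) + 1

print(df)
```

Since cumcount follows the current row order, sort the frame first if the numbering should follow some other order.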

For more information, refer to the cumcount documentation:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.cumcount.html
