
How can I add a new column that numbers each group of repeated values? For example, this is my data frame:

id    color
123   white
123   white
123   white
345   blue
345   blue
678   red

This is what I want:

#    id   color
1   123   white
1   123   white
1   123   white
2   345   blue
2   345   blue
3   678   red

1 Answer


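Use factorize. It returns an array of integer codes, one per distinct id, numbered from 0 in order of first appearance, so add 1 to start the numbering at 1: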

df.id.factorize()[0] + 1

array([1, 1, 1, 2, 2, 3])

Another method is groupby with ngroup, which returns a Series aligned with the data frame's index:

df.groupby('id').ngroup() + 1

0    1
1    1
2    1
3    2
4    2
5    3
dtype: int64
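One difference between the two methods is worth noting: factorize numbers the values in order of first appearance, while groupby sorts the group keys by default. The results agree for the data in the question only because the ids already appear in ascending order. A minimal sketch, using a made-up frame df2 (not from the question) whose ids are out of order:

```python
import pandas as pd

# Hypothetical data: ids deliberately not in ascending order
df2 = pd.DataFrame({"id": [345, 345, 123, 678, 123]})

# factorize: groups are numbered in order of first appearance
by_appearance = pd.factorize(df2["id"])[0] + 1

# groupby sorts keys by default, so 123 becomes group 1
by_sorted_key = df2.groupby("id").ngroup() + 1

# sort=False keeps order of first appearance, matching factorize
by_group_unsorted = df2.groupby("id", sort=False).ngroup() + 1

print(by_appearance.tolist())      # [1, 1, 2, 3, 2]
print(by_sorted_key.tolist())      # [2, 2, 1, 3, 1]
print(by_group_unsorted.tolist())  # [1, 1, 2, 3, 2]
```

If the numbering should follow the order in which ids appear in the frame, factorize or groupby with sort=False is probably the safer choice.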

To add the result as the first column, use insert:

df.insert(loc=0, column='#', value=df.groupby('id').ngroup() + 1)

df

   #   id  color
0  1  123  white
1  1  123  white
2  1  123  white
3  2  345   blue
4  2  345   blue
5  3  678    red
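For reference, here is a self-contained sketch that rebuilds the data frame from the question and adds the column in one go. It uses factorize rather than ngroup (either works for this data), and the frame construction is an assumption about how the original data was loaded:

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    "id": [123, 123, 123, 345, 345, 678],
    "color": ["white", "white", "white", "blue", "blue", "red"],
})

# insert modifies df in place; it raises ValueError if a column
# named '#' already exists, so run it only once per frame
df.insert(loc=0, column="#", value=pd.factorize(df["id"])[0] + 1)

print(df)
```

Because insert works in place and returns None, there is no need to assign its result back to df.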
