Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (17.6k points)

I have Dataset which has many columns and I want create a new column based on two columns on that dataset.

   train_data[['CtpJobId', 'SegmentId']]

     CtpJobId     SegmentId

0   qa1-9epx-dk1    347772

1   qa1-9epx-dv1    347774

2   qa1-9epx-dv1    347777

3   qa1-9epx-dv1    347780

4   qa1-9epx-dv1    347783

5   qa1-9epx-dv1    347786

6   qa1-9epx-dv1    347789

7   qa1-9epx-dv1    347792

8   qa1-9epx-e01    347794

9   qa1-9epx-eb2    347795

10  qa1-9epx-ez1    347796

11  qa1-9epx-f32    347797

12  qa1-9epx-fi1    347798

Now I want create a new column called numberOfSegment such as if same jobId has multiple segmentId then aggregate that segmentId and insert that sum up to that new column.

      CtpJobId        SegmentId    numberOfSegment

0   qa1-9epx-dk1    347772             1

1   qa1-9epx-dv1    347774             7

2   qa1-9epx-dv1    347777             7

3   qa1-9epx-dv1    347780             7

4   qa1-9epx-dv1    347783             7

5   qa1-9epx-dv1    347786             7

6   qa1-9epx-dv1    347789             7

7   qa1-9epx-dv1    347792             7

8   qa1-9epx-e01    347794             1

9   qa1-9epx-eb2    347795             1

10  qa1-9epx-ez1    347796             1

11  qa1-9epx-f32    347797             1

I did in one way but it gives wrong

train_data['NumberOfSegment'] = train_data.groupby('CtpJobId')['SegmentId'].sum()

 train_data[['CtpJobId','NumberOfSegment']]

   CtpJobId    NumberOfSegment

0   qa1-9epx-dk1    NaN

1   qa1-9epx-dv1    NaN

2   qa1-9epx-dv1    NaN

3   qa1-9epx-dv1    NaN

4   qa1-9epx-dv1    NaN

5   qa1-9epx-dv1    NaN

6   qa1-9epx-dv1    NaN

7   qa1-9epx-dv1    NaN

8   qa1-9epx-e01    NaN

9   qa1-9epx-eb2    NaN

10  qa1-9epx-ez1    NaN

Can Anyone please help me? Thanks in advance

1 Answer

0 votes
by (41.4k points)

Here, you need to use transform with 'count' as depicted in the code below:

train_data['NumberOfSegment'] = train_data.groupby('CtpJobId')['SegmentId'].transform('count')

And then the output will be:

 CtpJobId         SegmentId NumberOfSegment

0   qa1-9epx-dk1     347772     1

1   qa1-9epx-dv1     347774     7

2   qa1-9epx-dv1     347777     7

3   qa1-9epx-dv1     347780     7

4   qa1-9epx-dv1     347783     7

5   qa1-9epx-dv1     347786     7

6   qa1-9epx-dv1      347789    7

7   qa1-9epx-dv1      347792    7

8   qa1-9epx-e01      347794    1

9   qa1-9epx-eb2      347795    1

10  qa1-9epx-ez1     347796     1

11  qa1-9epx-f32      347797    1

12  qa1-9epx-fi1       347798 1

If you want to learn Pandas then visit this Python Pandas Tutorial.

Welcome to Intellipaat Community. Get your technical queries answered by top developers!

29.3k questions

30.6k answers

501 comments

104k users

Browse Categories

...