Using pandas groupby on two column how to derived a new column

Question

asked Jul 11, 2019 in Data Science by sourav (17.6k points)

I have Dataset which has many columns and I want create a new column based on two columns on that dataset.

train_data[['CtpJobId', 'SegmentId']]
CtpJobId SegmentId
0 qa1-9epx-dk1 347772
1 qa1-9epx-dv1 347774
2 qa1-9epx-dv1 347777
3 qa1-9epx-dv1 347780
4 qa1-9epx-dv1 347783
5 qa1-9epx-dv1 347786
6 qa1-9epx-dv1 347789
7 qa1-9epx-dv1 347792
8 qa1-9epx-e01 347794
9 qa1-9epx-eb2 347795
10 qa1-9epx-ez1 347796
11 qa1-9epx-f32 347797
12 qa1-9epx-fi1 347798

Now I want create a new column called numberOfSegment such as if same jobId has multiple segmentId then aggregate that segmentId and insert that sum up to that new column.

CtpJobId SegmentId numberOfSegment
0 qa1-9epx-dk1 347772 1
1 qa1-9epx-dv1 347774 7
2 qa1-9epx-dv1 347777 7
3 qa1-9epx-dv1 347780 7
4 qa1-9epx-dv1 347783 7
5 qa1-9epx-dv1 347786 7
6 qa1-9epx-dv1 347789 7
7 qa1-9epx-dv1 347792 7
8 qa1-9epx-e01 347794 1
9 qa1-9epx-eb2 347795 1
10 qa1-9epx-ez1 347796 1
11 qa1-9epx-f32 347797 1

I did in one way but it gives wrong

train_data['NumberOfSegment'] = train_data.groupby('CtpJobId')['SegmentId'].sum()
train_data[['CtpJobId','NumberOfSegment']]
CtpJobId NumberOfSegment
0 qa1-9epx-dk1 NaN
1 qa1-9epx-dv1 NaN
2 qa1-9epx-dv1 NaN
3 qa1-9epx-dv1 NaN
4 qa1-9epx-dv1 NaN
5 qa1-9epx-dv1 NaN
6 qa1-9epx-dv1 NaN
7 qa1-9epx-dv1 NaN
8 qa1-9epx-e01 NaN
9 qa1-9epx-eb2 NaN
10 qa1-9epx-ez1 NaN

Can Anyone please help me? Thanks in advance

1 Answer

Shlok Pandey · Answer 1 · 2019-07-20T07:09:45+0000

Here, you need to use transform with 'count' as depicted in the code below:

train_data['NumberOfSegment'] = train_data.groupby('CtpJobId')['SegmentId'].transform('count')

And then the output will be:

CtpJobId SegmentId NumberOfSegment
0 qa1-9epx-dk1 347772 1
1 qa1-9epx-dv1 347774 7
2 qa1-9epx-dv1 347777 7
3 qa1-9epx-dv1 347780 7
4 qa1-9epx-dv1 347783 7
5 qa1-9epx-dv1 347786 7
6 qa1-9epx-dv1 347789 7
7 qa1-9epx-dv1 347792 7
8 qa1-9epx-e01 347794 1
9 qa1-9epx-eb2 347795 1
10 qa1-9epx-ez1 347796 1
11 qa1-9epx-f32 347797 1
12 qa1-9epx-fi1 347798 1

If you want to learn Pandas then visit this Python Pandas Tutorial.

Using pandas groupby on two column how to derived a new column

1 Answer

Related questions

Browse Categories

Browse By Domains

Popular Courses

Popular Tutorials

Popular Resources