Normalize within groups in Pandas

Question

asked Jun 4, 2020 in R Programming by ashely (50.2k points)

I have a collection of data that has a grouping variable, a position, and a value at that position:

Sample Position Depth
A 1 2
A 2 3
A 3 4
B 1 1
B 2 3
B 3 2

I want to create a new column containing the depth normalized within each sample, as follows:

Sample Position Depth NormalizedDepth
A 1 2 0
A 2 3 0.5
A 3 4 1
B 1 1 0
B 2 3 1
B 3 2 0.5

This is the usual min-max formula, NormalizedDepth = (x - min(x)) / (max(x) - min(x)), where the minimum and maximum are taken within each group.
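Applied to the example table, the formula works out as in this minimal plain-Python sketch. `min_max` is a hypothetical helper name, not a library function, and it assumes each group has at least two distinct values (if max == min, the denominator is zero).

```python
def min_max(values):
    """Rescale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Depths for each sample, taken from the example table
print(min_max([2, 3, 4]))  # Sample A -> [0.0, 0.5, 1.0]
print(min_max([1, 3, 2]))  # Sample B -> [0.0, 1.0, 0.5]
```

The pandas version has to do the same thing, but with min and max computed separately for each value of Sample.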

I know this can be done with dplyr in R as follows:

library(dplyr)

# normalize Depth to [0, 1] within each Sample
depths %>%
  group_by(Sample) %>%
  mutate(NormalizedDepth = (Depth - min(Depth)) / (max(Depth) - min(Depth)))

How can I do the same thing with pandas?

1 Answer

vinita · Answer 1 · 2020-06-04T06:08:15+0000

You can use groupby() with transform(), combining 'min' with np.ptp ("peak to peak", i.e. the maximum minus the minimum):

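import numpy as np
import pandas as pd

df = pd.DataFrame({'Sample': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'Position': [1, 2, 3, 1, 2, 3],
                   'Depth': [2, 3, 4, 1, 3, 2]})

# group the Depth column by Sample
g = df.groupby('Sample').Depth
# transform() broadcasts each group's min and range (max - min) back to every row
df['NormalizedDepth'] = (df.Depth - g.transform('min')) / g.transform(np.ptp)
print(df['NormalizedDepth'])
0 0.0
1 0.5
2 1.0
3 0.0
4 1.0
5 0.5
Name: NormalizedDepth, dtype: float64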
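If you prefer to mirror the dplyr mutate() more literally, a single transform() with a per-group function gives the same values. This is a sketch under a few illustrative assumptions: the helper name `normalize` and the extra constant sample `C` are hypothetical, and groups whose depths are all equal (max == min) are mapped to 0.0 here instead of the NaN that 0/0 would otherwise produce.

```python
import pandas as pd

df = pd.DataFrame({
    'Sample': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'Position': [1, 2, 3, 1, 2, 3, 1, 2],
    'Depth': [2, 3, 4, 1, 3, 2, 5, 5],
})

def normalize(x):
    """Min-max scale one group's Depth values to [0, 1]."""
    rng = x.max() - x.min()
    if rng == 0:
        # Assumption: a constant group maps to 0.0 rather than NaN
        return x * 0.0
    return (x - x.min()) / rng

# transform() calls normalize once per Sample group and aligns the result with df's index
df['NormalizedDepth'] = df.groupby('Sample')['Depth'].transform(normalize)
print(df)
```

Dropping the `rng == 0` branch leaves exactly the lambda form of the dplyr expression: `transform(lambda x: (x - x.min()) / (x.max() - x.min()))`.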
