I have a Pandas DataFrame 'df' whose columns I'd like to scale one at a time.

In column 'a', I need the maximum value to become 1, the minimum value to become 0, and all other values to be spread proportionally in between.

In column 'b', however, I need the reverse: the minimum value should become 1, the maximum value should become 0, and all other values should be spread proportionally in between.

Is there a Pandas function to perform these two operations? If not, numpy would certainly do.

    a    b
A  14  103
B  90  107
C  90  110
D  96  114
E  91  114


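You could subtract each column's min, then divide by the resulting max (beware of 0/0, which happens when every value in a column is the same). Note that after subtracting the min, the new max equals the original max - min, so the two steps together compute (x - min) / (max - min).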

In [11]: df

Out[11]:
    a    b
A  14  103
B  90  107
C  90  110
D  96  114
E  91  114

In [12]: df -= df.min()  # equivalent to df = df - df.min()

In [13]: df /= df.max()  # equivalent to df = df / df.max()

In [14]: df

Out[14]:
          a         b
A  0.000000  0.000000
B  0.926829  0.363636
C  0.926829  0.636364
D  1.000000  1.000000
E  0.939024  1.000000
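If you would rather not modify df in place, the same two steps can be wrapped in a small helper. This is only a sketch: the name min_max_scale is made up for illustration, and mapping a constant column to 0 (rather than letting 0/0 produce NaN) is an assumption about what you want in that case.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [14, 90, 90, 96, 91], "b": [103, 107, 110, 114, 114]},
    index=list("ABCDE"),
)


def min_max_scale(frame):
    """Return a copy of frame with every column scaled to the range [0, 1].

    Hypothetical helper, not a pandas function.
    """
    col_min = frame.min()
    span = frame.max() - col_min
    # Assumption: a constant column has span 0, which would give 0/0 (NaN).
    # Dividing by 1 instead turns such a column into all zeros.
    span = span.where(span != 0, 1)
    return (frame - col_min) / span


scaled = min_max_scale(df)
print(scaled)
```

The original df is left untouched, so you can still compare the raw and scaled values side by side.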

To switch the order of a column (from 1 to 0 rather than 0 to 1):

In [15]: df['b'] = 1 - df['b']

An alternative method is to negate the 'b' column before scaling (df['b'] = -df['b']), so that its largest value becomes the smallest and ends up at 0.
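To check that the two routes agree, here is a small self-contained comparison on the data from the question. It is a sketch only; the variable names flipped and negated are chosen just for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [14, 90, 90, 96, 91], "b": [103, 107, 110, 114, 114]},
    index=list("ABCDE"),
)

# Route 1: scale every column to [0, 1], then flip 'b' with 1 - x.
flipped = (df - df.min()) / (df.max() - df.min())
flipped["b"] = 1 - flipped["b"]

# Route 2: negate 'b' first, then scale every column to [0, 1].
negated = df.copy()
negated["b"] = -negated["b"]
negated = (negated - negated.min()) / (negated.max() - negated.min())

# Largest absolute difference between the two results, over all cells.
max_diff = (flipped - negated).abs().max().max()
print(max_diff)
```

The difference should be zero up to floating-point rounding, so which route you pick is mostly a matter of taste.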