I have a pandas dataframe with mixed type columns, and I'd like to apply sklearn's MinMaxScaler to some of the columns. Ideally, I'd like to do these transformations in place, but I haven't figured out a way to do that yet. I've written the following code that works:

import pandas as pd
from sklearn import preprocessing

dfTest = pd.DataFrame({'A':[14.00,90.20,90.95,96.27,91.21],'B':[103.02,107.26,110.35,114.23,114.68], 'C':['big','small','big','small','small']})

min_max_scaler = preprocessing.MinMaxScaler()

def scaleColumns(df, cols_to_scale):
    # Rescale each listed column to the [0, 1] range, overwriting it in df
    for col in cols_to_scale:
        df[col] = pd.DataFrame(min_max_scaler.fit_transform(pd.DataFrame(df[col])),columns=[col])
    return df


       A       B      C
0  14.00  103.02    big
1  90.20  107.26  small
2  90.95  110.35    big
3  96.27  114.23  small
4  91.21  114.68  small

scaled_df = scaleColumns(dfTest,['A','B'])


          A         B      C
0  0.000000  0.000000    big
1  0.926219  0.363636  small
2  0.935335  0.628645    big
3  1.000000  0.961407  small
4  0.938495  1.000000  small

I'm curious if this is the preferred/most efficient way to do this transformation. Is there a way I could use df.apply that would be better?

I'm also surprised I can't get the following code to work:

bad_output = min_max_scaler.fit_transform(dfTest['A'])

If I pass an entire dataframe to the scaler it works:

dfTest2 = dfTest.drop('C', axis = 1)

good_output = min_max_scaler.fit_transform(dfTest2)


I'm confused about why passing a series to the scaler fails. In my full working code above, I had hoped to pass a series to the scaler and then set the dataframe column equal to the scaled series. I've seen this question asked in a few other places, but haven't found a good answer. Any help understanding what's going on here would be greatly appreciated!

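The following snippet produces the expected output without using apply. Selecting the columns with a list gives the scaler a 2-D DataFrame, and the result is assigned straight back to the same columns: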

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

dfTest = pd.DataFrame({'A':[14.00,90.20,90.95,96.27,91.21],
                       'B':[103.02,107.26,110.35,114.23,114.68],
                       'C':['big','small','big','small','small']})

dfTest[['A', 'B']] = scaler.fit_transform(dfTest[['A', 'B']])


          A         B      C
0  0.000000  0.000000    big
1  0.926219  0.363636  small
2  0.935335  0.628645    big
3  1.000000  0.961407  small
4  0.938495  1.000000  small
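
As for why passing a single series fails: as far as I can tell, scikit-learn transformers expect a 2-D input of shape (n_samples, n_features), and dfTest['A'] is a 1-D Series. Recent scikit-learn versions reject it with an error about expecting a 2D array, though the exact behaviour depends on your version. A minimal sketch of two ways around it, assuming the same dfTest as above (the names scaled_a and scaled_a_alt are only for illustration):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

dfTest = pd.DataFrame({'A': [14.00, 90.20, 90.95, 96.27, 91.21],
                       'B': [103.02, 107.26, 110.35, 114.23, 114.68],
                       'C': ['big', 'small', 'big', 'small', 'small']})
scaler = MinMaxScaler()

# Single brackets give a 1-D Series; double brackets give a 2-D DataFrame.
series_shape = dfTest['A'].shape    # (5,)
frame_shape = dfTest[['A']].shape   # (5, 1)

# Option 1: select the column with a list so the scaler sees one feature.
scaled_a = scaler.fit_transform(dfTest[['A']])

# Option 2: reshape the underlying values into a single-column 2-D array.
scaled_a_alt = scaler.fit_transform(dfTest['A'].to_numpy().reshape(-1, 1))

# The scaler returns a (5, 1) ndarray; flatten it before assigning to one column.
dfTest['A'] = scaled_a.ravel()
```

Either option gives the same values. The double-bracket form is usually easier to read and extends naturally to several columns at once.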


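
On the df.apply part of the question: with the default feature_range of (0, 1), MinMaxScaler computes (x - min) / (max - min) per column, so you can reproduce it with plain pandas if you don't need to keep a fitted scaler around. This is only a sketch to show the equivalence, assuming the same dfTest; the names cols, manual and via_sklearn are made up for the example:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

dfTest = pd.DataFrame({'A': [14.00, 90.20, 90.95, 96.27, 91.21],
                       'B': [103.02, 107.26, 110.35, 114.23, 114.68],
                       'C': ['big', 'small', 'big', 'small', 'small']})
cols = ['A', 'B']

# Column-wise min-max scaling written out by hand with apply.
manual = dfTest[cols].apply(lambda s: (s - s.min()) / (s.max() - s.min()))

# The same scaling through scikit-learn, for comparison.
via_sklearn = MinMaxScaler().fit_transform(dfTest[cols])

# True when both approaches agree up to floating-point tolerance.
matches = np.allclose(manual.to_numpy(), via_sklearn)
```

The scaler is still the better choice when the same min and max must later be applied to new data (fit on the training set, then transform the test set), since the apply version recomputes them every time.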