0 votes
in Machine Learning by (19k points)

I want to impute the missing values in every column of a pandas DataFrame. The only way I can think of is to go column by column, as shown below.

Is there an operation where I can impute the entire DataFrame without iterating through the columns?


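import numpy as np
import pandas as pd
# Imputer was removed in scikit-learn 0.22; SimpleImputer replaces it
# and always imputes column-wise (there is no axis argument).
from sklearn.impute import SimpleImputer

fill_NaN = SimpleImputer(missing_values=np.nan, strategy='mean')

# Model 1
DF = pd.DataFrame([[0,1,np.nan],[2,np.nan,3],[np.nan,2,5]])
DF.columns = "c1.c2.c3".split(".")
DF.index = "i1.i2.i3".split(".")

# Impute Series (copy first so the original DF is not modified)
imputed_DF = DF.copy()
for col in DF.columns:
    # The imputer expects 2D input, so pass a one-column DataFrame
    imputed_column = fill_NaN.fit_transform(DF[[col]]).ravel()
    # Fill in Series on DataFrame
    imputed_DF[col] = imputed_column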


# Input:
#      c1   c2   c3
# i1    0    1  NaN
# i2    2  NaN    3
# i3  NaN    2    5

# Desired output:
#      c1   c2   c3
# i1    0  1.0    4
# i2    2  1.5    3
# i3    1  2.0    5

1 Answer

0 votes
by (33.1k points)

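To fill each column's missing values with that column's mean or median, fit the imputer on the whole DataFrame in one call. Note that with the old Imputer, axis=1 imputes along rows when given a full DataFrame, so column means need axis=0. Imputer was removed in scikit-learn 0.22; its replacement, SimpleImputer, always works column by column:

from sklearn.impute import SimpleImputer

fill_NaN = SimpleImputer(missing_values=np.nan, strategy='mean')
imputed_DF = pd.DataFrame(fill_NaN.fit_transform(DF))
imputed_DF.columns = DF.columns
imputed_DF.index = DF.index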
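
For mean imputation specifically, a pandas-only alternative may be enough. The sketch below is not part of the original answer; it assumes the same example DataFrame as in the question and relies on `DataFrame.mean()` skipping NaN by default.

```python
import numpy as np
import pandas as pd

# Same example frame as in the question.
DF = pd.DataFrame(
    [[0, 1, np.nan], [2, np.nan, 3], [np.nan, 2, 5]],
    columns=["c1", "c2", "c3"],
    index=["i1", "i2", "i3"],
)

# DF.mean() yields one mean per column (NaN values are skipped).
# fillna aligns that Series on the column labels and returns a new
# DataFrame, leaving DF itself unchanged.
imputed_DF = DF.fillna(DF.mean())
print(imputed_DF)
```

Swapping `DF.mean()` for `DF.median()` gives median imputation in the same way.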

If you just want to fill the NaN values with 0 or some other constant, you can do:

DF[DF.isnull()] = 0
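
The assignment above modifies DF in place. If the original frame should stay intact, `fillna` is one possible alternative; the sketch below assumes the question's example DataFrame, and the per-column fill values are arbitrary illustrations.

```python
import numpy as np
import pandas as pd

# Same example frame as in the question.
DF = pd.DataFrame(
    [[0, 1, np.nan], [2, np.nan, 3], [np.nan, 2, 5]],
    columns=["c1", "c2", "c3"],
    index=["i1", "i2", "i3"],
)

# Replace every NaN with 0; returns a new DataFrame, DF is untouched.
zero_filled = DF.fillna(0)

# A dict maps column labels to per-column fill values
# (the values here are made up for illustration).
custom_filled = DF.fillna({"c1": 0, "c2": -1, "c3": 99})

print(zero_filled)
print(custom_filled)
```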

