0 votes
2 views
in Machine Learning by (47.6k points)

Is it possible to have missing values in scikit-learn? How should they be represented? I couldn't find any documentation about that.

1 Answer

0 votes
by (33.1k points)

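Yes. Represent missing values as np.nan and fill them with SimpleImputer (it replaced the older Imputer class, which was removed in scikit-learn 0.22):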

SimpleImputer fills missing values column by column. It works on whatever columns you pass to it, so you can apply it to a particular column or to the complete dataset.
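As a minimal sketch of the difference between filling the complete dataset and filling a particular column, the snippet below uses a small made-up array (the values and the variable names X_all and X_one are assumptions for illustration only):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (made up for illustration); np.nan marks the missing entries.
X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan],
              [7.0, 8.0, 9.0]])

# Complete dataset: every column is filled with its own column mean.
X_all = SimpleImputer().fit_transform(X)

# Particular column: pass only that column as a 2-D slice, then write it back.
X_one = X.copy()
X_one[:, [1]] = SimpleImputer().fit_transform(X[:, [1]])

print(X_all)
print(X_one)
```

In the second case only column 1 is imputed, so the missing entry in column 2 is left as np.nan.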

By default, SimpleImputer fills each missing value with the column mean, but you can change this with the strategy argument. It accepts the following values:

  • 'mean':  It replaces missing values using the mean along each column. It can only be used with numeric data.

  • 'median':  It replaces missing values using the median along each column. It can only be used with numeric data.

  • 'most_frequent':  It replaces missing values using the most frequent value along each column. It can be used with strings or numeric data.

  • 'constant':  It replaces missing values with the value given in the fill_value argument. It can be used with strings or numeric data.
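To compare the strategies listed above, here is a small sketch on made-up data. The arrays, the fill_value of -1.0, and the variable names are assumptions chosen for illustration; the string column uses object dtype with np.nan as the missing marker.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy numeric column (values made up for illustration); row 3 is missing.
X_num = np.array([[1.0], [2.0], [2.0], [np.nan], [10.0]])

# Record what each strategy puts into the missing slot.
filled_values = {}
for strategy in ("mean", "median", "most_frequent"):
    imputer = SimpleImputer(strategy=strategy)
    filled_values[strategy] = imputer.fit_transform(X_num)[3, 0]

# 'constant' needs fill_value to say what to insert.
constant_imputer = SimpleImputer(strategy="constant", fill_value=-1.0)
filled_values["constant"] = constant_imputer.fit_transform(X_num)[3, 0]

# Toy string column: object dtype, with np.nan marking the missing entry.
X_str = np.array([["red"], ["blue"], ["red"], [np.nan]], dtype=object)
string_imputer = SimpleImputer(strategy="most_frequent")
filled_color = string_imputer.fit_transform(X_str)[3, 0]

print(filled_values)
print(filled_color)
```

The mean is pulled upward by the outlier 10.0, while the median and the most frequent value are not, which is often the reason to pick one of those instead.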

For example:

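import numpy as np
from sklearn.impute import SimpleImputer

# Learn the column means from the training data (NaN entries are skipped)
imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')
imp_mean.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])

# Fill the missing entries of new data with those learned means
X = [[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]]
print(imp_mean.transform(X))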
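To see where the filled values in the example come from, the sketch below inspects the fitted imputer's statistics_ attribute and checks it against np.nanmean. The variable names X_train, X_new, and X_filled are assumptions introduced here; the data is the same as in the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])
X_new = np.array([[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]])

imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')
imp_mean.fit(X_train)

# statistics_ holds one fill value per column, learned from X_train only.
learned_means = imp_mean.statistics_
manual_means = np.nanmean(X_train, axis=0)  # column means, ignoring NaN

# transform() reuses the learned means; it does not recompute them on X_new.
X_filled = imp_mean.transform(X_new)

print(learned_means)
print(X_filled)
```

The means come from the data passed to fit, so the missing entries of the new data are filled with 7, 3.5, and 3.5 respectively, not with means computed from the new data itself.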

Hope this answer helps.

