
I encoded my categorical data using sklearn.preprocessing.OneHotEncoder and fed it to a random forest classifier. Everything seems to work, and I got my predicted output back.

Is there a way to reverse the encoding and convert my output back to its original state?

1 Answer


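Compute the dot product of the encoded values with the array of original category values stored on the fitted encoder: enc.categories_[0] in scikit-learn 0.20 and later (older releases exposed it as enc.active_features_, which has since been removed). This works for both sparse and dense representations.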

For example:

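import numpy as np
from sklearn.preprocessing import OneHotEncoder

orig = np.array([6, 9, 8, 2, 5, 4, 5, 3, 3, 6])

enc = OneHotEncoder()
# fit_transform expects a 2-D array and returns a sparse matrix by default
encoded = enc.fit_transform(orig.reshape(-1, 1))

# categories_[0] holds the original value behind each one-hot column
# (releases before scikit-learn 0.20 exposed this as active_features_)
decoded = encoded.dot(enc.categories_[0]).astype(int)

assert np.allclose(orig, decoded)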
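The same decoding can be tried on a dense array. The sketch below assumes scikit-learn 0.20 or later (for categories_) and converts the sparse result with toarray() rather than relying on a constructor flag. The names dense, values, decoded_dot and decoded_argmax are illustrative only. It also shows an argmax lookup, which is one possible alternative when the categories are not numeric and a dot product is not meaningful.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

orig = np.array([6, 9, 8, 2, 5, 4, 5, 3, 3, 6])

enc = OneHotEncoder()
# Convert the default sparse output to a plain dense ndarray
dense = enc.fit_transform(orig.reshape(-1, 1)).toarray()

# Original value associated with each one-hot column
values = enc.categories_[0]

# Dot product: each row has a single 1, so it picks out one value
decoded_dot = dense.dot(values).astype(int)

# Alternative: find the position of the 1 and index into the values
decoded_argmax = values[dense.argmax(axis=1)]

assert np.array_equal(decoded_dot, orig)
assert np.array_equal(decoded_argmax, orig)
```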

The key insight is that the fitted encoder stores the original value behind each binary column: enc.categories_[0] in scikit-learn 0.20 and later, active_features_ in older releases (for a single integer column). Each encoded row contains a single 1, located in the column of its original value, so the dot product of the row with that array of values returns exactly the original value. Note that this trick only works when the categories are numeric.
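For several columns or non-numeric categories, the dot product no longer applies directly. A minimal sketch of another route, assuming scikit-learn 0.20 or later, is the encoder's own inverse_transform method. The array X below is made-up example data, not taken from the question.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data with two categorical string columns
X = np.array([
    ["red", "S"],
    ["green", "L"],
    ["blue", "S"],
    ["green", "M"],
])

enc = OneHotEncoder()
# One block of binary columns per input feature (3 colours + 3 sizes)
encoded = enc.fit_transform(X)

# inverse_transform maps each one-hot block back to its original category
decoded = enc.inverse_transform(encoded)

assert (decoded == X).all()
```

inverse_transform accepts the same sparse or dense layout that transform produced, so it can be applied to any rows encoded by this fitted encoder.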
