
What is the difference between LabelBinarizer and OneHotEncoder in scikit-learn? Both seem to create one new column per unique category in the feature, and then assign 0 or 1 to each data point depending on which category it belongs to.

1 Answer


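LabelBinarizer:

It converts a one-dimensional array of labels (typically the target variable y) into a binary indicator matrix with one column per class, in a one-vs-all fashion. When there are only two classes, it returns a single 0/1 column. It does not map each label to a single integer; that is what LabelEncoder does.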

For example:

>>> from sklearn import preprocessing
>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit([1, 2, 6, 4, 2])
LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
>>> lb.classes_
array([1, 2, 4, 6])
>>> lb.transform([1, 6])
array([[1, 0, 0, 0],
       [0, 0, 0, 1]])
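The two-class case is worth seeing separately, because the output shape differs from the multi-class example. Here is a minimal sketch; the "spam"/"ham" labels are made up purely for illustration:

```python
from sklearn.preprocessing import LabelBinarizer

# Hypothetical two-class target, chosen only for illustration
y = ["spam", "ham", "spam", "ham"]

lb = LabelBinarizer()
y_bin = lb.fit_transform(y)

# With exactly two classes the result is one 0/1 column, not two columns
print(lb.classes_)
print(y_bin.shape)
print(y_bin.ravel())

# inverse_transform maps the binary column back to the original labels
restored = lb.inverse_transform(y_bin)
print(restored)
```

The classes are sorted, so "ham" becomes 0 and "spam" becomes 1.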

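OneHotEncoder:

It encodes the categorical features of a two-dimensional feature matrix X as a one-hot numeric array. Each column is encoded separately, and the result is a sparse matrix by default. Many estimators cannot work with raw category values, so this is the usual way to prepare categorical input features, whereas LabelBinarizer is meant for labels.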

For example:

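from sklearn.preprocessing import OneHotEncoder
# Categories not seen during fit are encoded as all zeros instead of raising an error
enc = OneHotEncoder(handle_unknown='ignore')
X = [['Male', 1], ['Female', 3], ['Female', 2]]
enc.fit(X)
# Two columns for the first feature (Female, Male) plus three for the second (1, 2, 3)
print(enc.transform([['Female', 1], ['Male', 4]]).toarray())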
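To make the difference concrete, here is a small sketch that runs both encoders on the same data. The `colors` list is a made-up example, and the encoders are used with their default settings:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, OneHotEncoder

# Hypothetical single categorical column, chosen only for illustration
colors = ["red", "green", "blue", "green"]

# LabelBinarizer accepts the 1-D list directly and returns a dense integer array
lb_out = LabelBinarizer().fit_transform(colors)

# OneHotEncoder expects a 2-D array (samples x features),
# so the list is reshaped into a single column first
X = np.array(colors).reshape(-1, 1)
ohe_out = OneHotEncoder().fit_transform(X)  # sparse by default

print(lb_out)
print(type(ohe_out))
# For a single column with more than two categories the 0/1 values agree
print(np.array_equal(lb_out, ohe_out.toarray()))
```

So for one multi-class column the encoded values match. The practical differences are the expected input (1-D labels versus a 2-D feature matrix), the default output type (dense versus sparse), and the fact that OneHotEncoder can encode several feature columns at once.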

Hope this answer helps.
