What is the difference between LabelBinarizer and OneHotEncoder? Both seem to create one new column per unique category in the feature, and then assign 0 or 1 to each data point depending on which category it belongs to.

1 Answer


Label Binarizer:

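It binarizes labels in a one-vs-all fashion: each class found in a 1-D array of target labels gets its own 0/1 column. (Assigning a single integer to each label is what LabelEncoder does, not LabelBinarizer.)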

For example:

>>> from sklearn import preprocessing
>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit([1, 2, 6, 4, 2])
LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
>>> lb.classes_
array([1, 2, 4, 6])
>>> lb.transform([1, 6])
array([[1, 0, 0, 0],
       [0, 0, 0, 1]])
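The same idea works for string labels. Here is a minimal sketch, assuming scikit-learn is installed; the label values are made up for illustration and are not from the original answer. It also shows two behaviours that are easy to miss: the encoding can be reversed, and a two-class problem produces a single column rather than two.

```python
from sklearn.preprocessing import LabelBinarizer

# Illustrative labels (hypothetical, not from the original answer)
multi = ["cat", "dog", "bird", "dog"]

lb = LabelBinarizer()
multi_encoded = lb.fit_transform(multi)              # one column per class, classes sorted
multi_decoded = lb.inverse_transform(multi_encoded)  # back to the original labels

# With exactly two classes, the result is a single 0/1 column
binary = ["yes", "no", "yes"]
binary_encoded = LabelBinarizer().fit_transform(binary)

print(lb.classes_)
print(multi_encoded)
print(binary_encoded)
```

The single-column result for binary labels is one practical difference from OneHotEncoder, which by default would still emit one column per category.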

One Hot Encoding:

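It encodes categorical features (strings or integers) as a one-hot numeric array, with one 0/1 column per category of each feature. Unlike LabelBinarizer, which is meant for a 1-D array of target labels, OneHotEncoder is meant for the 2-D feature matrix X and can encode several columns at once. Most scikit-learn estimators expect numeric input, so this is the usual way to prepare categorical features for training.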

For example:

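from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(handle_unknown='ignore')
X = [['Male', 1], ['Female', 3], ['Female', 2]]
enc.fit(X)  # learns the categories of each column: ['Female', 'Male'] and [1, 2, 3]
enc.transform([['Female', 1], ['Male', 4]]).toarray()  # the unseen value 4 is encoded as all zeros for that column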
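To make the difference concrete, here is a small side-by-side sketch, assuming scikit-learn and NumPy are installed; the color values are invented for illustration. For a single column with three or more categories the two encoders produce the same 0/1 matrix. What differs is the expected input shape and the handling of unseen categories.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, OneHotEncoder

colors = ["red", "green", "blue", "green"]  # illustrative data (hypothetical)

# LabelBinarizer takes a 1-D sequence of labels
lb_out = LabelBinarizer().fit_transform(colors)

# OneHotEncoder takes a 2-D table: rows are samples, columns are features
column = [[c] for c in colors]
enc = OneHotEncoder(handle_unknown="ignore")
ohe_out = enc.fit_transform(column).toarray()  # sparse by default, so densify

# Same values for this single multi-class column
same = np.array_equal(lb_out, ohe_out)

# A category not seen during fit becomes an all-zero row instead of an error
unknown_row = enc.transform([["purple"]]).toarray()

print(same)
print(unknown_row)
```

In short, LabelBinarizer is the natural choice for the target y, while OneHotEncoder is the natural choice for the input features X.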


Hope this answer helps.
