What is the difference between LabelBinarizer and OneHotEncoder? Both seem to create one new column per unique category in the feature, then assign 0 or 1 to each data point depending on which category it belongs to.
Label Binarizer:
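It binarizes labels in a one-vs-all fashion: each unique class in a 1-D array of labels gets its own 0/1 column. It is meant for the target (y), not for the feature matrix. (Assigning a single integer to each label is what LabelEncoder does, not LabelBinarizer.)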
For example:
>>> from sklearn import preprocessing
>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit([1, 2, 6, 4, 2])
LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
>>> lb.classes_
array([1, 2, 4, 6])
>>> lb.transform([1, 6])
array([[1, 0, 0, 0],
       [0, 0, 0, 1]])
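One detail the example above does not show: when there are only two classes, LabelBinarizer returns a single column rather than two. Here is a minimal sketch of that case. The "yes"/"no" labels and the variable names are made up for illustration.

```python
from sklearn.preprocessing import LabelBinarizer

# Hypothetical two-class target; classes are sorted, so "no" -> 0 and "yes" -> 1.
y = ["yes", "no", "no", "yes"]

lb = LabelBinarizer()
y_bin = lb.fit_transform(y)           # shape (4, 1): one column for a binary target
y_back = lb.inverse_transform(y_bin)  # maps the 0/1 column back to the original labels

print(lb.classes_)    # ['no' 'yes']
print(y_bin.ravel())  # [1 0 0 1]
print(y_back)
```

With three or more classes you get one column per class, as in the [1, 2, 4, 6] example above.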
One Hot Encoding:
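It encodes categorical features as a one-hot numeric array. Unlike LabelBinarizer, it takes a 2-D feature matrix (X) and encodes every column in one pass, so it is meant for input features rather than target labels. Most estimators need numeric input, and one-hot encoding provides it without implying an order among the categories.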
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(handle_unknown='ignore')
X = [['Male', 1], ['Female', 3], ['Female', 2]]
enc.fit(X)
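The snippet above only fits the encoder. To see what it actually produces, here is a small sketch that transforms two rows with the same X. The rows passed to transform are made up for illustration, and the value 4 is deliberately one the encoder never saw during fit.

```python
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder(handle_unknown="ignore")
X = [["Male", 1], ["Female", 3], ["Female", 2]]
enc.fit(X)

# One list of categories per input column, each sorted.
print(enc.categories_)

# transform returns a sparse matrix by default; toarray() makes it dense.
# Columns are: Female, Male, 1, 2, 3.
encoded = enc.transform([["Female", 1], ["Male", 4]]).toarray()
print(encoded)
# [[1. 0. 1. 0. 0.]
#  [0. 1. 0. 0. 0.]]
```

Because of handle_unknown='ignore', the unseen value 4 in the second row becomes all zeros in the columns for that feature instead of raising an error.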
Hope this answer helps.