I have noticed that when one-hot encoding is applied to a particular data set (a matrix) and the result is used as training data, learning algorithms give significantly better prediction accuracy than when they are trained on the original matrix itself. Why does one-hot encoding improve performance like this?


One-hot encoding is the process of converting categorical variables into a form that ML algorithms can use to do a better job at prediction. Starting from an integer representation, the integer-encoded variable is replaced by a new binary variable for each unique integer value. The reason this helps is that many learning algorithms either learn a single weight per feature or use distances between samples, and both break down on integer-coded categories.
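As a minimal sketch of that replacement step (a hypothetical helper of my own, not from any particular library), each unique category value becomes its own binary column:

```python
def one_hot(values):
    """One-hot encode a list of categorical values.

    Each unique value becomes its own binary column; every row has
    exactly one 1, marking the category of that sample.
    """
    categories = sorted(set(values))               # fixed column order
    index = {c: i for i, c in enumerate(categories)}
    return [[1 if index[v] == j else 0
             for j in range(len(categories))]
            for v in values]

rows = one_hot(["German", "French", "Russian", "French"])
# columns are [French, German, Russian] (sorted order)
print(rows)  # [[0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 0]]
```

In practice you would use a library implementation such as scikit-learn's OneHotEncoder or pandas' get_dummies, which also handle unseen categories and sparse output.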

Let’s assume you have a dataset containing a categorical feature “nationality” with values German, French and Russian, encoded as the integers 0, 1 and 2. A linear classifier learns a single weight w for this feature and makes its decision based on the constraint w·x + b > 0, i.e. w·x > −b. The problem is that a single weight cannot encode a three-way choice. With one-hot encoding, the feature space is blown up into three features, each with its own weight, so the decision becomes w[GER]·x[GER] + w[FR]·x[FR] + w[RUS]·x[RUS] + b > 0, where all the x’s are booleans.

Similarly, a learner based on a standard distance metric between samples (such as k-nearest neighbors) gets confused without one-hot encoding. With the naive integer encoding above (German = 0, French = 1, Russian = 2) and Euclidean distance, the distance between German and French is 1 while the distance between German and Russian is 2, an ordering that is a pure artifact of the encoding. With one-hot encoding, the pairwise distances between [1, 0, 0], [0, 1, 0] and [0, 0, 1] all become equal to √2.
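The distance arithmetic is easy to check directly (a self-contained sketch using plain Euclidean distance):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Integer encoding: German=0, French=1, Russian=2.
print(euclidean([0], [1]))   # 1.0  German-French
print(euclidean([0], [2]))   # 2.0  German-Russian: spuriously "twice as far"

# One-hot encoding: every pair of distinct categories is sqrt(2) apart.
ger, fr, rus = [1, 0, 0], [0, 1, 0], [0, 0, 1]
print(euclidean(ger, fr), euclidean(fr, rus), euclidean(ger, rus))
```

With one-hot vectors, any two distinct categories differ in exactly two coordinates by 1 each, which is why every pairwise distance is √(1² + 1²) = √2 regardless of how many categories there are.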
