One-hot encoding is the process of converting a categorical variable into a form that ML algorithms can use more effectively for prediction. It is applied on top of an integer representation: the integer-encoded variable is replaced by a new binary variable for each unique integer value. This matters because many learning algorithms either learn a single weight per feature or use distances between samples.
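Here is a minimal sketch of that process using scikit-learn's OneHotEncoder (the data values are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# A column of categorical values (2D shape, as the encoder expects)
nationalities = np.array([["German"], ["French"], ["Russian"], ["French"]])

# sparse_output=False returns a dense array; older sklearn versions use sparse=False
encoder = OneHotEncoder(sparse_output=False)
one_hot = encoder.fit_transform(nationalities)

print(encoder.categories_)  # [array(['French', 'German', 'Russian'], dtype=...)]
print(one_hot)
# [[0. 1. 0.]
#  [1. 0. 0.]
#  [0. 0. 1.]
#  [1. 0. 0.]]
```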
Let’s assume you have a dataset containing a categorical feature “nationality” with the values German, French, and Russian, encoded as 0, 1, and 2. A linear classifier learns a single weight w for this feature and makes decisions based on the constraint w·x + b > 0, or equivalently w·x > −b. The problem is that a single weight cannot encode a three-way choice. One-hot encoding solves this by blowing up the feature space to three features, each with its own weight: w[GER]·x[GER] + w[FR]·x[FR] + w[RUS]·x[RUS] + b > 0, where each x is a boolean.
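A quick sketch of that decision rule with one weight per one-hot feature (the weight and bias values here are hypothetical, chosen just to show the computation):

```python
import numpy as np

# Hypothetical learned parameters: one weight per nationality
w = np.array([0.5, -1.2, 2.0])  # w[GER], w[FR], w[RUS]
b = -0.3

# One-hot vector for "French" -> only the French weight contributes
x_french = np.array([0, 1, 0])
score = np.dot(w, x_french) + b  # -1.2 + (-0.3) = -1.5
print(score > 0)                 # False: classifier says "no" for French
```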
Similarly, a learner based on standard distance metrics between samples (such as k-nearest neighbors) gets confused without one-hot encoding. With the naive integer encoding and Euclidean distance, the distance between German and French is 1, while the distance between German and Russian is 2, even though no such ordering actually exists. With one-hot encoding, the pairwise distances between [1, 0, 0], [0, 1, 0], and [0, 0, 1] all become equal to √2.
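You can verify both distance claims directly; a short sketch:

```python
import numpy as np
from itertools import combinations

# Naive integer encoding: distances imply an artificial ordering
ints = {"German": 0, "French": 1, "Russian": 2}
print(abs(ints["German"] - ints["French"]))   # 1
print(abs(ints["German"] - ints["Russian"]))  # 2

# One-hot encoding: every pair of categories is equally far apart
one_hot = {
    "German": np.array([1, 0, 0]),
    "French": np.array([0, 1, 0]),
    "Russian": np.array([0, 0, 1]),
}
for a, b in combinations(one_hot, 2):
    print(a, b, np.linalg.norm(one_hot[a] - one_hot[b]))  # ~1.4142 (= sqrt(2)) each
```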
I hope this answer helps.