# Why can vector normalization improve the accuracy of clustering and classification?

Mahout in Action says that normalization can slightly improve accuracy. Can anyone explain why? Thanks!

Normalization is not always required in machine learning, but it rarely hurts, and it matters most for distance-based methods such as clustering.

K-means:

K-means clustering is "isotropic" in all directions of space, which means it tends to produce more or less round clusters rather than elongated ones. Because it treats every direction equally, leaving the variances unequal lets the variables with larger variance dominate the distance, so clusters tend to be separated along those variables.
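One way to see the effect of scale is sketched here for the city-block distance and a linear rescaling $x_2' = a x_2 + b$ of the second feature. The symbols $a$, $b$, $x$ and $y$ are introduced only for this illustration:

$$
d(x, y) = |x_1 - y_1| + |x_2 - y_2|
\quad\longrightarrow\quad
d'(x, y) = |x_1 - y_1| + |a|\,|x_2 - y_2|
$$

The offset $b$ cancels, but with $a = 1000$ the second term is about a thousand times larger than the first when the two features have comparable spread, so the first feature has almost no influence on the cluster assignment. Dividing each feature by its standard deviation puts both terms back on a comparable scale.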

For example:

    % Two Gaussian blobs centred at (1,1) and (-1,-1)
    X = [randn(100,2)+ones(100,2);...
         randn(100,2)-ones(100,2)];

    % Introduce denormalization: uncomment to put the second feature
    % on a much larger scale than the first
    % X(:, 2) = X(:, 2) * 1000 + 500;

    opts = statset('Display','final');
    [idx,ctrs] = kmeans(X,2,...
                        'Distance','cityblock',...
                        'Replicates',5,...
                        'Options',opts);

    % Plot the two clusters and their centroids
    plot(X(idx==1,1),X(idx==1,2),'r.','MarkerSize',12)
    hold on
    plot(X(idx==2,1),X(idx==2,2),'b.','MarkerSize',12)
    plot(ctrs(:,1),ctrs(:,2),'kx',...
         'MarkerSize',12,'LineWidth',2)
    plot(ctrs(:,1),ctrs(:,2),'ko',...
         'MarkerSize',12,'LineWidth',2)
    legend('Cluster 1','Cluster 2','Centroids',...
           'Location','NW')
    title('K-means with normalization')
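To compare the two cases directly, here is a minimal sketch that clusters the rescaled data both raw and after z-score standardization. It assumes the same Statistics Toolbox `kmeans` as above; the names `Xz`, `truth` and the helper `acc` are illustrative and not part of the original answer.

```matlab
rng(0);  % fix the seed so the run is repeatable
X = [randn(100,2)+ones(100,2); randn(100,2)-ones(100,2)];
X(:,2) = X(:,2) * 1000 + 500;   % put the second feature on a much larger scale

% Standardize each column to zero mean and unit variance (z-score).
mu    = mean(X, 1);
sigma = std(X, 0, 1);
Xz    = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);

% Cluster the raw and the standardized data with the same settings.
idxRaw = kmeans(X,  2, 'Distance','cityblock', 'Replicates',5);
idxZ   = kmeans(Xz, 2, 'Distance','cityblock', 'Replicates',5);

% The true groups are rows 1-100 and rows 101-200 (an assumption of this sketch).
truth = [ones(100,1); 2*ones(100,1)];
% Cluster labels are arbitrary, so score both labelings and keep the better one.
acc = @(idx) max(mean(idx == truth), mean(idx == 3 - truth));
fprintf('accuracy raw: %.2f, standardized: %.2f\n', acc(idxRaw), acc(idxZ));
```

On the raw data the assignment is driven almost entirely by the second feature, while the standardized run can use both. How large the difference is depends on the random draw, so treat the printed numbers as indicative only.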
