
It is mentioned in Mahout in Action that normalization can slightly improve accuracy. Can anyone explain the reason? Thanks!

1 Answer


Normalization is not strictly required for every machine learning algorithm, but for distance-based methods such as K-means clustering it matters a lot whenever the features sit on very different scales.

K-means:

K-means clustering is "isotropic" in all directions of space, which means it tends to produce more or less round (rather than elongated) clusters. In this situation, leaving the variances unequal effectively puts more weight on the variables with larger variance, because differences along those axes dominate the Euclidean (or city-block) distance; normalizing gives every feature a comparable influence on the cluster assignments.
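To see why scale matters, here is a tiny distance calculation first (the two features and their scales are invented purely for illustration): both features differ by one "typical unit", yet the raw distance is almost entirely determined by the feature measured on the larger scale.

% Two observations that differ by one typical unit in each feature
a = [1, 100];    % feature 1 on a scale of ~1, feature 2 on a scale of ~100
b = [2, 200];
d_raw = norm(a - b)                     % about 100.005: dominated by feature 2
% Dividing each feature by its typical spread makes the contributions comparable
d_scaled = norm((a - b) ./ [1, 100])    % about 1.414: both features count equally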

For Example:

% Two Gaussian blobs, centered at (1,1) and (-1,-1)
X = [randn(100,2)+ones(100,2);...
     randn(100,2)-ones(100,2)];

% Uncomment to put the second feature on a much larger scale ("denormalize" it)
% X(:,2) = X(:,2) * 1000 + 500;

opts = statset('Display','final');
[idx,ctrs] = kmeans(X,2,...
                    'Distance','city',...   % city-block (L1) distance
                    'Replicates',5,...
                    'Options',opts);

% Plot the two clusters and mark their centroids
plot(X(idx==1,1),X(idx==1,2),'r.','MarkerSize',12)
hold on
plot(X(idx==2,1),X(idx==2,2),'b.','MarkerSize',12)
plot(ctrs(:,1),ctrs(:,2),'kx',...
     'MarkerSize',12,'LineWidth',2)
plot(ctrs(:,1),ctrs(:,2),'ko',...
     'MarkerSize',12,'LineWidth',2)
legend('Cluster 1','Cluster 2','Centroids',...
       'Location','NW')
title('K-means with normalization')

[Plots: the clustered points and centroids, with the features on comparable scales vs. with the second feature rescaled]
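If the commented-out rescaling line is enabled, kmeans ends up grouping the points almost entirely by the second, huge-scale feature and the original blob structure is lost. A common remedy, sketched below (it uses the Statistics Toolbox zscore function and is not part of the original example), is to standardize each column before clustering:

X(:,2) = X(:,2) * 1000 + 500;    % put feature 2 on a much larger scale
Xn = zscore(X);                  % standardize each column: zero mean, unit variance
[idx_n, ctrs_n] = kmeans(Xn, 2, 'Distance', 'city', 'Replicates', 5);
% idx_n should recover essentially the same two blobs as the original, unscaled data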

Hope this answer helps!
