in Machine Learning by (13.4k points)

Mahout in Action says that normalization can slightly improve accuracy. Can anyone explain why? Thanks!

1 Answer

by (32.8k points)

Normalization is not always required in machine learning, but it rarely hurts, and for distance-based methods such as k-means clustering it can change the result noticeably.

K-means:

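K-means clustering is "isotropic" in all directions of space, which means it tends to produce more or less round (rather than elongated) clusters. Because it relies on distances between points, leaving the variances unequal lets the features with the largest variance dominate the distance, so the clusters tend to be separated along those features whether or not they are the most informative ones.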
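To make the effect of scale concrete, here is a minimal Python sketch (Python rather than MATLAB so that it runs with the standard library alone). The points and the factor of 1000 are made up for illustration. It shows that simply changing the units of one feature can change which point counts as "nearest" under Euclidean distance, which is exactly what k-means relies on when assigning points to centroids.

```python
import math

def euclidean(p, q):
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest(query, candidates):
    """Index of the candidate closest to query."""
    return min(range(len(candidates)),
               key=lambda i: euclidean(query, candidates[i]))

def rescale(p):
    """Multiply feature 2 by 1000, e.g. a change of units (illustrative)."""
    return (p[0], p[1] * 1000.0)

# Hypothetical data: candidate 0 is close in feature 1,
# candidate 1 is close in feature 2.
query = (0.0, 0.0)
candidates = [(1.0, 3.0), (4.0, 0.5)]

before = nearest(query, candidates)
after = nearest(rescale(query), [rescale(c) for c in candidates])
print(before, after)
```

After rescaling, feature 1 contributes almost nothing to the distance, so the nearest candidate is decided by feature 2 alone.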

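For example, in MATLAB (kmeans and statset require the Statistics and Machine Learning Toolbox):

% Two Gaussian blobs centered at (1,1) and (-1,-1)
X = [randn(100,2)+ones(100,2);...
     randn(100,2)-ones(100,2)];

% Uncomment the next line to put the second feature on a very
% different scale, then compare the resulting clusters
% X(:, 2) = X(:, 2) * 1000 + 500;

opts = statset('Display','final');
[idx,ctrs] = kmeans(X,2,...
                    'Distance','cityblock',...
                    'Replicates',5,...
                    'Options',opts);

% Plot the two clusters in different colors
plot(X(idx==1,1),X(idx==1,2),'r.','MarkerSize',12)
hold on
plot(X(idx==2,1),X(idx==2,2),'b.','MarkerSize',12)
% Draw each centroid as an x inside a circle
plot(ctrs(:,1),ctrs(:,2),'kx',...
     'MarkerSize',12,'LineWidth',2)
plot(ctrs(:,1),ctrs(:,2),'ko',...
     'MarkerSize',12,'LineWidth',2)
legend('Cluster 1','Cluster 2','Centroids',...
       'Location','NW')
title('K-means with normalization')
hold off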
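The commented-out MATLAB line multiplies the second feature by 1000 and adds 500. One common way to remove that kind of scale difference before clustering is z-score standardization (subtract the column mean, divide by the column standard deviation). The Python sketch below is only an illustration under assumed data: `zscore_columns` is a hypothetical helper, not part of Mahout or MATLAB. It checks that the standardized values are the same whether or not the second feature was rescaled.

```python
import statistics

def zscore_columns(rows):
    """Standardize each column to mean 0 and population std 1.

    A constant column (std 0) is mapped to all zeros.
    """
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [
        tuple((v - m) / s if s > 0 else 0.0
              for v, m, s in zip(row, means, stds))
        for row in rows
    ]

# Hypothetical data, once in its original units and once with
# feature 2 transformed as in the MATLAB comment (x1000 + 500).
data = [(1.0, 2.0), (2.0, 0.0), (-1.0, -2.0), (-2.0, 0.0)]
rescaled = [(x, y * 1000.0 + 500.0) for x, y in data]

z_data = zscore_columns(data)
z_rescaled = zscore_columns(rescaled)

# Standardization cancels the change of units, up to rounding error.
same = all(
    abs(a - b) < 1e-9
    for r1, r2 in zip(z_data, z_rescaled)
    for a, b in zip(r1, r2)
)
print(same)
```

Since the standardized inputs match, a distance-based method such as k-means sees the same geometry in both cases, and no feature dominates just because of its units.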


Hope this answer helps you!
