
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster

import numpy as np

import matplotlib.pyplot as plt

# data

np.random.seed(4711)  # for repeatability of this tutorial

a = np.random.multivariate_normal([10, 0], [[3, 1], [1, 4]], size=[100,])

b = np.random.multivariate_normal([0, 20], [[3, 1], [1, 4]], size=[50,])

X = np.concatenate((a, b),)

plt.scatter(X[:,0], X[:,1])


# fit clusters

# scipy's linkage() has no preserve_input argument (that belongs to fastcluster.linkage)
Z = linkage(X, method='ward', metric='euclidean')

# plot dendrogram

dendrogram(Z)

plt.show()

max_d = 50

clusters = fcluster(Z, max_d, criterion='distance')

# now if I have new data

a = np.random.multivariate_normal([10, 0], [[3, 1], [1, 4]], size=[10,])

b = np.random.multivariate_normal([0, 20], [[3, 1], [1, 4]], size=[5,])

X_test = np.concatenate((a, b),)

print(X_test.shape)  # (15, 2): 15 samples with 2 dimensions

plt.scatter(X_test[:,0], X_test[:,1])

plt.show()

How do I compute distances for the new data and assign it to the clusters found on the training data?

Code reference: joernhees.de

1 Answer


You do not need to compute distances for the new data and assign it to the clusters from the training data. Hierarchical clustering is an exploratory technique, so it has no training and test stages. With this algorithm you cannot assign new data to the existing structure, because adding the new points could change the hierarchy that was previously discovered.

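If what you actually want is to label new points, use a classifier: treat the cluster labels from fcluster as class labels, train on the original data, and predict on the new data.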
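
As a rough illustration of the classifier route, here is a minimal sketch that reuses the data setup from the question. The nearest-centroid rule is my own choice for brevity, not something the question or answer prescribes, and the names `centroids` and `test_clusters` are purely illustrative. Any other classifier trained on `X` and `clusters` would follow the same pattern.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cdist

np.random.seed(4711)
cov = [[3, 1], [1, 4]]

# Original data: the same two blobs as in the question
X = np.concatenate((
    np.random.multivariate_normal([10, 0], cov, size=100),
    np.random.multivariate_normal([0, 20], cov, size=50),
))

# Hierarchical clustering, then flat cluster labels at distance 50
Z = linkage(X, method='ward', metric='euclidean')
clusters = fcluster(Z, 50, criterion='distance')

# Assumption: summarize each cluster by its mean (nearest-centroid classifier)
labels = np.unique(clusters)
centroids = np.array([X[clusters == k].mean(axis=0) for k in labels])

# New data drawn from the same two distributions
X_test = np.concatenate((
    np.random.multivariate_normal([10, 0], cov, size=10),
    np.random.multivariate_normal([0, 20], cov, size=5),
))

# Distance from every new point to every centroid, then pick the closest
distances = cdist(X_test, centroids, metric='euclidean')
test_clusters = labels[distances.argmin(axis=1)]

print(test_clusters)
```

Note that this only labels the new points against the clusters already found. It does not update the hierarchy, so if the new data has a different structure, rerunning the clustering on the combined data is the safer option.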
