How to assign clusters to new observations (test data) using scipy's hierarchical clustering

Question

asked Jul 5, 2019 in Data Science by sourav (17.6k points)

from scipy.cluster.hierarchy import dendrogram, linkage,fcluster
import numpy as np
import matplotlib.pyplot as plt
# data
np.random.seed(4711) # for repeatability of this tutorial
a = np.random.multivariate_normal([10, 0], [[3, 1], [1, 4]], size=[100,])
b = np.random.multivariate_normal([0, 20], [[3, 1], [1, 4]], size=[50,])
X = np.concatenate((a, b),)
plt.scatter(X[:,0], X[:,1])

# fit clusters
Z = linkage(X, method='ward', metric='euclidean')  # scipy's linkage has no preserve_input argument
# plot dendrogram
plt.figure()
dendrogram(Z)
plt.show()

max_d = 50
clusters = fcluster(Z, max_d, criterion='distance')
# now if I have new data
a = np.random.multivariate_normal([10, 0], [[3, 1], [1, 4]], size=[10,])
b = np.random.multivariate_normal([0, 20], [[3, 1], [1, 4]], size=[5,])
X_test = np.concatenate((a, b),)
print(X_test.shape) # (15, 2): 15 samples with 2 dimensions
plt.scatter(X_test[:,0], X_test[:,1])
plt.show()

How can I compute distances for the new data and assign it to the clusters found on the training data?

Code reference: joernhees.de

1 Answer

Shlok Pandey · Answer 1 · 2019-07-05T06:57:15+0000

You cannot directly compute distances for the new data and assign it to the clusters found on the training data. Clustering is an exploratory technique, so it has no separate training and test stages. With hierarchical clustering in particular, new data cannot simply be added to the existing structure, because the new points can change the structure that was previously discovered.

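If what you want is classification, use a classifier: train it on the training data with the cluster labels as the target, then use it to predict a cluster for each new observation.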
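As a rough illustration of the classifier route, here is a minimal sketch that uses a nearest-centroid rule built from the fcluster labels. The nearest-centroid rule is an assumption chosen for brevity, not something scipy provides for hierarchical clustering, and the names centroids, nearest and test_clusters are made up for this example. Any other classifier trained on (X, clusters) could be swapped in.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cdist

np.random.seed(4711)  # same seed as in the question
cov = [[3, 1], [1, 4]]

# Training data: the two Gaussian blobs from the question
X = np.concatenate((
    np.random.multivariate_normal([10, 0], cov, size=100),
    np.random.multivariate_normal([0, 20], cov, size=50),
))

# Hierarchical clustering on the training data only
Z = linkage(X, method='ward', metric='euclidean')
clusters = fcluster(Z, 50, criterion='distance')

# "Train" the classifier: one centroid per cluster label (assumed rule)
labels = np.unique(clusters)
centroids = np.array([X[clusters == k].mean(axis=0) for k in labels])

# New observations drawn from the same two distributions
X_test = np.concatenate((
    np.random.multivariate_normal([10, 0], cov, size=10),
    np.random.multivariate_normal([0, 20], cov, size=5),
))

# Predict: each new point gets the label of its closest centroid
nearest = cdist(X_test, centroids, metric='euclidean').argmin(axis=1)
test_clusters = labels[nearest]
print(test_clusters)
```

Note that this does not update the dendrogram; it only labels new points relative to the clusters already found. Whether centroids are a good summary depends on the linkage method and the cluster shapes, so for elongated or single-linkage clusters a k-nearest-neighbours classifier may be a better fit.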
