in Machine Learning by (19k points)

I have a large set of vectors in 3 dimensions. I need to cluster these based on Euclidean distance such that all the vectors in any particular cluster have a Euclidean distance between each other less than a threshold "T".

I do not know how many clusters exist. In the end, there may be individual vectors that are not part of any cluster, because their Euclidean distance to every other vector in the space is at least "T".

What existing algorithms/approaches should be used here?

1 Answer

by (33.1k points)

You can use hierarchical (agglomerative) clustering for this problem: build the cluster tree and cut it at the distance threshold.

Code:

```python
import matplotlib.pyplot as plt
import numpy
import scipy.cluster.hierarchy as hcluster

# generate 3 clusters of about 100 points each, plus one orphan point
# (2-D so the result can be plotted; fclusterdata works the same on 3-D vectors)
N = 100
data = numpy.random.randn(3*N, 2)
data[:N] += 5
data[-N:] += 10
data[-1:] -= 20

# clustering: cut the hierarchy at distance `thresh`
# (fclusterdata uses single linkage and Euclidean distance by default)
thresh = 1.5
clusters = hcluster.fclusterdata(data, thresh, criterion="distance")

# plotting: colour each point by its cluster label
plt.scatter(*numpy.transpose(data), c=clusters)
plt.axis("equal")
title = "threshold: %f, number of clusters: %d" % (thresh, len(set(clusters)))
plt.title(title)
plt.show()
```

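The code above produces a scatter plot in which each point is coloured by its cluster label, with the threshold and the number of clusters found shown in the title:

[Figure: scatter plot of the hierarchically clustered points]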

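The threshold passed to fclusterdata is a distance: with criterion="distance", points and clusters are merged only as long as the linkage distance between them does not exceed this value, so no two points in a resulting cluster have a cophenetic distance greater than the threshold. The distance metric (Euclidean by default) determines how the distance between two points is measured. Note that fclusterdata uses single linkage by default, which can chain points into one cluster even when some pairs in it are farther apart than the threshold. To guarantee that every pair of vectors in a cluster is within T, as the question asks, pass method="complete". A vector that is farther than T from every other vector always ends up alone in its own cluster, which is how the orphan point appears in the example.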
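As a minimal sketch, assuming NumPy and SciPy are available, here is the same idea applied to 3-D vectors like those in the question, using complete linkage so that all pairwise distances within a cluster stay at or below T. The helper name `threshold_clusters`, the `-1` label for vectors left on their own, and the synthetic data are illustrative assumptions, not part of the original answer.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist


def threshold_clusters(points, T):
    """Hypothetical helper: complete-linkage clustering cut at distance T.

    Every pair of points sharing a label is at most T apart.
    Points that end up alone in their cluster are relabelled -1
    ("not part of any cluster").
    """
    # complete linkage: the merge height of two clusters is the largest
    # pairwise distance between their members
    Z = linkage(points, method="complete", metric="euclidean")
    # cut the tree so no flat cluster has a merge height above T
    labels = fcluster(Z, t=T, criterion="distance")
    # find labels used by exactly one point and mark those points as -1
    unique, counts = np.unique(labels, return_counts=True)
    singletons = unique[counts == 1]
    labels[np.isin(labels, singletons)] = -1
    return labels


# synthetic 3-D data: two tight groups plus one isolated vector
rng = np.random.default_rng(0)
N = 50
data = np.vstack([
    rng.normal(0.0, 0.1, size=(N, 3)),
    rng.normal(5.0, 0.1, size=(N, 3)),
    [[20.0, 20.0, 20.0]],
])
T = 1.5
labels = threshold_clusters(data, T)

# check the guarantee: inside each real cluster, max pairwise distance <= T
for lab in np.unique(labels[labels != -1]):
    members = data[labels == lab]
    print(lab, len(members), pdist(members).max())
print("unclustered:", np.flatnonzero(labels == -1))
```

Calling `fclusterdata(points, T, criterion="distance", method="complete")` performs the same clustering step in one call; the explicit `linkage`/`fcluster` pair just makes the linkage choice visible. Because SciPy's hierarchical clustering works on the full pairwise distance matrix, memory grows quadratically with the number of vectors, which matters for a very large set.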

Hope this answer helps.


Welcome to Intellipaat Community. Get your technical queries answered by top developers!
