Is it possible to specify your own distance function using scikit-learn K-Means Clustering?

Question

3 Answers

Shrutiparna · Answer 1 · 2019-05-28T10:47:56+0000

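@Rony, yes, it is possible to specify your own distance function, though not with scikit-learn's KMeans class itself, which only supports Euclidean distance. Other libraries such as pyclustering accept a user-defined metric.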

K-means clustering is one of the most widely used unsupervised machine learning algorithms. It groups data into clusters based on the similarity between data instances. The algorithm starts by choosing an initial centroid for each cluster, typically at random. It then repeats three steps until the assignments stop changing:

1. Find the Euclidean distance between each data instance and the centroid of every cluster.

2. Assign each data instance to the cluster whose centroid is nearest.

3. Calculate new centroid values as the mean of the coordinates of all the data instances in the corresponding cluster.
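The three steps above can be sketched in plain Python with the distance passed in as a parameter. This is a minimal illustration, not scikit-learn's or pyclustering's implementation; the names kmeans_custom, euclidean and manhattan and the sample points are hypothetical. Note that the mean-based update in step 3 is only guaranteed to reduce the clustering objective for squared Euclidean distance, so with other distances the result should be treated as a heuristic.

```python
def euclidean(p, q):
    # Straight-line distance between two points of equal dimension.
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def manhattan(p, q):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(p, q))

def kmeans_custom(points, start_centers, distance=euclidean, max_iter=100):
    # Hypothetical helper: K-means loop with a pluggable distance function.
    centers = [list(c) for c in start_centers]
    labels = None
    for _ in range(max_iter):
        # Steps 1 and 2: measure distances and pick the nearest centroid.
        new_labels = [
            min(range(len(centers)), key=lambda k: distance(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the loop has converged
        labels = new_labels
        # Step 3: move each centroid to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Illustrative data: two well-separated groups of 2-D points.
points = [[1.0, 1.0], [1.5, 2.0], [1.0, 0.5], [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]]
labels, centers = kmeans_custom(points, [[0.0, 0.0], [10.0, 10.0]], distance=manhattan)
```

Swapping distance=manhattan for any other function of two points changes only the assignment step; the centroid update stays the same.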

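The following example uses the pyclustering library (not scikit-learn) to show how to specify your own distance function:

from pyclustering.cluster.kmeans import kmeans
from pyclustering.utils.metric import type_metric, distance_metric
# Illustrative data; replace with your own list of points.
sample = [[1.0, 1.2], [1.3, 0.9], [6.1, 6.5], [5.8, 7.0]]
# User-defined function of two points; this one is only a placeholder, not a true distance.
myfunc = lambda point1, point2: point1[0] + point2[0] + 2
metric = distance_metric(type_metric.USER_DEFINED, func=myfunc)
# Initial centroids, one per cluster.
start_centers = [[2.9, 3.4], [5.9, 6.7]]
kinstance = kmeans(sample, start_centers, metric=metric)
kinstance.process()
# Each cluster is a list of indices into sample.
clstr = kinstance.get_clusters()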

Shlok Pandey · Answer 2 · 2019-09-18T14:18:35+0000

You can use nltk instead, whose K-means implementation accepts a distance function:

from nltk.cluster.kmeans import KMeansClusterer
from nltk.cluster.util import cosine_distance
NUM_CLUSTERS = <choose a value>
data = <sparse matrix that you would normally give to scikit>.toarray()
kclusterer = KMeansClusterer(NUM_CLUSTERS, distance=cosine_distance, repeats=25)
assigned_clusters = kclusterer.cluster(data, assign_clusters=True)
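To see why the choice of distance matters, here is a small stdlib-only sketch. The cosine_distance function below is a hand-written stand-in that follows the usual definition (1 minus the cosine of the angle between two vectors), which is also how nltk documents its function of the same name; nearest_center and the sample vectors are hypothetical.

```python
import math

def cosine_distance(u, v):
    # 1 minus the cosine of the angle between u and v; ignores vector length.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    # Straight-line distance; sensitive to vector length.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_center(point, centers, distance):
    # Index of the centroid closest to point under the given distance.
    return min(range(len(centers)), key=lambda k: distance(point, centers[k]))

# Illustrative centroids and point (assumed values, not from the answer).
centers = [[1.0, 1.0], [10.0, 0.0]]
point = [8.0, 8.0]

by_cosine = nearest_center(point, centers, cosine_distance)
by_euclidean = nearest_center(point, centers, euclidean_distance)
```

Here the point has the same direction as the first centroid but lies closer in space to the second, so the two distances assign it to different clusters. Cosine distance is a common choice for sparse text vectors because it compares direction rather than magnitude.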

rkvepari · Answer 3 · 2019-09-19T07:01:09+0000

Yes, it should be possible to specify your own distance with scikit-learn K-Means clustering, although I have never done so myself.

K-Means clustering is a technique that partitions the dataset into homogeneous clusters: observations within a cluster are similar to each other but different from those in other clusters. The resulting clusters are mutually exclusive, i.e. non-overlapping.

The technique partitions the observations based on their distance from the cluster centroids. This distance can be the Euclidean distance.
