
I was learning about non-linear clustering algorithms and I came across this 2-D graph. I was wondering which clustering algorithm and combination of hyper-parameters will cluster this data well.

I want my algorithm to pick out those 5 spikes the way a human would. I tried KMeans, but it only split the data horizontally or vertically. I then tried a GMM, but couldn't find hyper-parameters that gave the desired clustering.

by (33.1k points)

Data preprocessing is an important part of clustering. If the clusters are not forming properly, start there: k-means is distance-based and very sensitive to feature scaling, so standardize the features before clustering.
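As a rough illustration of the scaling point, here is a minimal sketch that standardizes the features with `StandardScaler` before running `KMeans`. The data is synthetic and hypothetical (two groups separated along x, with a y feature on a much larger scale); it is not the data from the question.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: the two groups differ only in x,
# but y has a far larger numeric range than x.
X = np.vstack([
    np.column_stack([rng.normal(0.0, 0.5, 100), rng.normal(0.0, 500.0, 100)]),
    np.column_stack([rng.normal(5.0, 0.5, 100), rng.normal(0.0, 500.0, 100)]),
])

# Rescale every feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Cluster the raw and the standardized data with the same settings.
labels_raw = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
labels_scaled = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
```

On unscaled data like this, the large-range feature tends to dominate the distance computation, so comparing `labels_raw` with `labels_scaled` is a quick way to see whether scaling changes the result.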

GMM assigns each point to clusters with a certain probability (soft assignment). It is fitted with EM from a random initialization, so the result can change from run to run.
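If you want to keep experimenting with GMM, the following is a sketch of settings that usually make it more repeatable, not a guaranteed fix. The spike data is a hypothetical stand-in for the plot in the question. `covariance_type="full"` lets each component be an elongated, rotated ellipse, `n_init` runs several initializations and keeps the best one, and `random_state` fixes the seed.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical stand-in for the plot: 5 thin spikes radiating
# from the origin at evenly spaced angles.
spikes = []
for k in range(5):
    angle = 2 * np.pi * k / 5
    r = rng.uniform(1.0, 10.0, size=100)   # position along the spike
    w = rng.normal(0.0, 0.3, size=100)     # small sideways jitter
    direction = np.array([np.cos(angle), np.sin(angle)])
    normal = np.array([-np.sin(angle), np.cos(angle)])
    spikes.append(np.outer(r, direction) + np.outer(w, normal))
X = np.vstack(spikes)

gmm = GaussianMixture(
    n_components=5,          # one component per expected spike
    covariance_type="full",  # allow elongated, rotated ellipses
    n_init=10,               # keep the best of 10 initializations
    random_state=0,          # make the run repeatable
).fit(X)

labels = gmm.predict(X)        # hard assignment: most likely component
proba = gmm.predict_proba(X)   # soft assignment: one probability per component
```

Rows of `proba` that are far from 0 or 1 point at samples the model is unsure about, which is typically where neighbouring spikes meet.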

You can also try the density-based clustering algorithm DBSCAN here. It grows clusters out of the densest areas and marks points in sparse regions as noise (label -1), so the clusters do not have to be round.

For example:

>>> from sklearn.cluster import DBSCAN
>>> import numpy as np
>>> X = np.array([[1, 2], [2, 2], [2, 3],
...               [8, 7], [8, 8], [25, 80]])
>>> clustering = DBSCAN(eps=3, min_samples=2).fit(X)
>>> clustering.labels_
array([ 0,  0,  0,  1,  1, -1])
>>> clustering
DBSCAN(algorithm='auto', eps=3, leaf_size=30, metric='euclidean', metric_params=None, min_samples=2, n_jobs=None, p=None)
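To get a feel for the `eps` hyper-parameter, here is a small sketch that reruns DBSCAN on the same six points with a few radii and counts clusters and noise points. The helper name `summarize` and the particular `eps` values are my own choices for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Same six points as in the example above.
X = np.array([[1, 2], [2, 2], [2, 3],
              [8, 7], [8, 8], [25, 80]])

def summarize(eps, min_samples=2):
    """Return (number of clusters, number of noise points) for one eps."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(X).labels_
    # Label -1 marks noise, so it does not count as a cluster.
    n_clusters = len(set(labels.tolist())) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    return n_clusters, n_noise

results = {eps: summarize(eps) for eps in (0.5, 3, 10)}
for eps, (n_clusters, n_noise) in results.items():
    print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```

A radius that is too small leaves every point as noise, while one that is too large merges nearby groups into a single cluster, so on your own data it is worth sweeping `eps` (after scaling) and checking these two counts.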