Which algorithm and what combination of hyper-parameters will be the best to cluster this data?

Question

1 Answer

Anurag · Answer 1 · 2019-06-20T12:23:16+0000

Data preprocessing is an important part of clustering. If the clusters are not forming properly, try improving the data through preprocessing first. Standardize the features carefully, because k-means relies on Euclidean distances and is therefore very sensitive to feature scaling.
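To illustrate the scaling point, here is a minimal sketch on made-up synthetic data (the two features and their ranges are assumptions for illustration only). One feature separates two groups but has a small range; the other is pure noise with a much larger range. Plain k-means is compared with a `StandardScaler` + `KMeans` pipeline, and the adjusted Rand index (ARI, 1.0 means perfect agreement with the true groups) shows the difference.

```python
# Minimal sketch: effect of standardization on k-means (synthetic, illustrative data).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
true_labels = np.repeat([0, 1], 100)

# Feature 0 separates the two groups but only spans roughly 0 to 1.
informative = true_labels + rng.normal(scale=0.05, size=200)
# Feature 1 is pure noise on a much larger scale (0 to 1000).
noise = rng.uniform(0, 1000, size=200)
X = np.column_stack([informative, noise])

# Without scaling, squared distances are dominated by the large-scale noise feature.
raw_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# With standardization, both features contribute on a comparable scale.
scaled_model = make_pipeline(StandardScaler(), KMeans(n_clusters=2, n_init=10, random_state=0))
scaled_labels = scaled_model.fit_predict(X)

raw_ari = adjusted_rand_score(true_labels, raw_labels)
scaled_ari = adjusted_rand_score(true_labels, scaled_labels)
print(f"ARI without scaling: {raw_ari:.2f}")
print(f"ARI with scaling:    {scaled_ari:.2f}")
```

Putting the scaler inside the pipeline means the same transformation is applied automatically when the model is reused on new data.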

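A Gaussian mixture model (GMM) assigns each point to every cluster with a certain probability (a soft assignment). Because it is fitted with expectation-maximization and the result depends on the initialization, it can sometimes be unstable from run to run.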
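As a rough sketch of both points, using scikit-learn's `GaussianMixture` on two made-up Gaussian blobs (the data is an assumption for illustration): `predict_proba` returns the soft assignments, and setting `n_init` above 1 runs EM from several initializations and keeps the best fit, which is one common way to reduce that run-to-run variation. `random_state` makes the result repeatable.

```python
# Minimal sketch: soft assignments and multiple initializations with a GMM.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic 2-D blobs, 100 points each (illustrative data only).
X = np.vstack([rng.normal(0, 1, size=(100, 2)), rng.normal(5, 1, size=(100, 2))])

# n_init=5: run EM from 5 initializations and keep the one with the best likelihood.
gmm = GaussianMixture(n_components=2, n_init=5, random_state=0).fit(X)

proba = gmm.predict_proba(X)  # soft assignment: one probability per component, rows sum to 1
hard = gmm.predict(X)         # hard assignment: the most probable component
print(proba[:3].round(3))
print(hard[:3])
```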

You can certainly try density-based clustering (DBSCAN) here. It builds clusters around the densest regions of the data and labels points in sparse regions as noise, so you do not need to fix the number of clusters in advance.

For example:

```python
>>> from sklearn.cluster import DBSCAN
>>> import numpy as np
>>> X = np.array([[1, 2], [2, 2], [2, 3],
...               [8, 7], [8, 8], [25, 80]])
>>> clustering = DBSCAN(eps=3, min_samples=2).fit(X)
>>> clustering.labels_  # -1 marks a noise point
array([ 0,  0,  0,  1,  1, -1])
>>> clustering  # scikit-learn >= 0.23 shows only non-default parameters; older versions list all of them
DBSCAN(eps=3, min_samples=2)
```
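Since the question also asks about hyper-parameters, here is one possible way to compare `eps` and `min_samples` settings, offered as a sketch rather than a definitive recipe. It assumes synthetic blobs in place of your data, standardizes first (because `eps` is a distance), and scores each setting with the silhouette coefficient on non-noise points. The candidate grids and the silhouette criterion are assumptions; silhouette tends to favor compact, convex clusters, so also look at the number of clusters and the amount of noise before choosing.

```python
# Minimal sketch: grid search over DBSCAN eps/min_samples scored by silhouette.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Illustrative data: three blobs; replace with your own feature matrix.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
X = StandardScaler().fit_transform(X)  # eps is a distance threshold, so scale first

results = []
for eps in (0.1, 0.2, 0.3, 0.5):            # candidate grids are assumptions
    for min_samples in (3, 5, 10):
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        mask = labels != -1                  # leave noise points out of the score
        n_clusters = len(set(labels[mask]))
        if n_clusters < 2:
            continue                         # silhouette needs at least 2 clusters
        score = silhouette_score(X[mask], labels[mask])
        n_noise = int((~mask).sum())
        results.append((score, eps, min_samples, n_clusters, n_noise))

best = max(results)  # tuples compare by silhouette score first
print("best (silhouette, eps, min_samples, clusters, noise):", best)
```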
