K-means without knowing the number of clusters?

Question

1 Answer

Anurag · Answer 1 · 2019-06-19T09:06:47+0000

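If your dataset has a large number of features, you can start with Principal Component Analysis (PCA). PCA is a dimensionality reduction technique: rather than selecting existing features, it projects the data onto a smaller set of uncorrelated components that capture most of the variance. Once you have reduced the data to a handful of components, the number of components you kept can serve as a rough starting guess for the number of clusters in k-means, although this is only a heuristic and should be checked against the data.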
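
To make the PCA step concrete, here is a minimal sketch, assuming scikit-learn and a synthetic dataset from `make_blobs` (both are illustrative choices, not something from the question). It counts how many principal components are needed to explain 95% of the variance. The 95% threshold is an arbitrary assumption, and treating the resulting count as a guess for k is a heuristic, not a rule.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a wide dataset (assumption: replace with your own X).
X, _ = make_blobs(n_samples=300, n_features=20, centers=4, random_state=0)

# PCA is sensitive to feature scale, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA with all components and accumulate the explained variance.
pca = PCA().fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative explained variance reaches 95%.
n_components = int(np.searchsorted(cumulative, 0.95) + 1)
X_reduced = PCA(n_components=n_components).fit_transform(X_scaled)

print(n_components, X_reduced.shape)
```

Running k-means on `X_reduced` instead of `X` also tends to be faster and less affected by noisy features.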

Another approach is cross-validation. Fit k-means on a subset of the data for a given k, then compare the resulting clusters against the rest of the data. A value of k whose clusters also fit the held-out data well is a good candidate.
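
One possible way to do this comparison is sketched below, assuming scikit-learn. It fits k-means on half of the data for several values of k, assigns the held-out half to the learned centroids, and scores those assignments with the silhouette coefficient. The silhouette score, the 50/50 split, the range of k, and the synthetic blobs are all assumptions made for illustration; other stability or fit measures would work too.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.model_selection import train_test_split

# Synthetic data with 3 well-separated groups (assumption: replace with your own X).
X, _ = make_blobs(
    n_samples=600,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.6,
    random_state=42,
)
X_train, X_test = train_test_split(X, test_size=0.5, random_state=42)

scores = {}
for k in range(2, 8):
    # Fit on the subset only, then assign the held-out points to those centroids.
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_train)
    test_labels = model.predict(X_test)
    # Silhouette on held-out data: higher means tighter, better-separated clusters.
    scores[k] = silhouette_score(X_test, test_labels)

# Pick the k with the highest held-out silhouette score.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On real data the scores are rarely this clear-cut, so it helps to look at the whole `scores` dictionary and not only at `best_k`.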

For a given k, k-means stops creating and optimizing clusters when either:

The centroids have stabilized: their values no longer change between iterations, so the algorithm has converged.
The defined maximum number of iterations has been reached.
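
In scikit-learn, these two stopping conditions correspond to the `tol` and `max_iter` parameters of `KMeans`. The sketch below uses synthetic data, and the values shown for `max_iter` and `tol` are just the library defaults written out explicitly.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data (assumption: replace with your own X).
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# max_iter caps the iterations per run; tol is the centroid-movement
# threshold below which a run is declared converged.
model = KMeans(n_clusters=3, max_iter=300, tol=1e-4, n_init=10, random_state=0)
model.fit(X)

# n_iter_ reports how many iterations the best run actually used.
print(model.n_iter_, model.inertia_)
```

If `n_iter_` equals `max_iter`, the run probably stopped because of the iteration cap and not because the centroids settled, so raising `max_iter` may be worth trying.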

For example, once you have settled on a value of k:

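from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
X, _ = make_blobs(n_samples=200, centers=2, random_state=0)  # placeholder data (assumption); use your own X
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)  # k has to be set up front
kmeans.fit(X)
print(kmeans.cluster_centers_)  # one centroid per cluster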

Hope this answer helps.
