0 votes
in AI and Deep Learning by (50.2k points)

The Wikipedia article on determining the number of clusters in a dataset indicates that I do not need to worry about this problem when using hierarchical clustering. However, when I tried scikit-learn's agglomerative clustering, I saw that I have to pass the number of clusters as the parameter "n_clusters", and if I omit it I get the default of two clusters. How can I choose the right number of clusters for the dataset in this case? Is the Wikipedia article wrong?

1 Answer

0 votes
by (108k points)

The steps of agglomerative hierarchical clustering are as follows:

  • Start by treating each data point as its own cluster. The number of clusters at the start is therefore K, where K is the number of data points.

  • Form one cluster by joining the two closest data points, resulting in K-1 clusters.

  • Form another cluster by joining the two closest clusters, resulting in K-2 clusters.

  • Repeat the merge step until one big cluster is formed.

  • Once a single cluster is formed, a dendrogram is used to divide the data into multiple clusters, depending on the problem. The number of clusters is therefore chosen after the tree is built, by deciding where to cut the dendrogram.
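The steps above can be sketched in plain Python. This is only a minimal illustration, not scikit-learn's implementation: it assumes single linkage (the distance between two clusters is the distance between their closest points), and the "largest gap" rule for cutting the tree is one common heuristic among several. The function names agglomerate and cut_at_largest_gap are made up for this example. In scikit-learn itself, AgglomerativeClustering can be given n_clusters=None together with a distance_threshold, so that the cut is chosen by distance instead of by a fixed cluster count.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def agglomerate(points):
    """Naive single-linkage agglomerative clustering (illustrative only).

    Returns one (merge_distance, clusters) snapshot per merge,
    going from K-1 clusters down to a single cluster.
    """
    clusters = [[i] for i in range(len(points))]  # each point starts alone
    history = []
    while len(clusters) > 1:
        # Find the two clusters whose closest members are nearest to each other.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        history.append((d, [sorted(c) for c in clusters]))
    return history


def cut_at_largest_gap(history):
    """Heuristic: keep the clustering just before the biggest jump in merge distance."""
    if len(history) < 2:
        return history[0][1] if history else []
    dists = [d for d, _ in history]
    gaps = [dists[i + 1] - dists[i] for i in range(len(dists) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    return history[i][1]


# Two well-separated groups of three points each (made-up data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
history = agglomerate(points)
print(cut_at_largest_gap(history))  # [[0, 1, 2], [3, 4, 5]]
```

Here the first four merges all happen at distance 1, and the final merge happens at a much larger distance, so the heuristic cuts the tree just before that last merge and returns two clusters.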

To learn more about K-means clustering, see the K-means Clustering Algorithm Tutorial.
