0 votes
in Machine Learning by (19k points)

Can anyone point me to a hierarchical clustering tool (preferably in Python) that can cluster ~1 million objects? I have tried hcluster and also Orange.

hcluster had trouble with 18k objects. Orange was able to cluster 18k objects in seconds, but failed with 100k objects (saturated memory and eventually crashed).

I am running on a 64-bit Xeon CPU (2.53 GHz) with 8 GB of RAM + 3 GB of swap on Ubuntu 11.10.

1 Answer

0 votes
by (33.1k points)

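Standard hierarchical clustering works on all pairwise distances, so its memory and time grow as O(n^2). At ~1 million objects that will not fit in 8 GB of RAM, so to get around the O(n^2) cost you need to reduce the amount of data (documents) that the hierarchical step sees.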
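To make the O(n^2) cost concrete, here is a rough back-of-the-envelope sketch. It assumes the tool stores one 8-byte float per pair of objects (a condensed distance matrix); the helper name `condensed_matrix_bytes` is made up for illustration, and real tools may use more or less memory than this.

```python
def condensed_matrix_bytes(n, bytes_per_distance=8):
    """Bytes needed to hold all n*(n-1)/2 pairwise distances.

    Assumption: one float64 (8 bytes) per pair, no other overhead.
    """
    return n * (n - 1) // 2 * bytes_per_distance


# The three sizes mentioned in the question.
for n in (18_000, 100_000, 1_000_000):
    gib = condensed_matrix_bytes(n) / 2**30
    print(f"{n:>9,} objects -> {gib:,.1f} GiB of distances")
```

Under that assumption, 18k objects need a little over 1 GiB, 100k objects need tens of GiB, and 1 million objects need several TiB, which is consistent with a crash at 100k objects on an 8 GB machine.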

Two possible approaches are:

  • Build a hierarchical tree from a sample of about 15k points, then add the remaining points to it one by one: time ~ 1M * treedepth.

  • First, build 100 or 1000 flat clusters (for example with k-means), then build your hierarchical tree over the 100 or 1000 cluster centers.
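The second approach could look roughly like the sketch below. It is not a tuned solution: the data is random placeholder data, and the choice of SciPy's `kmeans2` for the flat step, `k = 100`, Ward linkage, and cutting the tree into 10 groups are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.cluster.vq import kmeans2

# Placeholder data standing in for the real objects (assumption):
# 20,000 points with 8 numeric features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(20_000, 8))

# Step 1: flat clustering. Each point gets a label in 0..k-1 and
# each flat cluster is summarised by its center.
k = 100
centers, labels = kmeans2(X, k, minit="++")

# Step 2: hierarchical clustering on the k centers only, so the
# pairwise distance matrix has k*(k-1)/2 entries instead of n*(n-1)/2.
Z = linkage(centers, method="ward")

# Optional: cut the tree into at most 10 groups and propagate the
# group of each center back to the points assigned to it.
center_groups = fcluster(Z, t=10, criterion="maxclust")
point_groups = center_groups[labels]

print(Z.shape)             # linkage matrix over the centers
print(point_groups.shape)  # one group id per original point
```

The hierarchy here describes the cluster centers, not the individual objects, so structure finer than one flat cluster is lost. For the full ~1 million objects, a mini-batch k-means implementation may be a better fit for the flat step than `kmeans2`.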

