
Topic modeling identifies the distribution of topics in a document collection, which effectively identifies the clusters in the collection. So is it right to say that topic modeling is a technique to do document clustering?

1 Answer


Topic modeling and clustering are both ways to organize a large collection of documents by similarity. In topic modeling, a topic is a cluster of words, each with a probability of occurring under that topic; different topics have their own word clusters and corresponding probabilities. A document can be associated with more than one topic, because the words it contains can be shared across several topics. Topic modeling also reduces feature dimensionality: a document is described by its weights over a small number of topics instead of over every distinct word in the vocabulary.
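To make the "topic as a cluster of words with probabilities" idea concrete, here is a minimal sketch in plain Python. The topic names, words, and all numbers are hand-picked assumptions for illustration; a real topic model such as LDA would learn them from a corpus.

```python
# Toy, hand-picked numbers for illustration only; a real model such as LDA
# would learn these distributions from a document collection.
topics = {
    "sports": {"game": 0.5, "team": 0.4, "vote": 0.1},
    "politics": {"game": 0.1, "team": 0.1, "vote": 0.8},
}

# One document expressed as a mixture over the two topics (weights sum to 1).
doc_topic_mix = {"sports": 0.7, "politics": 0.3}


def word_probability(word, mix, topics):
    """p(word | doc) = sum over topics of p(topic | doc) * p(word | topic)."""
    return sum(weight * topics[name].get(word, 0.0) for name, weight in mix.items())


vocabulary = sorted({w for dist in topics.values() for w in dist})
word_probs = {w: word_probability(w, doc_topic_mix, topics) for w in vocabulary}

# The document is summarized by len(doc_topic_mix) topic weights
# rather than by len(vocabulary) word weights.
print(word_probs)
print(len(vocabulary), "words ->", len(doc_topic_mix), "topic weights")
```

With a realistic vocabulary of many thousands of words and a few dozen topics, the same representation gives the dimensionality reduction described above.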

In clustering, the basic idea is to group documents based on a suitable similarity measure. Each document is represented as a vector of weights assigned to the words in it, commonly computed with the tf-idf (term frequency-inverse document frequency) scheme. The end result of (hard) clustering is a list of clusters, with every document appearing in exactly one of them.
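The clustering pipeline described here can be sketched with the standard library alone. This is a simplified illustration, not a production method: the four example documents, the whitespace tokenizer, the plain tf-idf formula, and the naive "first k documents" initialization are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical toy corpus; documents 0 and 2 are about sports, 1 and 3 about elections.
docs = [
    "the team won the game",
    "voters cast a vote in the election",
    "the team lost the game",
    "the election vote was close",
]


def normalize(vec):
    """Scale a vector to unit length so dot products act as cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def tfidf_vectors(texts):
    """Build unit-length tf-idf vectors using tf * log(N / df)."""
    tokenized = [t.split() for t in texts]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(tokenized)
    df = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(normalize([(tf[w] / len(toks)) * math.log(n / df[w]) for w in vocab]))
    return vectors


def similarity(a, b):
    return sum(x * y for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Hard clustering: each document gets exactly one cluster label."""
    centroids = [v[:] for v in vectors[:k]]  # naive init: first k documents
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: similarity(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = normalize([sum(col) / len(members) for col in zip(*members)])
    return labels


labels = kmeans(tfidf_vectors(docs), k=2)
print(labels)
```

Note that the word "the" occurs in every document, so its idf is zero and it does not influence the grouping. Each document ends up with a single label, which is the defining property of hard clustering.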

Thus, topic modeling and soft clustering are very similar; the difference lies in how the problem is approached. In that sense, topic modeling can be regarded as a technique for document clustering, specifically soft clustering, since a document may belong to several topics.
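As a rough illustration of the link between the two views, the sketch below turns per-document topic weights into cluster assignments. The document names, weights, and the 0.25 threshold are made-up assumptions, not the output of a fitted model.

```python
# Hypothetical document-topic weights, as a topic model might produce them.
doc_topic = {
    "doc_a": [0.7, 0.3],
    "doc_b": [0.2, 0.8],
    "doc_c": [0.5, 0.5],
}


def hard_cluster(weights):
    """Hard clustering view: keep only the dominant topic (first one wins on ties)."""
    return max(range(len(weights)), key=lambda k: weights[k])


def soft_clusters(weights, threshold=0.25):
    """Soft clustering view: keep every topic whose weight reaches the threshold."""
    return [k for k, w in enumerate(weights) if w >= threshold]


hard = {name: hard_cluster(w) for name, w in doc_topic.items()}
soft = {name: soft_clusters(w) for name, w in doc_topic.items()}
print(hard)
print(soft)
```

Taking the dominant topic recovers a one-cluster-per-document assignment, while keeping all sufficiently weighted topics preserves the fact that a document can belong to more than one group.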

