in Machine Learning by (19k points)

Topic modeling identifies the distribution of topics in a document collection, which effectively identifies the clusters in the collection. So is it right to say that topic modeling is a technique to do document clustering?

1 Answer

by (33.1k points)

Topic modeling captures similarities between documents. Organizing a huge collection of documents is important, and it can be done either by topic modeling or by clustering. In topic modeling, a topic is a probability distribution over words: each word has a probability of occurring under a given topic, and different topics assign different probabilities to the words. Each document is modeled as a mixture of topics, so a document can be associated with more than one topic when its words are shared across several topics. Because a document can then be represented by its topic proportions instead of by a weight for every distinct word, topic modeling also reduces the feature dimensionality from the vocabulary size to the number of topics.
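A minimal sketch of this idea in pure Python, assuming latent Dirichlet allocation (LDA) as the topic model and fitting it with collapsed Gibbs sampling. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy corpus are illustrative assumptions, not a reference implementation. Each document comes out as a distribution over `num_topics` topics, and each topic as a distribution over the vocabulary.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized docs with collapsed Gibbs sampling (illustrative sketch).

    Returns (doc_topic, topic_word) where
      doc_topic[d][k]  estimates P(topic k | document d)
      topic_word[k][w] estimates P(word w | topic k)
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]               # tokens of topic k in doc d
    n_kw = [defaultdict(int) for _ in range(num_topics)]  # times word w is assigned to topic k
    n_k = [0] * num_topics                                # total tokens assigned to topic k

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts...
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # ...and resample its topic from the collapsed full conditional.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]
    return doc_topic, topic_word


# Toy corpus (made up for illustration).
docs = [
    "cat dog pet cat".split(),
    "dog pet vet dog".split(),
    "stock market trade stock".split(),
    "market price trade stock".split(),
    "pet dog market price".split(),
]
theta, phi = lda_gibbs(docs, num_topics=2)
for d, dist in enumerate(theta):
    # Each document is a mixture of topics, not a member of a single cluster.
    print(d, [round(p, 2) for p in dist])
```

With `num_topics=2`, each document is described by 2 numbers instead of one weight per distinct word (8 in this toy corpus). For real workloads, libraries such as scikit-learn (`LatentDirichletAllocation`) and gensim (`LdaModel`) provide maintained LDA implementations.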

In clustering, the basic idea is to group documents based on a suitable similarity measure. For this, each document is represented as a vector of weights assigned to the words it contains; these weights are commonly computed with the tf-idf (term frequency-inverse document frequency) scheme. The end result of clustering is a list of clusters in which every document appears in exactly one cluster.
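The clustering pipeline described above can be sketched in pure Python as tf-idf weighting followed by k-means. This is an assumption about the grouping step: the sketch uses spherical k-means (cosine similarity on L2-normalized vectors), and the helper names `tfidf_vectors`, `dot`, and `kmeans` are hypothetical. Note that each document receives exactly one label.

```python
import math
import random
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Turn tokenized docs into L2-normalized sparse tf-idf vectors (dict: word -> weight)."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # document frequency of each word
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {w: (c / len(doc)) * math.log(n / df[w]) for w, c in tf.items()}
        norm = math.sqrt(sum(x * x for x in vec.values())) or 1.0
        vectors.append({w: x / norm for w, x in vec.items()})
    return vectors


def dot(a, b):
    """Dot product of sparse vectors; equals cosine similarity for unit vectors."""
    return sum(x * b.get(w, 0.0) for w, x in a.items())


def kmeans(vectors, k, iterations=20, seed=0):
    """Spherical k-means: returns one cluster label per document (hard assignment)."""
    rng = random.Random(seed)
    centroids = [dict(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document goes to its single most similar centroid.
        labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        # Update step: each centroid becomes the normalized sum of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if not members:
                continue  # keep the previous centroid if a cluster becomes empty
            total = defaultdict(float)
            for v in members:
                for w, x in v.items():
                    total[w] += x
            norm = math.sqrt(sum(x * x for x in total.values())) or 1.0
            centroids[c] = {w: x / norm for w, x in total.items()}
    return labels


# Toy corpus (made up for illustration).
docs = [
    "cat dog pet cat".split(),
    "dog pet vet dog".split(),
    "stock market trade stock".split(),
    "market price trade stock".split(),
    "pet dog market price".split(),
]
labels = kmeans(tfidf_vectors(docs), k=2)
print(labels)  # one cluster id per document
```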

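Thus, topic modeling and soft clustering are very similar to each other; the difference lies in how the problem is approached. In this sense, topic modeling can be seen as a technique for document clustering, specifically soft clustering, in which a document can belong to several clusters (topics) to different degrees.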
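To make the soft/hard distinction concrete, here is a small sketch that takes hypothetical document-topic proportions (the kind of output a topic model produces) and turns them into either hard clusters, by picking each document's dominant topic, or soft clusters, by keeping every topic whose share passes a threshold. The function names, the 0.3 threshold, and the numbers in `theta` are made up for illustration.

```python
def hard_clusters(doc_topic):
    """Assign each document to its single most probable topic (hard clustering)."""
    return [max(range(len(p)), key=p.__getitem__) for p in doc_topic]


def soft_clusters(doc_topic, threshold=0.3):
    """List every topic whose share of the document is at least `threshold`."""
    return [[k for k, p_k in enumerate(p) if p_k >= threshold] for p in doc_topic]


# Hypothetical document-topic proportions, e.g. as estimated by an LDA model.
theta = [
    [0.90, 0.10],
    [0.20, 0.80],
    [0.55, 0.45],
]
print(hard_clusters(theta))  # [0, 1, 0]
print(soft_clusters(theta))  # [[0], [1], [0, 1]]
```

The third document is split almost evenly between the two topics: hard clustering forces it into one group, while the soft view keeps both memberships.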
