What is the relation between topic modeling and document clustering?

Question

1 Answer

Anurag · Answer 1 · 2019-06-25T10:04:25+0000

Topic modeling and clustering are both ways to organize a large collection of documents by finding similarities between them. In topic modeling, a topic is a cluster of words, where each word in the cluster has a probability of occurring under that topic; different topics have their own clusters of words with corresponding probabilities. A document can be associated with more than one topic, because the words it contains may be shared across several topics. Topic modeling techniques also reduce the feature dimensionality from the number of distinct words to the number of topics.
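To make the idea concrete, here is a minimal sketch in plain Python. The two topics and their word probabilities are made up for illustration (a real topic model such as LDA would learn them from the corpus), and the helper name `topic_mixture` is hypothetical. It estimates how much of each topic a document contains, so a document ends up described by two topic weights instead of one weight per distinct word.

```python
# Hypothetical topics: each is a probability distribution over words.
# These numbers are invented for illustration; a real model would learn them.
topics = {
    "sports": {"game": 0.40, "team": 0.40, "score": 0.15, "market": 0.05},
    "finance": {"market": 0.50, "stock": 0.30, "team": 0.15, "score": 0.05},
}


def topic_mixture(words, topics, iterations=50):
    """Estimate a document's topic weights by EM, keeping the topics fixed."""
    names = list(topics)
    weights = {name: 1.0 / len(names) for name in names}
    for _ in range(iterations):
        totals = {name: 0.0 for name in names}
        for word in words:
            # E-step: how responsible is each topic for this word?
            # Words a topic does not list get a tiny probability.
            scores = {n: weights[n] * topics[n].get(word, 1e-9) for n in names}
            norm = sum(scores.values())
            for n in names:
                totals[n] += scores[n] / norm
        # M-step: new weights are the average responsibilities.
        weights = {n: totals[n] / len(words) for n in names}
    return weights


# A document that mixes vocabulary from both topics.
mixed = topic_mixture(["game", "team", "market", "stock"], topics)
# A document that uses sports vocabulary only.
sporty = topic_mixture(["game", "game", "team"], topics)

print(mixed)
print(sporty)
```

The mixed document receives a noticeable weight for both topics, while the sports-only document is assigned almost entirely to one. This is the "soft" membership that separates topic modeling from a hard cluster assignment.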

In clustering, the basic idea is to group documents based on a suitable similarity measure. For grouping, each document is represented by a vector of weights assigned to the words in the document. A common weighting choice is the tf-idf (term frequency-inverse document frequency) scheme. The end result of clustering is a list of clusters, with every document appearing in exactly one of them.
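The steps above can be sketched in plain Python as follows. The four toy documents, the helper names, and the choice of cosine similarity are assumptions for illustration. The grouping step is deliberately simplified: each document is assigned to the most similar of two hand-picked seed documents, which is not a full algorithm such as k-means but shows the hard, one-cluster-per-document outcome.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration.
docs = {
    "d1": "the team won the game",
    "d2": "the team lost the game",
    "d3": "the stock market fell",
    "d4": "the market rallied as stock prices rose",
}


def tfidf_vectors(docs):
    """Return a sparse tf-idf vector (word -> weight) for each document."""
    tokenized = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(w for words in tokenized.values() for w in set(words))
    vectors = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        vectors[name] = {
            w: (c / len(words)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = tfidf_vectors(docs)

# Simplified hard assignment: every document joins its most similar seed.
seeds = ["d1", "d3"]
clusters = {seed: [] for seed in seeds}
for name in docs:
    best = max(seeds, key=lambda s: cosine(vectors[name], vectors[s]))
    clusters[best].append(name)

print(clusters)
```

Note that a word appearing in every document, such as "the" here, gets an idf of log(1) = 0 and so contributes nothing to the similarity.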

Thus, topic modeling, which amounts to soft clustering, and document clustering are closely related; the difference is in how the problem is approached. In that sense, topic modeling can be seen as one technique for doing document clustering.

