Pointwise mutual information on text

Question

1 Answer

JaneShaw · Answer 1 · 2019-07-31T04:46:41+0000

PMI is a measure of association between a feature (in your case a word) and a class (category), not between a document (tweet) and a class.

$$\mathrm{pmi}(x, y) = \log \frac{P(x, y)}{P(x)\,P(y)}$$

In that formula, X is the random variable that models the occurrence of a word, and Y models the occurrence of a class. For a given word x and a given class y, you can use PMI to decide whether a feature is informative, and you can do feature selection on that basis. Having fewer features often improves the performance of your classification algorithm and speeds it up considerably. The classification step, however, is separate: PMI only helps you select better features to feed into your learning algorithm.
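To make the feature-selection idea concrete, here is a minimal sketch that estimates PMI for every (word, class) pair from document counts. The function names `pmi_table` and `top_words`, the toy corpus, and the choice to count a word once per document (presence/absence) are assumptions made for illustration, not part of the original answer.

```python
import math
from collections import Counter

def pmi_table(docs):
    """Estimate pmi(word, label) from a list of (tokens, label) pairs.

    Probabilities are document-level relative frequencies; a word is
    counted once per document (an assumption of this sketch).
    Pairs that never co-occur are left out (their PMI is -infinity).
    """
    n_docs = len(docs)
    word_count, label_count, joint_count = Counter(), Counter(), Counter()
    for tokens, label in docs:
        label_count[label] += 1
        for word in set(tokens):
            word_count[word] += 1
            joint_count[(word, label)] += 1

    scores = {}
    for (word, label), n_xy in joint_count.items():
        p_xy = n_xy / n_docs
        p_x = word_count[word] / n_docs
        p_y = label_count[label] / n_docs
        scores[(word, label)] = math.log(p_xy / (p_x * p_y))
    return scores

def top_words(scores, label, k):
    """Return the k words with the highest PMI for the given label."""
    ranked = sorted(
        ((w, s) for (w, l), s in scores.items() if l == label),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [w for w, _ in ranked[:k]]

# Hypothetical toy corpus of labelled tweets.
docs = [
    (["great", "movie"], "pos"),
    (["great", "fun"], "pos"),
    (["boring", "movie"], "neg"),
    (["boring", "plot"], "neg"),
]
scores = pmi_table(docs)
```

In this toy corpus "movie" appears equally often in both classes, so its PMI with either class is 0 and it would be a candidate for removal, while "great" and "boring" score above 0 for their own class. The selected words would then be passed to whatever classifier you train afterwards.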

Edit: One issue I did not mention in the original post is that PMI is sensitive to word frequencies. Let's rewrite the formula as

$$\mathrm{pmi}(x, y) = \log \frac{P(x, y)}{P(x)\,P(y)} = \log \frac{P(x \mid y)}{P(x)}$$

When x and y are perfectly correlated, P(x|y) = P(y|x) = 1, so pmi(x, y) = log(1/P(x)) = -log P(x). Less frequent x-es (words) will therefore have a higher PMI score than frequent x-es, even if both are perfectly correlated with y.
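A quick numeric check of this frequency effect, as a sketch: under perfect correlation the word occurs exactly when the class does, so P(x, y) = P(x) = P(y) and the score depends only on how rare the word is. The helper names `pmi` and `pmi_perfect` and the two probabilities are made up for illustration.

```python
import math

def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information from the three probabilities."""
    return math.log(p_xy / (p_x * p_y))

def pmi_perfect(p):
    """PMI when x occurs if and only if y occurs: P(x, y) = P(x) = P(y) = p."""
    return pmi(p, p, p)

frequent = pmi_perfect(0.5)   # word/class pair seen in half of the documents
rare = pmi_perfect(0.01)      # word/class pair seen in 1% of the documents
```

Both pairs are equally "perfect" associations, yet the rare one gets the much larger score, since each value is just -log of the word's probability. In practice this is why people often apply a minimum frequency cutoff before ranking words by PMI.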
