I am using scikit-learn for text classification. I want to calculate the Information Gain of each attribute with respect to the class in a (sparse) document-term matrix. Information Gain is defined as H(Class) - H(Class | Attribute), where H is the entropy.
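As a rough sketch of that definition applied to a sparse matrix, the function below computes H(Class) - H(Class | Attribute) for every column. It assumes each distinct cell value in a column (for example a raw term count) counts as one discrete attribute value, and it reports the result in bits (log base 2). The names `entropy` and `information_gain` are hypothetical helpers written for illustration, not part of scikit-learn. If only term presence/absence matters, binarize the matrix first (e.g. `X > 0`).

```python
import numpy as np
import scipy.sparse as sp


def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def information_gain(X, y):
    """Hypothetical helper: IG = H(C) - H(C | A_j) for each column j of X.

    Assumption: every distinct value in a column (e.g. a term count) is
    treated as one discrete value of the attribute A_j.
    """
    X = sp.csc_matrix(X)  # column slicing is cheap in CSC format
    y = np.asarray(y)
    h_c = entropy(y)
    gains = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        col = X[:, j].toarray().ravel()  # densify one column at a time
        # H(C | A) = sum over values a of P(A = a) * H(C | A = a)
        h_c_given_a = 0.0
        for value in np.unique(col):
            mask = col == value
            h_c_given_a += mask.mean() * entropy(y[mask])
        gains[j] = h_c - h_c_given_a
    return gains


# Toy document-term matrix: term 0 separates the classes, term 1 does not.
X = sp.csr_matrix([[1, 1], [1, 0], [0, 1], [0, 0]])
y = np.array([0, 0, 1, 1])
print(information_gain(X, y))  # [1. 0.]
```

Looping over columns keeps memory use low for wide vocabularies, at the cost of speed; it is meant as a reference computation rather than an efficient implementation.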

In Weka, this can be done with InfoGainAttributeEval, but I haven't found an equivalent measure in scikit-learn.

However, it has been suggested that the Information Gain formula above is the same measure as mutual information. This also matches the definition on Wikipedia.
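Using only the standard definitions of entropy and mutual information for discrete variables, a short expansion suggests the two quantities are indeed identical, where C is the class and A the attribute. The step from the first to the second line uses log(p(a,c) / (p(a) p(c))) = log p(c|a) - log p(c):

```latex
\begin{aligned}
I(C;A) &= \sum_{a}\sum_{c} p(a,c)\,\log\frac{p(a,c)}{p(a)\,p(c)} \\
       &= -\sum_{c} p(c)\,\log p(c) \;+\; \sum_{a}\sum_{c} p(a,c)\,\log p(c \mid a) \\
       &= H(C) - H(C \mid A)
\end{aligned}
```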

Is it possible to use a specific setting for mutual information in scikit-learn to accomplish this task?
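If the equivalence holds, one setting that looks like a candidate is `sklearn.feature_selection.mutual_info_classif` with `discrete_features=True`. With that flag every column is treated as discrete, so the score should come from the (value, class) contingency table rather than from the nearest-neighbour estimator used for continuous features. The sketch below is a check under those assumptions, not a confirmed Weka replacement: it compares the result with `sklearn.metrics.mutual_info_score` computed column by column, and converts from nats (scikit-learn uses the natural logarithm) to bits in case the values being compared against are in bits.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

# Toy sparse document-term matrix (rows = documents, columns = terms).
X = sp.csr_matrix([[1, 1], [1, 0], [0, 1], [0, 0]])
y = np.array([0, 0, 1, 1])

# discrete_features=True: every column is treated as a discrete variable,
# so no nearest-neighbour estimation (and no randomness) is involved.
mi_nats = mutual_info_classif(X, y, discrete_features=True)

# scikit-learn reports nats; divide by ln(2) to express the result in bits.
mi_bits = mi_nats / np.log(2)

# Cross-check: contingency-table mutual information for each column.
reference = np.array([
    mutual_info_score(y, X[:, j].toarray().ravel()) for j in range(X.shape[1])
])
print(mi_bits)  # roughly [1.0, 0.0] for this toy data
```

Note that raw term counts are used as they are, so each distinct count becomes its own attribute value; binarizing the matrix first gives the presence/absence variant.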