I have fitted a CountVectorizer to some documents in scikit-learn. I would like to see all the terms and their corresponding frequencies in the text corpus, in order to select stop words. For example:

'and' 123 times, 'to' 100 times, 'for' 90 times, ... and so on

Is there any built-in function for this?

If cv is your CountVectorizer and X is the vectorized corpus, then pairing cv's feature names with the column sums of X (X.sum(axis=0)) gives a list of (term, frequency) pairs, one for each distinct term that the CountVectorizer extracted from the corpus.

