I want to split text data into n-grams. Usually, I would do something like:
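import nltk
from nltk import bigrams
string = "I really like python, it's pretty awesome."
# bigrams() works on a sequence of tokens; passing the raw string would pair up characters
tokens = string.split()
# bigrams() returns a lazy iterator in recent NLTK versions, so build a list before printing
string_bigrams = list(bigrams(tokens))
print(string_bigrams)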
As far as I am aware, nltk only offers helpers for bigrams and trigrams. Is there a way to split my text into four-grams, five-grams, or even hundred-grams?
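
For reference, here is a minimal sketch of what I mean by an n-gram of arbitrary size, written in plain Python without nltk. The function name `word_ngrams` and the whitespace tokenisation are my own assumptions for illustration, not part of any library. I believe `nltk.util.ngrams(sequence, n)` does the same job for any n, but the sketch below avoids depending on it.

```python
def word_ngrams(text, n):
    """Return a list of word n-grams (as tuples) from text.

    Hypothetical helper: tokenises naively on whitespace, so punctuation
    stays attached to the neighbouring word.
    """
    tokens = text.split()
    # Build n staggered views of the token list and zip them together;
    # zip stops at the shortest view, which yields exactly the n-grams.
    return list(zip(*(tokens[i:] for i in range(n))))


sentence = "I really like python, it's pretty awesome."
four_grams = word_ngrams(sentence, 4)
print(four_grams)
```

If n is larger than the number of tokens, the result is simply an empty list, so asking for hundred-grams on a short sentence returns nothing rather than raising an error.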