Restore original text from Keras’s IMDb dataset
I want to restore IMDb's original text from Keras’s IMDB dataset.
First, when I load Keras’s IMDB dataset, it returns each review as a sequence of word indices.
>>> from keras.datasets import imdb
>>> (X_train, y_train), (X_test, y_test) = imdb.load_data()
>>> X_train[0]
I found the imdb.get_word_index() method, which returns a word-to-index dictionary like {'create': 984, 'make': 94,…}. To convert indices back to words, I created an index-to-word dictionary.
>>> word_index = imdb.get_word_index()
>>> index_word = {v:k for k,v in word_index.items()}
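To make the inversion step concrete, here is a minimal sketch on a made-up dictionary. The names toy_word_index, toy_index_word, and decode are hypothetical and only for illustration, and the words and ranks are not taken from the real IMDB index.

```python
# Toy stand-in for imdb.get_word_index(); these words and ranks are made up.
toy_word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}

# Invert the word -> index mapping into index -> word.
toy_index_word = {index: word for word, index in toy_word_index.items()}


def decode(sequence, index_word, unknown="?"):
    """Join the word for each index, using `unknown` for indices not in the mapping."""
    return " ".join(index_word.get(i, unknown) for i in sequence)


decoded = decode([1, 2, 3, 4, 99], toy_index_word)
print(decoded)
```

Passing a default to .get() keeps str.join from failing on None when an index is missing from the dictionary.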
Then I tried to restore the original text as follows.
>>> ' '.join(index_word.get(w) for w in X_train[5])
"the effort still been that usually makes for of finished sucking ended cbc's an because before if just though something know novel female i i slowly lot of above freshened with connect in of script their that out end his deceptively i i"
My English is not great, but I can tell that this sentence is strange.
Why does this happen? How can I restore the original text?