The rev_idx dictionary maps each integer word ID back to its original word, so you can use it to restore the integer-encoded training reviews to readable text.
An improved version of the retrieve_sample function, which uses rev_idx to complete this task, is provided below:
def retrieve_sample(doc_id, rev_idx, X_train):
    # Decode the review at position doc_id in X_train back into words.
    sample_review = ''
    encoded_review = X_train[doc_id]
    for word_id in encoded_review:
        # Skip any ID that has no entry in rev_idx.
        if word_id in rev_idx:
            word = rev_idx[word_id]
            sample_review += word + ' '
    return sample_review

answers_out['sample_doc_{}'.format(random_id)] = retrieve_sample(random_id, rev_idx, X_train)
answers_out['sample_doc_2'] = retrieve_sample(2, rev_idx, X_train)
print(answers_out['sample_doc_{}'.format(random_id)])
print(answers_out['sample_doc_2'])
The function iterates over the word IDs in encoded_review, looks each one up in rev_idx, and appends the matching word plus a space to sample_review. IDs with no entry in rev_idx are skipped, so the returned string contains only the words the dictionary can map, in their original order, followed by a trailing space.
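
To see the lookup in isolation, here is a minimal sketch on toy data. Everything in it is an assumption made for illustration: the word_index contents, the tiny X_train, the helper name decode_review, and the "<UNK>" placeholder are hypothetical and not part of your dataset or the function above. It builds rev_idx by inverting a word-to-ID dictionary, then decodes with str.join, marking unmapped IDs instead of dropping them.

```python
# Hypothetical toy vocabulary (word -> integer ID); not from the real dataset.
word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}

# Invert it to get the ID -> word mapping that rev_idx is assumed to hold.
rev_idx = {word_id: word for word, word_id in word_index.items()}

# Hypothetical encoded reviews; ID 99 deliberately has no entry in rev_idx.
X_train = [
    [1, 2, 3, 4],
    [2, 3, 99, 4],
]


def decode_review(doc_id, rev_idx, X_train, unknown="<UNK>"):
    """Decode one review, replacing unmapped IDs with a placeholder token."""
    return " ".join(rev_idx.get(word_id, unknown) for word_id in X_train[doc_id])


print(decode_review(0, rev_idx, X_train))  # the movie was great
print(decode_review(1, rev_idx, X_train))  # movie was <UNK> great
```

Using join avoids the trailing space, and the placeholder makes it visible when an ID is missing from the dictionary. If your expected output requires unmapped IDs to be dropped silently, keep the original retrieve_sample behavior instead.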