0 votes
2 views
in Python by (120 points)
vocab_idx = imdb.get_word_index()
rev_idx = {v+3:k for k,v in vocab_idx.items()}
rev_idx[0] = '<padding_word>'
rev_idx[1] = '<start_word>'
rev_idx[2] = '<oov_word>'
rev_idx[3] = '<unk_word>'

def retrieve_sample(doc_id, rev_idx, X_train):
    sample_review = ''
    # TODO: implement your solution here
    # End of implementation
    return sample_review
answers_out['sample_doc_{}'.format(random_id)] = retrieve_sample(random_id, rev_idx, X_train) 
answers_out['sample_doc_2'] = retrieve_sample(2, rev_idx, X_train)
print(answers_out['sample_doc_{}'.format(random_id)])
print(answers_out['sample_doc_2'])
How can I retrieve the original form of the training samples so that I get this output?
<start_word> this movie is one of the most underrated movie of its time when watching this movie your filled with action and when <oov_word> not really <oov_word> the humour is un matched brilliant writing for a movie that was made to give us a bloody mix of a game show where criminals are the <oov_word> and a near future where the general public all have a <oov_word> for blood also arnold doesn't let us down with some of his best one liners i don't want to spoil anything for you but i will tell you when arnold gives his i'll be back line he gets the best response of them all in this movie hope you enjoy this gem as much as i did
<start_word> this has to be one of the worst films of the <oov_word> when my friends i were watching this film being the target audience it was aimed at we just sat watched the first half an hour with our jaws touching the floor at how bad it really was the rest of the time everyone else in the theatre just started talking to each other leaving or generally crying into their popcorn that they actually paid money they had <oov_word> working to watch this <oov_word> excuse for a film it must have looked like a great idea on paper but on film it looks like no one in the film has a clue what is going on crap acting crap costumes i can't get across how <oov_word> this is to watch save yourself an hour a bit of your life

6 Answers

0 votes
by (480 points)

To retrieve the original form of training samples, you can use the rev_idx dictionary that maps the integer-encoded words back to their original word forms. Here's an updated version of the retrieve_sample function that utilizes rev_idx to reconstruct the sample reviews:

def retrieve_sample(doc_id, rev_idx, X_train):
    sample_review = ''
    encoded_review = X_train[doc_id]
    for word_id in encoded_review:
        if word_id in rev_idx:
            word = rev_idx[word_id]
            sample_review += word + ' '
    return sample_review

answers_out['sample_doc_{}'.format(random_id)] = retrieve_sample(random_id, rev_idx, X_train)
answers_out['sample_doc_2'] = retrieve_sample(2, rev_idx, X_train)
print(answers_out['sample_doc_{}'.format(random_id)])
print(answers_out['sample_doc_2'])

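By iterating over the word IDs in encoded_review, the function looks up each ID in rev_idx and appends the matching word to the sample_review string. IDs that are missing from rev_idx are skipped silently, and the returned string ends with a trailing space. The result is the review as readable text, with <oov_word> standing in wherever a word was outside the vocabulary.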
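The offset of 3 in rev_idx may be easier to see on a toy vocabulary. The sketch below assumes the convention implied by the question's setup, where IDs 0 to 3 are reserved for special tokens and every word rank is shifted by 3. The names toy_vocab and toy_X_train are made-up stand-ins for imdb.get_word_index() and X_train, not real data.

```python
# Toy stand-in for imdb.get_word_index(): word -> frequency rank (1-based).
toy_vocab = {'the': 1, 'movie': 2, 'great': 3}

# Shift every rank by 3 so that ids 0-3 stay free for the special tokens.
rev_idx = {rank + 3: word for word, rank in toy_vocab.items()}
rev_idx[0] = '<padding_word>'
rev_idx[1] = '<start_word>'
rev_idx[2] = '<oov_word>'
rev_idx[3] = '<unk_word>'

# Hypothetical encoded review: 1 marks the start, 2 an out-of-vocabulary word.
toy_X_train = [[1, 4, 5, 2, 6]]

def retrieve_sample(doc_id, rev_idx, X_train):
    sample_review = ''
    for word_id in X_train[doc_id]:
        if word_id in rev_idx:
            sample_review += rev_idx[word_id] + ' '
    return sample_review

decoded = retrieve_sample(0, rev_idx, toy_X_train)
print(decoded)
```

Here ID 4 decodes to the rank-1 word, ID 5 to the rank-2 word, and so on, which is why the dictionary is built with v+3 rather than v.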

0 votes
by (180 points)

To obtain the original form of the training samples, you can employ the rev_idx dictionary, which serves as a mapping between the integer-encoded words and their respective original word representations. By utilizing this dictionary, you can reconstruct the sample reviews to their original form.

Here is a retrieve_sample function that makes use of rev_idx to accomplish this task:

def retrieve_sample(doc_id, rev_idx, X_train):
    sample_review = ''
    encoded_review = X_train[doc_id]
    for word_id in encoded_review:
        if word_id in rev_idx:
            word = rev_idx[word_id]
            sample_review += word + ' '
    return sample_review

answers_out['sample_doc_{}'.format(random_id)] = retrieve_sample(random_id, rev_idx, X_train)
answers_out['sample_doc_2'] = retrieve_sample(2, rev_idx, X_train)
print(answers_out['sample_doc_{}'.format(random_id)])
print(answers_out['sample_doc_2'])

By iterating through the word IDs in encoded_review, the function retrieves the corresponding words from rev_idx and concatenates them to the sample_review string. This returns each training sample as readable text, which produces the desired output.

0 votes
by (180 points)

The rev_idx dictionary, which acts as a mapping between the integer-encoded words and their corresponding original word representations, can be used to acquire the training samples in their original form. You can restore the sample reviews to their original form by using this dictionary.

An improved version of the retrieve_sample function, which uses rev_idx to complete this task, is provided below:

def retrieve_sample(doc_id, rev_idx, X_train):
    sample_review = ''
    encoded_review = X_train[doc_id]
    for word_id in encoded_review:
        if word_id in rev_idx:
            word = rev_idx[word_id]
            sample_review += word + ' '
    return sample_review

answers_out['sample_doc_{}'.format(random_id)] = retrieve_sample(random_id, rev_idx, X_train)
answers_out['sample_doc_2'] = retrieve_sample(2, rev_idx, X_train)
print(answers_out['sample_doc_{}'.format(random_id)])
print(answers_out['sample_doc_2'])

The function iterates through the word IDs in encoded_review, looks each one up in rev_idx, and concatenates the matching words to the sample_review string. This returns the training samples as readable text, which produces the intended result.

0 votes
by (1.3k points)

To retrieve the original form of training samples, you can use the reverse index (rev_idx) and the X_train data. Here's an updated implementation of the retrieve_sample function:

def retrieve_sample(doc_id, rev_idx, X_train):
    sample_review = ' '.join([rev_idx.get(idx, '<unk_word>') for idx in X_train[doc_id]])
    return sample_review

Explanation:

  1. The retrieve_sample function takes three arguments: doc_id, rev_idx, and X_train.
  2. It retrieves the encoded document corresponding to doc_id from X_train using indexing: X_train[doc_id].
  3. A list comprehension iterates over the indices in the document and uses rev_idx.get to map each index back to its original word. If an index is not found in rev_idx, the <unk_word> placeholder is used instead.
  4. The words are joined using the join method with a space as the separator, creating the sample_review string.
  5. Finally, sample_review is returned.

You can use this updated retrieve_sample function in your code to retrieve the original form of the training samples and obtain the desired output.
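The difference between skipping unknown IDs and falling back to a placeholder only shows up when rev_idx has a gap. The sketch below uses a made-up reverse index with no entry for ID 7; decode_skip and decode_fallback are hypothetical names introduced just for this comparison.

```python
# Hypothetical reverse index with a gap: id 7 has no entry.
rev_idx = {1: '<start_word>', 2: '<oov_word>', 4: 'good', 5: 'film'}
X_train = [[1, 4, 7, 5]]

def decode_skip(doc_id, rev_idx, X_train):
    # Drops ids that are missing from rev_idx, so the output gets shorter.
    return ' '.join(rev_idx[i] for i in X_train[doc_id] if i in rev_idx)

def decode_fallback(doc_id, rev_idx, X_train):
    # Replaces missing ids with a placeholder, so the word count is preserved.
    return ' '.join(rev_idx.get(i, '<unk_word>') for i in X_train[doc_id])

skipped = decode_skip(0, rev_idx, X_train)
filled = decode_fallback(0, rev_idx, X_train)
print(skipped)
print(filled)
```

The fallback variant keeps one output token per input ID, which makes a missing mapping visible instead of hiding it.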

0 votes
by (640 points)

def retrieve_sample(doc_id, rev_idx, X_train):
    original_sample = ''
    for index in X_train[doc_id]:
        word = rev_idx.get(index, '<unk_word>')
        original_sample += word + ' '
    return original_sample.strip()

Explanation:

  1. The retrieve_sample function takes three arguments: doc_id, rev_idx, and X_train.
  2. It initializes an empty string original_sample to store the retrieved sample.
  3. It retrieves the document corresponding to doc_id from X_train using indexing: X_train[doc_id].
  4. It iterates over the indices in the document.
  5. For each index, it uses the reverse index rev_idx to map the index back to its original word. If the index is not found in rev_idx, it uses the <unk_word> placeholder.
  6. The word is added to the original_sample string, followed by a space.
  7. Finally, the original_sample string is returned after stripping any trailing whitespace.

With this version of the retrieve_sample function, you can retrieve the training samples as readable text and obtain the desired output.
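As a quick sanity check, the loop-and-strip formulation above should give the same string as a single join call. The sketch below compares the two on made-up data, including an empty review; decode_loop and decode_join are hypothetical names, and the tiny rev_idx is not the real IMDB index.

```python
# Made-up reverse index and encoded reviews (the second review is empty).
rev_idx = {1: '<start_word>', 2: '<oov_word>', 4: 'nice', 5: 'plot'}
X_train = [[1, 4, 5, 2], []]

def decode_loop(doc_id, rev_idx, X_train):
    # Concatenate word by word, then drop the trailing space.
    original_sample = ''
    for index in X_train[doc_id]:
        original_sample += rev_idx.get(index, '<unk_word>') + ' '
    return original_sample.strip()

def decode_join(doc_id, rev_idx, X_train):
    # Build the list of words once and join it with single spaces.
    return ' '.join(rev_idx.get(index, '<unk_word>') for index in X_train[doc_id])

results = [(decode_loop(i, rev_idx, X_train), decode_join(i, rev_idx, X_train))
           for i in range(len(X_train))]
for loop_text, join_text in results:
    print(repr(loop_text), repr(join_text))
```

Both variants agree on this data; join is usually the more idiomatic choice in Python because it needs no trailing-space cleanup.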

0 votes
by (300 points)

def retrieve_sample(doc_id, rev_idx, X_train):
    original_sample = ''
    for index in X_train[doc_id]:
        word = rev_idx.get(index, '<unk_word>')
        original_sample += word + ' '
    return original_sample.strip()

Explanation:

  1. The function retrieve_sample takes three arguments: doc_id, rev_idx, and X_train.
  2. It initializes an empty string original_sample to store the retrieved sample.
  3. The document corresponding to doc_id is retrieved from X_train using indexing: X_train[doc_id].
  4. A loop is used to iterate over the indices in the document.
  5. For each index, the reverse index rev_idx is used to map the index back to its original word. If the index is not found in rev_idx, the <unk_word> placeholder is used.
  6. The word is added to the original_sample string, followed by a space.
  7. Finally, the original_sample string is returned after removing any trailing whitespace.

With this version of the retrieve_sample function, you can retrieve the training samples as readable text.
