 I have a list named messages.

messages = ['hey how are you', 'doing good what about you']

My end goal is to check this list against a second list, a vocabulary, and copy every word that appears in the vocabulary into another list. The vocabulary list looks like this:

vocab = ['hey', 'how', 'you']

(Notice 'are' is omitted)

My formatted data currently looks like this:

final_list = np.array([['', '', '', ''], ['', '', '', '']])

I want it to look something like this:

final_list = np.array([['hey', 'how', 'you', ''], ['you', '', '', '']])

I have an idea that uses a for loop and enumerate(), but it is not working well.

1 Answer

Loop over the list of messages. For each message, split it into words, keep only the words that are in vocab, take at most N (N=4) of them, and pad with empty strings if needed.

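import numpy as np

N = 4  # number of columns in each row
data = []
for m in messages:
    # Keep only the words that appear in the vocabulary
    words = [x for x in m.split() if x in vocab]
    # Truncate to N words, then pad with empty strings up to length N
    data.append(words[:N] + (N - len(words)) * [""])
final_list = np.array(data)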

For better performance, convert vocab to a set before the loop, so that each membership test takes constant time on average:

vocab = set(vocab)

Result:

array([['hey', 'how', 'you', ''],
       ['you', '', '', '']], dtype='<U3')
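
For reference, the steps above can be combined into one self-contained sketch. The function name filter_and_pad and its parameter n are hypothetical, chosen only for illustration; the logic is the same filter, truncate, and pad approach, with the set conversion done inside the function.

```python
import numpy as np


def filter_and_pad(messages, vocab, n=4):
    """Keep in-vocabulary words of each message, truncated or padded to n columns.

    Hypothetical helper for illustration; not part of NumPy.
    """
    vocab_set = set(vocab)  # set lookup is constant time on average
    rows = []
    for message in messages:
        # Filter by vocabulary, then keep at most n words
        words = [w for w in message.split() if w in vocab_set][:n]
        # Pad with empty strings so every row has exactly n entries
        rows.append(words + [""] * (n - len(words)))
    return np.array(rows)


messages = ['hey how are you', 'doing good what about you']
vocab = ['hey', 'how', 'you']

final_list = filter_and_pad(messages, vocab)
print(final_list)
```

Slicing to n before padding keeps the pad count non-negative, so every row ends up with the same length and NumPy can build a regular 2-D array.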
