I would like to create a corpus/vocabulary from all the tokenised texts in one column of my data frame:
User  Text
312   Include details about your goal
41    Describe expected and actual results
421   Include any error messages
What I would like to do is first remove the stopwords, then append all the remaining tokenised words to a single list, i.e.:
my_list=['Include', 'details', 'goal', 'Describe', 'expected', 'actual', 'results', 'Include', 'error', 'messages']
I tried as follows:
df['Text'].apply(lambda x: [item for item in x if item not in stop_words])
but it gives me characters, not words.
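The characters most likely appear because each cell in Text is a plain string, and iterating over a string yields one character at a time. A minimal sketch of one way around this, assuming whitespace splitting is good enough as a tokeniser: split each text into words first, then filter. The `stop_words` set below is a hypothetical stand-in for the real list, and `build_vocabulary` is an illustrative helper name, not part of any library.

```python
# Hypothetical stand-in for the real stop-word list (assumed to be lowercase).
stop_words = {"about", "your", "and", "any"}

# The three texts from the Text column above.
texts = [
    "Include details about your goal",
    "Describe expected and actual results",
    "Include any error messages",
]


def build_vocabulary(texts, stop_words):
    """Split each text on whitespace and keep the words that are not stop words."""
    vocabulary = []
    for text in texts:
        # str.split() yields words; iterating over the string itself yields characters.
        for word in text.split():
            # Compare in lowercase so that capitalised stop words are also dropped.
            if word.lower() not in stop_words:
                vocabulary.append(word)
    return vocabulary


my_list = build_vocabulary(texts, stop_words)
print(my_list)
```

Since the helper only needs an iterable of strings, it should also accept the column directly, e.g. `build_vocabulary(df['Text'], stop_words)`. If punctuation matters, a proper tokeniser would be a better fit than `split()`.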