I'm working on a data science problem and keep running into an issue when looping through a filtered list of tweets and sending each one to a new function to be cleaned up further.
k1_tweets_filtered is a list of tweets with every tweet shorter than 20 characters removed. The list is populated by searching Twitter. What I'm trying to do now is send that list to another function for further processing.
The issue is that it only processes one tweet and nothing else. I need it to process every tweet in the list. len(k1_tweets_filtered) is 512, but len(k1_tweets_processed) is only 14. Maybe my loop is wrong?
Thank you for the help!
Code:
k1_tweets_processed = []
for tweet in k1_tweets_filtered:
    k1_tweets_processed = pre_process(tweet_k1)
import re

def pre_process(doc):
    doc = doc.lower()
    # getting rid of non-ASCII characters (remove_non_ascii is my own helper)
    doc = remove_non_ascii(doc)
    # replacing URLs with the placeholder token 'url'
    url_pattern = r"http://[^\s]+|https://[^\s]+|www.[^\s]+|[^\s]+\.com|bit.ly/[^\s]+"
    doc = re.sub(url_pattern, 'url', doc)
    # replacing punctuation with spaces
    punctuation = r"\(|\)|#|\'|\"|-|:|\\|\/|!|\?|_|,|=|;|>|<|\.|\@"
    doc = re.sub(punctuation, ' ', doc)
    # keeping only words longer than 2 characters
    return [w for w in doc.split() if len(w) > 2]
It works fine for a single tweet, but I'm trying to run it on every tweet in the list. The final list should contain the processed version of every tweet, not just one.
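For comparison, here is a minimal sketch of a loop that collects one result per tweet. It assumes the problem is that the loop reassigns k1_tweets_processed on every pass and passes tweet_k1 rather than the loop variable tweet, which would leave only one tweet's word list (hence a short length such as 14). The remove_non_ascii stand-in and the two sample tweets are made up for illustration. The real helper and data are not shown in this post.

```python
import re


def remove_non_ascii(doc):
    # Hypothetical stand-in for the helper used in pre_process:
    # drops any character that cannot be encoded as ASCII.
    return doc.encode("ascii", errors="ignore").decode("ascii")


def pre_process(doc):
    doc = doc.lower()
    doc = remove_non_ascii(doc)
    # Replace URLs with the placeholder token 'url'.
    url_pattern = r"http://[^\s]+|https://[^\s]+|www.[^\s]+|[^\s]+\.com|bit.ly/[^\s]+"
    doc = re.sub(url_pattern, 'url', doc)
    # Replace punctuation with spaces.
    punctuation = r"\(|\)|#|\'|\"|-|:|\\|\/|!|\?|_|,|=|;|>|<|\.|\@"
    doc = re.sub(punctuation, ' ', doc)
    # Keep only words longer than 2 characters.
    return [w for w in doc.split() if len(w) > 2]


# Made-up sample data standing in for the real filtered tweets.
k1_tweets_filtered = [
    "Check this out: https://example.org/page #DataScience!",
    "Loving the new café downtown, great coffee",
]

k1_tweets_processed = []
for tweet in k1_tweets_filtered:
    # Pass the loop variable and append, so each tweet adds one word list
    # instead of overwriting the whole result.
    k1_tweets_processed.append(pre_process(tweet))
```

With this shape, len(k1_tweets_processed) matches len(k1_tweets_filtered), and each element is the list of cleaned words for the tweet at the same position. A list comprehension, [pre_process(tweet) for tweet in k1_tweets_filtered], would be an equivalent alternative.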