in Data Science by (17.6k points)

I'm working on a data science problem and keep running into an issue when looping through a filtered list of tweets. Each tweet is meant to be sent to a new function to be cleaned up further.

k1_tweets_filtered is a list of tweets with every tweet shorter than 20 characters removed. I now want to send that list to another function for further processing, but it only handles one tweet. The lists are populated by searching Twitter.

The issue is that it only processes one tweet and nothing else. I need it to process every tweet in the list. len(k1_tweets_filtered) is 512, but len(k1_tweets_processed) is only 14. Maybe my loop is wrong?

Thank you for the help!

Code:

k1_tweets_processed = []

for tweet in k1_tweets_filtered:
    k1_tweets_processed = pre_process(tweet_k1)

def pre_process(doc):
    doc = doc.lower()

    # getting rid of non ascii codes
    doc = remove_non_ascii(doc)

    # replacing URLs
    url_pattern = "http://[^\s]+|https://[^\s]+|www.[^\s]+|[^\s]+\.com|bit.ly/[^\s]+"
    doc = re.sub(url_pattern, 'url', doc)

    punctuation = r"\(|\)|#|\'|\"|-|:|\\|\/|!|\?|_|,|=|;|>|<|\.|\@"
    doc = re.sub(punctuation, ' ', doc)

    return [w for w in doc.split() if len(w) > 2]

It works fine for one tweet, but I'm trying to send the entire list to it so that every tweet gets processed. The final list should contain every processed tweet instead of just one.

1 Answer

by (36.8k points)

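Looking at the code, the loop assigns the output of pre_process directly to k1_tweets_processed, so the list is overwritten on every iteration instead of growing. What remains at the end is the word list of a single tweet, which is most likely why len reports 14 rather than 512. The loop also passes tweet_k1 instead of the loop variable tweet, so the same value is processed each time. Append each result to the list inside the 'for' loop, as shown: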

for tweet in k1_tweets_filtered:
    k1_tweets_processed.append(pre_process(tweet))
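
To see the difference between reassigning and appending, here is a small self-contained sketch. It reuses pre_process from the question; remove_non_ascii is a hypothetical stand-in (the original helper is not shown), and the three sample tweets are made up for illustration.

```python
import re


def remove_non_ascii(text):
    # Hypothetical stand-in: the question does not show the real helper.
    return text.encode("ascii", errors="ignore").decode("ascii")


def pre_process(doc):
    doc = doc.lower()
    doc = remove_non_ascii(doc)
    # Same patterns as in the question, written as raw strings.
    url_pattern = r"http://[^\s]+|https://[^\s]+|www.[^\s]+|[^\s]+\.com|bit.ly/[^\s]+"
    doc = re.sub(url_pattern, "url", doc)
    punctuation = r"\(|\)|#|\'|\"|-|:|\\|\/|!|\?|_|,|=|;|>|<|\.|\@"
    doc = re.sub(punctuation, " ", doc)
    return [w for w in doc.split() if len(w) > 2]


# Made-up sample data standing in for the real Twitter search results.
k1_tweets_filtered = [
    "Learning Python loops today, wish me luck!",
    "Check this out: https://example.com/post #datascience",
    "Caf\u00e9 meetup was great, thanks everyone",
]

# Reassigning inside the loop: each iteration overwrites the previous
# result, so only the last tweet's word list survives.
overwritten = []
for tweet in k1_tweets_filtered:
    overwritten = pre_process(tweet)

# Appending inside the loop: one word list per tweet is kept.
k1_tweets_processed = []
for tweet in k1_tweets_filtered:
    k1_tweets_processed.append(pre_process(tweet))

# An equivalent list comprehension, if a one-liner is preferred.
k1_tweets_processed_lc = [pre_process(tweet) for tweet in k1_tweets_filtered]

print(len(overwritten))          # number of words kept from the last tweet
print(len(k1_tweets_processed))  # number of tweets processed
```

With append, len(k1_tweets_processed) matches len(k1_tweets_filtered), and each element is the word list for one tweet.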

I hope this will help you.
