Given these 3 lists of data and a list of keywords:

good_data1 = ['hello, world', 'hey, world']
good_data2 = ['hey, man', 'whats up']
bad_data = ['hi, earth', 'sup, planet']
keywords = ['world', 'he']

I'm trying to write a simple function to check if any of the keywords exist as a substring of any word in the data lists. It should return True for the good_data lists and False for bad_data.

I know how to do this in what seems to be an inefficient way:

def checkData(data):
  for s in data:
    for k in keywords:
      if k in s:
        return True
  return False

1 Answer


In your example, with so few items, it doesn't really matter. But if you have a list of several thousand items, this might help.

Since you don't care which element in the list contains the keyword, you can scan the whole list once (as one string) instead of one item at a time. For that you need a join character that you know won't occur in any keyword, to avoid false positives from matches that span two items. I use the newline in this example.

def check_data(data):
    # Join with a newline so a keyword cannot match across two items.
    s = "\n".join(data)
    for k in keywords:
        if k in s:
            return True
    return False
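To see why the join character matters, here is a small sketch run against the sample data from the question. The `keywords` argument and the `joiner` parameter are additions for illustration only; the version above reads a global `keywords` and always joins on a newline.

```python
def check_data(data, keywords, joiner="\n"):
    # Join once, then scan the single string for each keyword.
    # `joiner` is a hypothetical parameter added for this demo.
    s = joiner.join(data)
    return any(k in s for k in keywords)

keywords = ['world', 'he']
good_data1 = ['hello, world', 'hey, world']
good_data2 = ['hey, man', 'whats up']
bad_data = ['hi, earth', 'sup, planet']

print(check_data(good_data1, keywords))  # True
print(check_data(good_data2, keywords))  # True
print(check_data(bad_data, keywords))    # False

# An empty joiner lets a keyword match across item boundaries.
split_data = ['wor', 'ld']
print(check_data(split_data, keywords, joiner=""))  # True (a false positive)
print(check_data(split_data, keywords))             # False
```

With an empty joiner, 'wor' and 'ld' fuse into 'world', which is exactly the false positive the newline is there to prevent.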

In my completely unscientific test, my version checked a list of 5000 items 100000 times in about 30 seconds. I stopped your version after 3 minutes -- got tired of waiting to post =)
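If you want to repeat that comparison yourself, a rough sketch with the standard `timeit` module might look like the following. The function names, the 5000-item test list, and the repeat count are assumptions chosen for illustration, not the setup used for the numbers above, and results will vary with the data and with how early the first match occurs.

```python
import timeit

keywords = ['world', 'he']

def check_nested(data):
    # The question's approach: test every keyword against every item.
    for s in data:
        for k in keywords:
            if k in s:
                return True
    return False

def check_joined(data):
    # The answer's approach: join once, then scan the single string.
    s = "\n".join(data)
    return any(k in s for k in keywords)

# Hypothetical worst case: 5000 items, none containing a keyword,
# so neither function can return early.
data = ['hi, earth'] * 5000

# number=100 keeps the run short; raise it for steadier numbers.
t_nested = timeit.timeit(lambda: check_nested(data), number=100)
t_joined = timeit.timeit(lambda: check_joined(data), number=100)
print(f"nested loops: {t_nested:.4f} s")
print(f"joined scan:  {t_joined:.4f} s")
```

Note that when a keyword appears in the first few items, the nested version returns almost immediately, while the joined version still pays for building the full string first.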
