I am trying to extract all occurrences of tagged words from a string using regex in Python 2.7.2. Or simply, I want to extract every piece of text inside the [p][/p] tags. Here is my attempt:
regex = ur"[\u005B1P\u005D.+?\u005B\u002FP\u005D]+?"
line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
person = re.findall(pattern, line)
Printing person produces ['President [P]', '[/P]', '[P] Bill Gates [/P]']
What is the correct regex to get: ['[P] Barack Obama [/P]', '[P] Bill Gates [/p]'] or ['Barrack Obama', 'Bill Gates'].
Thanks. :)