
I am trying to extract all occurrences of tagged words from a string using regex in Python 2.7.2. Put simply, I want to extract every piece of text between the [P] and [/P] tags. Here is my attempt:

regex = ur"[\u005B1P\u005D.+?\u005B\u002FP\u005D]+?" 

line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday." 

person = re.findall(regex, line)

Printing person produces ['President [P]', '[/P]', '[P] Bill Gates [/P]']

What is the correct regex to get ['[P] Barack Obama [/P]', '[P] Bill Gates [/P]'] or ['Barack Obama', 'Bill Gates']?

Thanks. :)

1 Answer


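In your attempt, the ur prefix turns \u005B and \u005D into literal [ and ] before the regex is compiled, so they delimit character classes instead of matching brackets. Escape the brackets with backslashes and put a capturing group around the name, then pass the pattern to re.findall: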

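import re

# \[ and \] match literal brackets; the group (.+?) lazily captures the name.
# A plain raw string works here on both Python 2 and Python 3.
regex = r"\[P\] (.+?) \[/P\]"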

line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday." 

person = re.findall(regex, line) 

print(person)  # ['Barack Obama', 'Bill Gates']
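
The question asks for either form of the result, so here is a small sketch that produces both. The names inner, tagged, names and spans are only illustrative, and the \s* padding is an assumption that the spacing inside the tags may vary:

```python
import re

line = ("President [P] Barack Obama [/P] met Microsoft founder "
        "[P] Bill Gates [/P], yesterday.")

# With a capturing group, findall returns only the text between the tags;
# \s* absorbs any whitespace padding just inside the tags.
inner = re.compile(r"\[P\]\s*(.+?)\s*\[/P\]")

# Without a capturing group, findall returns each whole match, tags included.
tagged = re.compile(r"\[P\].+?\[/P\]")

names = inner.findall(line)
spans = tagged.findall(line)

print(names)  # ['Barack Obama', 'Bill Gates']
print(spans)  # ['[P] Barack Obama [/P]', '[P] Bill Gates [/P]']
```

The lazy quantifier .+? matters in both patterns: a greedy .+ would run from the first [P] to the last [/P] and return a single match.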
