
I want to split a string into a list of sentences and then print them, without using NLTK. The split should happen on the period that ends a sentence, but not on decimals, abbreviations, titles in a name, or a domain such as .com. This is my attempt at a regex, which doesn't work:

import re

text = """\
Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't true... Well, with a probability of .9 it isn't.
"""

sentences = re.split(r' *[\.\?!][\'"\)\]]* *', text)

for stuff in sentences:
    print(stuff)

The output should look like this:

Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it. 

Did he mind?

Adam Jones Jr. thinks he didn't.

In any case, this isn't true...

Well, with a probability of .9 it isn't.

1 Answer


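Use the following regex as the pattern for re.split. It matches whitespace that follows a period or a question mark, unless the period ends an abbreviation such as "i.e." or a two-letter title such as "Mr." or "Jr.":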

(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s
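As a minimal sketch of how this pattern could be applied to the text from the question: the name SENTENCE_BOUNDARY is only illustrative, and the sample text is written on one line with no surrounding newlines, which is an assumption about the input.

```python
import re

# The pattern from the answer: match whitespace that follows "." or "?",
# but not after abbreviations like "i.e." or two-letter titles like "Mr.".
# SENTENCE_BOUNDARY is an illustrative name, not part of the original answer.
SENTENCE_BOUNDARY = re.compile(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s')

text = (
    "Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid "
    "a lot for it. Did he mind? Adam Jones Jr. thinks he didn't. In any "
    "case, this isn't true... Well, with a probability of .9 it isn't."
)

# strip() avoids an empty trailing item when the text ends with a newline.
sentences = SENTENCE_BOUNDARY.split(text.strip())

for sentence in sentences:
    print(sentence)
```

Two limitations are worth keeping in mind. The last lookbehind only accepts "." and "?", so a sentence ending in "!" is not split. The title check only covers a capital letter followed by one lowercase letter, so longer titles such as "Mrs." would still be treated as sentence ends.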
