0 votes
in Python by (47.6k points)

I am doing some text normalization using Python and regular expressions. I would like to substitute every standalone 'u' or 'U' with 'you'. Here is what I have done so far:

import re 

text = 'how are u? umberella u! u. U. U@ U# u ' 

print(re.sub(' [u|U][s,.,?,!,W,#,@ (^a-zA-Z)]', ' you ', text))

The output I get is:

how are you you berella you you you you you you

As you can see, the problem is that 'umberella' is changed to 'you berella'. Also, I want to keep the character that appears after a 'u'. For example, I want 'u!' to be changed to 'you!'. Can anyone tell me what I am doing wrong and what the best way to write the regular expression is?

1 Answer

0 votes
by (106k points)

You can use the special sequence \b to replace every standalone 'u' or 'U' with 'you'. It is a zero-width word boundary: it matches the empty string at the beginning or at the end of a word:

print(re.sub(r'\b[uU]\b', 'you', text))

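Matching on spaces is not reliable, because a word can also be followed by punctuation marks such as '?' or '!'. The \b anchor handles this: it matches at any position between a word character (a letter, digit, or underscore) and a non-word character or the edge of the string. It consumes no characters, so the punctuation after 'u' is kept.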
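To make the behaviour concrete, here is a minimal, self-contained sketch in Python 3 that applies the \b pattern to the sample string from the question. The function name normalize_u is only an illustrative choice, not part of any library.

```python
import re

# Compile once: a lone 'u' or 'U' with a word boundary on both sides.
# \b is zero-width, so the characters around the match are left untouched.
LONE_U = re.compile(r'\b[uU]\b')


def normalize_u(text):
    """Replace each standalone 'u' or 'U' with 'you' (hypothetical helper)."""
    return LONE_U.sub('you', text)


text = 'how are u? umberella u! u. U. U@ U# u '
result = normalize_u(text)
print(result)
# how are you? umberella you! you. you. you@ you# you
```

The 'u' at the start of 'umberella' is followed by another word character, so there is no boundary after it and the word is left alone. Note that an uppercase 'U' also becomes lowercase 'you' here, and that \b counts digits and underscores as word characters, so something like 'u_2' would not be rewritten.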

asked Oct 3, 2019 in Python by Sammy (47.6k points)