I am doing some text normalization using Python and regular expressions. I would like to replace every standalone 'u' or 'U' with 'you'. Here is what I have done so far:
import re
text = 'how are u? umberella u! u. U. U@ U# u '
print(re.sub(' [u|U][s,.,?,!,W,#,@ (^a-zA-Z)]', ' you ', text))
The output I get is:
how are you you berella you you you you you you
As you can see, the problem is that 'umberella' is changed to 'you berella': the 'u' at the start of a longer word is replaced along with the letter after it. I also want to keep the character that follows a standalone 'u'. For example, 'u!' should become 'you!', not 'you '. Can anyone tell me what I am doing wrong and what the best way to write this regular expression is?
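To make the goal concrete, here is a minimal sketch of the behavior I am after. It assumes that a "standalone" u is one with a word boundary (\b) on each side; that is my guess at a suitable rule, not something I know to be the best approach. The function name normalize_u is just a placeholder for illustration.

```python
import re

def normalize_u(text):
    # Hypothetical helper: replace 'u' or 'U' only when it stands alone.
    # \b matches between a word character and a non-word character
    # (or the start/end of the string), so the 'u' in 'umberella' is skipped
    # and the punctuation after a lone 'u' is left in place.
    return re.sub(r'\b[uU]\b', 'you', text)

text = 'how are u? umberella u! u. U. U@ U# u '
print(normalize_u(text))
# how are you? umberella you! you. you. you@ you# you
```

Because \b is zero-width, nothing around the 'u' is consumed, which is why 'u!' becomes 'you!' here. One caveat under this assumption: a 'u' next to an apostrophe or hyphen (for example "u're") would also be replaced, since those characters count as non-word characters.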