
import re

my_string = "C2H6O"

a = re.findall(r"((Cl|H|O|C|N)[0-9]*)", my_string)

print(a)

The output is [('C2', 'C'), ('H6', 'H'), ('O', 'O')], but I expected ['C2', 'H6', 'O'].

I understand how tuples work, but I don't see why each result carries a second element, as in ('C2', 'C').

1 Answer


Here are two ways to solve this:

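To get your desired output, keep only one capturing group. The alternation still needs parentheses around it, so make the inner group non-capturing with (?:...) rather than replacing it with square brackets:

import re

my_string = "C2H6O"

a = re.findall(r"((?:Cl|H|O|C|N)[0-9]*)", my_string)

print(a)  # ['C2', 'H6', 'O']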
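One thing to watch for when simplifying the pattern: a character class such as [Cl|H|O|C|N] is not equivalent to the alternation (Cl|H|O|C|N). It matches a single character from the set C, l, |, H, O, N, so it happens to work for "C2H6O" but splits two-letter symbols. A minimal sketch, using "CCl4" as a made-up test input that is not part of the original question:

```python
import re

formula = "CCl4"  # hypothetical input containing a two-letter element symbol

# Character class: matches ONE character out of C, l, |, H, O, N
char_class = re.findall(r"[Cl|H|O|C|N][0-9]*", formula)

# Alternation in a non-capturing group: "Cl" is tried as a whole symbol first
alternation = re.findall(r"(?:Cl|H|O|C|N)[0-9]*", formula)

print(char_class)   # ['C', 'C', 'l4']
print(alternation)  # ['C', 'Cl4']
```

Listing Cl before C in the alternation matters too, since the regex engine tries the alternatives from left to right.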


  • You are getting that output because your pattern contains two capturing groups. When a pattern has more than one group, re.findall returns a list of tuples with one element per group.

  • A capturing group is any part of a pattern enclosed in plain parentheses (...).

  • If you want to get rid of the groups, use this pattern: r"(?:Cl|H|O|C|N)[0-9]*"

import re

my_string = "C2H6O"

a = re.findall(r"(?:Cl|H|O|C|N)[0-9]*", my_string)

print(a)


  • This removes the unneeded outer capturing group entirely and uses a non-capturing group (?:...) for the element symbols, so re.findall returns the whole matches as plain strings.
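To make the rule concrete, here is a short sketch comparing what re.findall returns for the same input with two, one, and zero capturing groups. The variable names are only illustrative. The last line uses re.finditer, which is an alternative not mentioned above: group(0) is always the whole match, whatever groups the pattern contains.

```python
import re

my_string = "C2H6O"

# Two capturing groups: a list of tuples, one element per group
two_groups = re.findall(r"((Cl|H|O|C|N)[0-9]*)", my_string)

# One capturing group: a list of that group's text
one_group = re.findall(r"((?:Cl|H|O|C|N)[0-9]*)", my_string)

# No capturing groups: a list of the whole matches
no_groups = re.findall(r"(?:Cl|H|O|C|N)[0-9]*", my_string)

# finditer yields match objects; group(0) ignores the capturing groups
whole_matches = [m.group(0) for m in re.finditer(r"((Cl|H|O|C|N)[0-9]*)", my_string)]

print(two_groups)     # [('C2', 'C'), ('H6', 'H'), ('O', 'O')]
print(one_group)      # ['C2', 'H6', 'O']
print(no_groups)      # ['C2', 'H6', 'O']
print(whole_matches)  # ['C2', 'H6', 'O']
```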
