Extract Regex Substring from Pandas column that meets criteria

Question

1 Answer

supriya · Answer 1 · 2020-10-15T08:50:06+0000

Use Series.str.extract with a regex that matches the first 4 digits followed by 11 to 14 digits or letters; it returns the first match in each row:

df['new'] = df['Messy_IDS'].str.extract(r'([0-9]{4}[0-9A-Za-z]{11,14})')
Or (note that \w also matches the underscore):
df['new'] = df['Messy_IDS'].str.extract(r'(\d{4}\w{11,14})')
print(df)
                               Messy_IDS      Desired_Output  \
0    Looking for ID : 7010M000002N8c5T7A  7010M000002N8c5T7A
1  5634M000002N8c5T7A,7010M000002N8c5T7A  5634M000002N8c5T7A
2    https://website.com/12340000000f5F5     12340000000f5F5

                  new
0  7010M000002N8c5T7A
1  5634M000002N8c5T7A
2     12340000000f5F5
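
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample rows are taken from the output above; the fourth row ("no id here") is a hypothetical addition to show what happens when nothing matches. Passing expand=False and using Series.str.findall are optional variations, not part of the answer itself.

```python
import pandas as pd

# Sample data from the answer, plus one hypothetical row with no ID in it.
df = pd.DataFrame({
    "Messy_IDS": [
        "Looking for ID : 7010M000002N8c5T7A",
        "5634M000002N8c5T7A,7010M000002N8c5T7A",
        "https://website.com/12340000000f5F5",
        "no id here",
    ]
})

# Raw string so the backslash in \d reaches the regex engine unchanged.
# 4 digits, then 11 to 14 ASCII letters or digits (no underscore).
pattern = r"(\d{4}[0-9A-Za-z]{11,14})"

# expand=False returns a Series for a single capture group.
# Only the first match per row is kept; rows without a match become NaN.
df["new"] = df["Messy_IDS"].str.extract(pattern, expand=False)

# If every ID in a row is wanted, findall returns a list per row instead.
all_ids = df["Messy_IDS"].str.findall(r"\d{4}[0-9A-Za-z]{11,14}")

print(df)
print(all_ids)
```

Rows without a match get NaN from str.extract and an empty list from str.findall, so it may be worth checking df["new"].isna() before relying on the new column.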
