What Is a Regular Expression in Python?
Python regular expressions (Python RegEx) are patterns that let us match string values in a variety of ways, using the built-in re module.
A pattern is one or more characters that describe a set of possible matching strings. In regular expression matching, we use a character (or a sequence of characters) to represent the strings we want to find in the text.
In this Python RegEx tutorial, we will cover the important aspects of regular expressions in Python: regular expression characters, the match function, special sequence characters, the search function, the replace function, and option flags.
Let us get started then.
All About Regular Expression Characters in Python
| Symbol | Meaning |
| --- | --- |
| . (period) | Matches any character except the newline character |
| ^ (caret) | Matches the start of the string |
| $ (dollar sign) | Matches the end of the string |
| * (asterisk) | Matches zero or more repetitions of the preceding regular expression |
| ? (question mark) | Matches zero or one repetition of the preceding regular expression |
| {} (braces) | Used as either {m}, meaning to match exactly ‘m’ instances of the preceding regular expression, or as {m,n}, where n ≥ m, meaning to match between ‘m’ and ‘n’ instances of the preceding regular expression |
| \ (backslash) | Either escapes a special character, such as one of the other regular expression characters (e.g., \* matches a literal asterisk), or introduces one of the special regular expression sequences |
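To make the symbols in the table more concrete, here is a small sketch that tries each one on a short string. The sample strings and variable names are made up for illustration; re.search() and re.fullmatch() are standard functions of the re module.

```python
import re

# "." matches any single character except a newline
dot_match = re.search(r"c.t", "a cat sat")

# "^" anchors the pattern at the start, "$" at the end of the string
start_match = re.search(r"^cat", "cat sat")
end_match = re.search(r"sat$", "cat sat")

# "*" allows zero or more repetitions of the preceding item ("b" here)
star_match = re.fullmatch(r"ab*", "a")

# "?" makes the preceding item ("u" here) optional
optional_match = re.fullmatch(r"colou?r", "color")

# "{m,n}" matches between m and n repetitions of the preceding item
range_match = re.fullmatch(r"\d{2,4}", "123")
too_long = re.fullmatch(r"\d{2,4}", "12345")

# "\" escapes a special character so it is matched literally
escaped = re.search(r"\*", "2 * 3")

print(dot_match.group())           # cat
print(start_match is not None)     # True
print(end_match is not None)       # True
print(star_match is not None)      # True
print(optional_match is not None)  # True
print(range_match is not None)     # True
print(too_long)                    # None
print(escaped.span())              # (2, 3)
```

Each call returns a match object when the pattern fits and None when it does not, which is why the results are compared against None before printing.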
How to Use the Match Function of RegEx in Python?
The re.match() function checks whether the regular expression matches at the beginning of the string, with optional flags.
Syntax:
re.match(pattern, string, flags=0)
Here, ‘pattern’ is the regular expression to be matched, ‘string’ is the Python string that is checked for a match at its start, and ‘flags’ optionally modifies the matching behavior. The function returns a match object on success and None otherwise.
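Example of Python regex match:
import re
print(re.match("i", "intellipaat"))
Output:
<re.Match object; span=(0, 1), match='i'>
Because ‘i’ is the first character of the string, Python prints the match object that was created. This is the representation shown by Python 3.7 and later; older versions print a different object name. The span gives the start and end positions of the match, and match shows the matched text.
import re
print(re.match("b", "intellipaat"))
Output:
None
Here, the result is None because the string does not start with ‘b’.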
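In practice, the match object is usually inspected rather than printed. The following sketch, using the same ‘intellipaat’ string with illustrative patterns, reads the matched text and its position through the group() and span() methods, and shows that re.match() does not find text that appears later in the string.

```python
import re

# re.match() only succeeds if the pattern matches at position 0
m = re.match(r"intelli", "intellipaat")
if m:
    print(m.group())  # intelli
    print(m.span())   # (0, 7)

# "paat" occurs in the string, but not at the start, so the result is None
no_match = re.match(r"paat", "intellipaat")
print(no_match)       # None
```

Checking the result with an if statement before calling group() avoids an AttributeError when there is no match.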
Special Sequence Characters of RegEx in Python
The six most important sequence characters are:
- \d: Matches any decimal digit. For ASCII text, this is the same as writing [0-9], but it is needed so often that it has its own shortcut sequence.
- \D: Matches any character that is not a decimal digit. This is the inverse of \d and, for ASCII text, can be written as [^0-9].
- \s: Matches any whitespace character, such as a space, tab, newline, carriage return, form feed, or vertical tab. Basically, whitespace is what separates words in a given sentence.
- \S: Matches any non-whitespace character. This is simply the inverse of the \s sequence mentioned above.
- \w: Matches any word character. This is the set of all letters in both lowercase and uppercase, all digits, and the underscore.
- \W: Matches any non-word character. This is the inverse of the \w sequence mentioned above.
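The six sequences above can be tried out with re.findall(), which returns every non-overlapping match as a list, and re.split(). This is a minimal sketch; the sample text is invented for illustration.

```python
import re

text = "Order 66 shipped on 2024-05-17 to Mr_Smith!"

digits = re.findall(r"\d+", text)           # runs of decimal digits
non_digits = re.findall(r"\D+", "ab12cd")   # runs of non-digit characters
words = re.findall(r"\w+", text)            # runs of letters, digits, underscore
non_words = re.findall(r"\W", "a-b c!")     # single non-word characters
chunks = re.findall(r"\S+", "hi  there")    # runs of non-whitespace characters
pieces = re.split(r"\s+", "a  b\tc\nd")     # split on any run of whitespace

print(digits)      # ['66', '2024', '05', '17']
print(non_digits)  # ['ab', 'cd']
print(words)       # ['Order', '66', 'shipped', 'on', '2024', '05', '17', 'to', 'Mr_Smith']
print(non_words)   # ['-', ' ', '!']
print(chunks)      # ['hi', 'there']
print(pieces)      # ['a', 'b', 'c', 'd']
```

Note that the underscore in ‘Mr_Smith’ counts as a word character, so \w+ keeps it in one piece.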
Search Function of RegEx in Python
It searches for the first occurrence of a regular expression pattern anywhere within a string, with optional flags.
Syntax:
re.search(pattern, string, flags=0)
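Example of Python regex search:
import re
m = re.search('\bopen\b', 'please open the door')
print(m)
Output:
None
This output occurs because, in an ordinary string literal, Python treats ‘\b’ as the backspace character before the pattern ever reaches the re module. The backslash (‘\’) introduces escape sequences in both Python strings and regular expressions, so patterns are best written as raw strings (with an r prefix), where ‘\b’ keeps its regular expression meaning of a word boundary.
import re
m = re.search(r'\bopen\b', 'please open the door')
print(m)
Output:
<re.Match object; span=(7, 11), match='open'>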
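The difference between an ordinary string and a raw string, and between re.search() and re.match(), can be seen side by side in the sketch below. The variable names are illustrative only.

```python
import re

text = "please open the door"

plain = "\bopen\b"   # ordinary string: each "\b" becomes one backspace character
raw = r"\bopen\b"    # raw string: "\b" reaches re as a word-boundary sequence

print(len(plain), len(raw))         # 6 8
print(re.search(plain, text))       # None (the text contains no backspace)
print(re.search(raw, text).span())  # (7, 11)

# re.match() only looks at the start of the string, so it finds nothing here
print(re.match(raw, text))          # None

# "\b" requires a word boundary, so "open" inside "reopened" does not match
print(re.search(raw, "reopened"))   # None
```

The length check shows that the two literals are different strings even though they look alike in the source code.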
RegEx Replace Function in Python
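The simplest form of re.sub() uses only its first three arguments: the pattern, the replacement, and the string to search. It returns a new string in which every match of the pattern is replaced.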
import re
def substitutor():
    sen1 = "It is sunny outside."
    print(re.sub(r"sunny", "raining", sen1))
    sen2 = "Intellipaat Python Course"
    print(re.sub(r"Course", "Tutorial", sen2))
substitutor()
The output will be
It is raining outside.
Intellipaat Python Tutorial
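Beyond the first three arguments, re.sub() also accepts the optional count and flags arguments, and the related re.subn() function additionally reports how many replacements were made. The following is a small sketch with an invented sample sentence.

```python
import re

text = "sunny today, sunny tomorrow, Sunny forever"

# count=1 replaces only the first match
first_only = re.sub(r"sunny", "rainy", text, count=1)

# flags=re.IGNORECASE also replaces the capitalized "Sunny"
ignore_case = re.sub(r"sunny", "rainy", text, flags=re.IGNORECASE)

# re.subn() returns the new string together with the number of replacements
result, n = re.subn(r"sunny", "rainy", text)

print(first_only)   # rainy today, sunny tomorrow, Sunny forever
print(ignore_case)  # rainy today, rainy tomorrow, rainy forever
print(result, n)    # rainy today, rainy tomorrow, Sunny forever 2
```

Passing count and flags by keyword keeps the call readable and avoids mixing up their positions.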
Python Regular Expression Modifiers (Option Flags)
The following table contains the list of all Python Regular Expression or Python RegEx modifiers, along with their descriptions.
| Modifier | Description |
| --- | --- |
| re.I | Performs case-insensitive matching |
| re.L | Interprets words according to the current locale. This interpretation affects the alphabetic group (\w and \W), as well as the word boundary behavior (\b and \B). In Python 3, it can be used only with bytes patterns |
| re.M | Makes $ match the end of a line (not just the end of the string) and makes ^ match the start of any line (not just the start of the string) |
| re.S | Makes a period (dot) match any character, including a newline |
| re.U | Interprets letters according to the Unicode character set. This flag affects the behavior of \w, \W, \b, and \B. In Python 3, this is already the default for string patterns |
| re.X | Allows more readable (‘verbose’) regular expression syntax: most whitespace in the pattern is ignored, and comments can be added with # |
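The effect of the most commonly used flags from the table can be checked with a short sketch like the one below. The two-line sample text and the date pattern are made up for illustration.

```python
import re

text = "First line\nsecond LINE"

# re.I: case-insensitive matching
any_case = re.findall(r"line", text, flags=re.I)

# re.M: "^" matches at the start of every line, not just the start of the string
first_words_default = re.findall(r"^\w+", text)
first_words_multiline = re.findall(r"^\w+", text, flags=re.M)

# re.S: "." also matches the newline character
dot_default = re.search(r"line.second", text)
dot_all = re.search(r"line.second", text, flags=re.S)

# re.X: whitespace and "#" comments inside the pattern are ignored
date_pattern = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.X)
date_parts = date_pattern.match("2024-05").groups()

print(any_case)               # ['line', 'LINE']
print(first_words_default)    # ['First']
print(first_words_multiline)  # ['First', 'second']
print(dot_default)            # None
print(dot_all is not None)    # True
print(date_parts)             # ('2024', '05')
```

Several flags can be combined with the | operator, for example flags=re.I | re.M.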
This brings us to the end of this module on regular expressions in Python. Here, we learned what Python 3 RegEx is and covered regular expression characters, the match function, special sequence characters, the search function, the replace function, and Python RegEx modifiers.