Regular expressions are patterns that let you match string values in a variety of ways. In Python, they are provided by the re module.
A pattern is one or more characters that describe a set of possible matches. In regular expression matching, you use a character (or a sequence of characters) to represent the strings you want to find in the text.
Table – Regular Expression Characters In Python

Symbol             Meaning
. (period)         Matches any character except the newline character.
^ (caret)          Matches the start of the string.
$ (dollar sign)    Matches the end of the string (or just before a trailing newline).
* (asterisk)       Matches zero or more repetitions of the preceding regular expression.
? (question mark)  Matches zero or one repetition of the preceding regular expression.
{}                 Used as either {m}, which matches exactly m instances of the preceding regular expression, or {m,n} with m <= n, which matches between m and n instances of the preceding regular expression.
\ (backslash)      Either escapes a special character, such as one of the other regular expression characters (e.g., \* matches a literal asterisk), or introduces one of the special regular expression sequences.
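
The symbols in the table can be tried out with a short script. This is a minimal sketch in Python 3 syntax; the sample strings and variable names are made up for illustration and are not part of the original tutorial.

```python
import re

# '.' matches any character except a newline, so "c\nt" is not found.
dot = re.findall(r"c.t", "cat cot c\nt")

# '^' anchors the pattern at the start, '$' at the end of the string.
starts = bool(re.search(r"^cat", "cat nap"))
ends = bool(re.search(r"nap$", "cat nap"))

# '*' allows zero or more b's after the a.
star = re.findall(r"ab*", "a ab abbb")

# '?' makes the preceding 'u' optional.
optional = re.findall(r"colou?r", "color colour")

# '{2,3}' matches between two and three a's.
counted = re.findall(r"a{2,3}", "a aa aaa aaaa")

# A backslash turns the asterisk into a literal character.
literal = re.findall(r"\*", "2 * 3 * 4")

print(dot, starts, ends, star, optional, counted, literal)
```

Note that "aaaa" contributes only one match of three characters: the quantifier takes as many repetitions as it can, and the single remaining "a" is too short to match.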

The match Function
It attempts to match a regular expression pattern at the beginning of a string, with optional flags.

re.match(pattern, string, flags=0)

Here, pattern is the regular expression to be matched, and string is the text that is checked for the pattern at its start. The optional flags argument modifies how the matching is performed.

import re
print(re.match("i", "intellipaat"))

<_sre.SRE_Match object at 0x7f9cac95d78>
Because "intellipaat" starts with "i", re.match returns a match object, and Python prints its representation. The output above is from Python 2, where the hex number is the memory address of the object; Python 3.7 and later print <re.Match object; span=(0, 1), match='i'> instead.

import re
print(re.match("b", "intellipaat"))

This call prints None, because "intellipaat" does not start with "b".
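
In practice the result of re.match is usually tested rather than printed. The following sketch (Python 3 syntax) wraps that check in a small helper; the function name starts_with is hypothetical and chosen only for this illustration.

```python
import re

def starts_with(pattern, text):
    """Return the text matched at the start of `text`, or None if there is no match."""
    m = re.match(pattern, text)
    # A match object is truthy; None signals that the start did not match.
    return m.group(0) if m else None

hit = starts_with("i", "intellipaat")    # the string starts with "i"
miss = starts_with("b", "intellipaat")   # it does not start with "b"

# The optional flags argument changes matching behaviour, here ignoring case.
ci = re.match("INTEL", "intellipaat", flags=re.I)

print(hit, miss, ci.group(0), ci.span())
```

The match object exposes the matched text through group(0) and its position through span().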

Special Sequence Characters
The six most important sequence characters are:

  • \d: Matches any decimal digit. This is the same as writing [0-9], but it is used so often that it has its own shortcut sequence.
  • \D: Matches any character that is not a decimal digit. This is the set of all characters that are not in [0-9] and can be written as [^0-9].
  • \s: Matches any whitespace character, such as a space, tab, newline, carriage return, form feed, or vertical tab. Basically, whitespace is what separates words in a given sentence.
  • \S: Matches any non-whitespace character. This is simply the inverse of the \s sequence above.
  • \w: Matches any alphanumeric character or the underscore. This is the set of all letters and digits, in both lower- and uppercase, plus _.
  • \W: Matches any character that \w does not match. This is the inverse of the \w sequence above.
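
The six sequences can be compared side by side on one string. This is a small sketch in Python 3 syntax; the sample text is an arbitrary assumption chosen to contain digits, letters, an underscore, and whitespace.

```python
import re

sample = "id_7 = 42\n"

digits = re.findall(r"\d+", sample)        # runs of decimal digits
non_digits = re.findall(r"\D+", sample)    # runs of everything else
spaces = re.findall(r"\s", sample)         # each whitespace character
non_spaces = re.findall(r"\S+", sample)    # runs of non-whitespace
words = re.findall(r"\w+", sample)         # letters, digits, and underscore
non_words = re.findall(r"\W+", sample)     # runs of anything \w rejects

print(digits, non_digits, spaces, non_spaces, words, non_words)
```

The underscore in "id_7" is part of the \w match, which is why "id_7" comes back as a single word.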

Search Function
It searches for the first occurrence of a regular expression pattern anywhere within a string, with optional flags.
Syntax: re.search(pattern, string, flags=0)


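import re
m = re.search('\bopen\b', 'please open the door')
print(m)

This prints None because, in an ordinary Python string literal, '\b' is interpreted as the backspace character rather than as the regular expression word boundary. The backslash is a metacharacter, so it has to reach the re module intact: either escape it as '\\b' or use a raw string such as r'\bopen\b'.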

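>>> import re
>>> m = re.search('\\bopen\\b', "please open the door")
>>> print(m)

<_sre.SRE_Match object at 0x00A3F058>
With the backslashes escaped, the pattern matches the word "open". On Python 3.7 and later, the output is <re.Match object; span=(7, 11), match='open'>.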
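
The difference between the plain, escaped, and raw forms of the pattern, and between re.search and re.match, can be checked in one place. This is a sketch in Python 3 syntax; the variable names are illustrative assumptions.

```python
import re

text = "please open the door"

# In a normal string literal, '\b' is a backspace character, so nothing matches.
plain = re.search('\bopen\b', text)

# Escaping the backslash, or using a raw string, passes \b (word boundary) to re.
escaped = re.search('\\bopen\\b', text)
raw = re.search(r'\bopen\b', text)

# re.match only tries at the start of the string, so it finds nothing here.
at_start = re.match(r'\bopen\b', text)

# The word boundary keeps "open" from matching inside a longer word.
inside_word = re.search(r'\bopen\b', "reopened")

print(plain, escaped.span(), raw.span(), at_start, inside_word)
```

Raw strings are generally the more readable choice for patterns, since every backslash is written only once.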

Regular Expression Modifiers (Option Flags)

re.I   Performs case-insensitive matching.
re.L   Interprets words according to the current locale. This interpretation affects the alphabetic group (\w and \W), as well as word boundary behavior (\b and \B).
re.M   Makes $ match the end of a line (not just the end of the string) and makes ^ match the start of any line (not just the start of the string).
re.S   Makes a period (dot) match any character, including a newline.
re.U   Interprets letters according to the Unicode character set. This flag affects the behavior of \w, \W, \b, \B. It is the default for str patterns in Python 3.
re.X   Allows more readable regular expression syntax: unescaped whitespace in the pattern is ignored and # starts a comment.
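
The effect of the most commonly used flags can be seen on a two-line string. This is a minimal sketch in Python 3 syntax; the sample text and variable names are assumptions made for the example.

```python
import re

text = "First line\nsecond LINE"

# re.I: "line" matches regardless of case.
case_insensitive = re.findall(r"line", text, flags=re.I)

# re.M: '^' matches at the start of every line, not just the string.
line_starts = re.findall(r"^\w+", text, flags=re.M)

# Without re.S the dot stops at the newline; with it, the match spans both lines.
without_dotall = re.search(r"First.*LINE", text)
with_dotall = re.search(r"First.*LINE", text, flags=re.S)

# re.X: whitespace and comments inside the pattern are ignored.
two_words = re.compile(r"""
    (\w+)   # first word
    \s+     # separating whitespace
    (\w+)   # second word
""", flags=re.X)
first_pair = two_words.match(text).groups()

print(case_insensitive, line_starts, without_dotall, with_dotall.group(0) == text, first_pair)
```

Flags can also be combined with the | operator, for example flags=re.I | re.M.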
