Difference between re.search() and re.match() in Python

re.match() and re.search() are functions in Python’s built-in re module, which provides regular expression support. Regular expressions are patterns used to search, match, and manipulate strings. For pattern-based lookups, the re module is usually more concise and flexible than hand-written string handling. In this article, you will explore the re.match() and re.search() functions with the help of code examples.

Understanding re.match() in Python

re.match() checks whether a pattern, usually written as a regular expression, matches at the start of the given input string. It returns a match object only if the match begins at the very start of the string. If the matching text lies somewhere in the middle, it returns None.

Syntax:

result = re.match(r'regular expression', 'input string')

Example:
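A minimal sketch of the three cases discussed in the explanation that follows. The sample sentence and variable names are assumptions chosen for illustration, not the article's original listing:

```python
import re

# Hypothetical sample sentence (an assumption for this sketch).
text = "Intellipaat is located in Bangalore"

# 1. A literal pattern that sits at index 0: a match object is returned.
first = re.match(r"Intellipaat", text)
print(first.group())  # Intellipaat

# 2. "is" occurs later in the string, so re.match() returns None.
second = re.match(r"is", text)
print(second)  # None

# 3. "I" followed by one or more word characters, tried at index 0.
third = re.match(r"I\w+", text)
print(third.group())  # Intellipaat
```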


Explanation: In this example, we tried finding matches for three patterns. Let us discuss each match.

  • In the first match, since ‘Intellipaat’ was at the beginning of the string, the match was found.
  • In the second match, ‘is’ was not at the beginning of the string. Hence, the function returned None.
  • In the third match, we used a regular expression (I\w+) to match any word starting with ‘I’. This successfully matches “Intellipaat” at the start.

If you are not familiar with regex patterns like ‘\w+’, check out the Regular Expressions in Python blog to learn more.

Note: From the example above, it is clear that ‘beginning of the string’ means that the pattern must start at index 0, not just at the beginning of a word or a line. Whitespace, punctuation, or any character before the pattern will cause re.match() to return None.
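The note above can be checked with a short sketch. The input strings here are assumed purely for illustration:

```python
import re

# Hypothetical inputs: the same text with and without a leading space.
at_start = re.match(r"Intellipaat", "Intellipaat is located in Bangalore")
after_space = re.match(r"Intellipaat", " Intellipaat is located in Bangalore")

print(at_start is not None)  # True: the pattern starts at index 0
print(after_space)           # None: index 0 holds a space

# re.MULTILINE does not change this: re.match() still tries index 0 only.
second_line = re.match(r"^data", "big\ndata", re.MULTILINE)
print(second_line)  # None
```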

Since re.match() only looks for a match at the very beginning of the string, it can be limiting in many real-world scenarios. To address this limitation, Python provides another function called re.search().

Understanding re.search() in Python

re.search() scans the whole string and returns the first location where the regular expression pattern matches. It can find a match anywhere in the string, not just at the beginning.

Syntax:

result = re.search(r'regular expression', 'input string')

Example:
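A minimal sketch of the three patterns discussed in the explanation that follows. The sample sentence is an assumption for illustration, not the article's original listing:

```python
import re

# Hypothetical sample sentence (an assumption for this sketch).
text = "Intellipaat is located in Bangalore"

# 1. Match at the start: same result as re.match().
first = re.search(r"Intellipaat", text)
print(first.group(), first.span())  # Intellipaat (0, 11)

# 2. "is" appears in the middle, and re.search() still finds it.
second = re.search(r"is", text)
print(second.group(), second.span())  # is (12, 14)

# 3. "B" followed by word characters, found near the end of the string.
third = re.search(r"B\w+", text)
print(third.group(), third.span())  # Bangalore (26, 35)
```

The span() call reports where the match starts and ends, which makes it easy to see that re.search() is not tied to index 0.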


Explanation:

  • In the first pattern, re.search() finds “Intellipaat” just like re.match() because the word is at the beginning of the string. When the match is at the start, both re.search() and re.match() give the same result.
  • In the second pattern, ‘is’ was not at the beginning of the string, but re.search() still found it, unlike re.match().
  • In the third pattern, re.search() matched “B\w+” against “Bangalore”, which appears towards the end of the string.

| Feature | re.match() | re.search() |
|---|---|---|
| Match Location | Only checks for a match at the beginning of the string | Scans the entire string for the first match |
| Return Type | Returns a match object if found at index 0, else None | Returns a match object if found anywhere, else None |
| Use Case | Useful when you expect the match at the very start | Useful when the match can be anywhere in the string |
| Performance | Can fail fast on very large strings, because it only tries index 0 | Can be slower when there is no early match, since it may scan the entire string |
| Matching on Multiline Strings | Only matches at the start of the whole string, even with re.MULTILINE | Can match at the start of any line if the pattern and flags allow |

Limitations of re.search() and re.match()

When finding matching patterns in a string, there may be cases where multiple substrings match the given pattern. In many scenarios, you may want to capture all occurrences rather than just the first one. Examples include counting email addresses, phone numbers, or repeated keywords, each of which follows an identifiable pattern.

However, both re.match() and re.search() have their limitations:

  • re.match(): Only checks the beginning of the string. If the pattern is not found there, it stops immediately. So, finding multiple occurrences is not practical with this function.
  • re.search(): This is more flexible than re.match() since it scans the whole string, but it still returns only the first match. To find all the occurrences, you would have to write a loop that searches again in the remainder of the string after every match, which is repetitive and error-prone.

In short, the common limitation of both functions is that they return at most one match. If the input string contains more than one occurrence, re.match() and re.search() report only the first and ignore the rest. To overcome this limitation, Python offers another function called re.findall().

Overcoming the Limitations of re.match() and re.search() in Python

re.findall() returns a list of all non-overlapping matches of a pattern in a string. If no matches are found, it returns an empty list.

Syntax:

result = re.findall(r'regular expression', 'input string')

You do not need to write loops or split the string yourself. It collects all the matches in a single call.

Example:
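A minimal sketch using the three patterns named in the explanation that follows, with their backslashes written out. The sample text and the example.com and example.org addresses are assumptions for illustration:

```python
import re

# Hypothetical sample text (an assumption for this sketch).
text = ("Intellipaat offers Python courses. Contact Intellipaat at "
        "support@example.com or sales@example.org for details.")

# Every word that starts with a capital letter.
capitalized = re.findall(r"\b[A-Z][a-zA-Z]*\b", text)
print(capitalized)  # ['Intellipaat', 'Python', 'Contact', 'Intellipaat']

# Every occurrence of a literal word.
names = re.findall(r"Intellipaat", text)
print(names)  # ['Intellipaat', 'Intellipaat']

# Every substring shaped like an email address (a simple pattern, not a full validator).
emails = re.findall(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+", text)
print(emails)  # ['support@example.com', 'sales@example.org']
```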


Explanation: In this example, re.findall() returned every substring that matched each of the regex patterns “\b[A-Z][a-zA-Z]*\b”, “Intellipaat”, and “[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+”.

The re functions accept optional flags that modify how a pattern is matched. They are especially useful when you want to perform case-insensitive searches or work with multiline strings. Using flags keeps the pattern itself simpler and makes your matching logic more flexible.

Some commonly used flags are mentioned below.

| Flag | Description |
|---|---|
| re.IGNORECASE or re.I | Makes the pattern matching case-insensitive |
| re.MULTILINE or re.M | Makes ^ and $ match at the start and end of each line, not just of the whole string |
| re.DOTALL or re.S | By default, dot (.) matches any character except newline. With re.DOTALL, it also matches newlines. |

Let us look at an example that applies each of these flags with re.match() and re.search().

Example:
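A minimal sketch of the three flag cases discussed in the explanation that follows. All sample strings are assumptions for illustration, not the article's original listing:

```python
import re

# 1. re.IGNORECASE: the text contains "GREAT", the pattern is "great".
review = "Intellipaat is GREAT for learning Python"
case_sensitive = re.search(r"great", review)
case_insensitive = re.search(r"great", review, re.IGNORECASE)
print(case_sensitive)            # None
print(case_insensitive.group())  # GREAT

# 2. re.MULTILINE: "data" starts the second line, not the whole string.
notes = "Learn Python\ndata science is fun"
single_line = re.search(r"^data", notes)
multi_line = re.search(r"^data", notes, re.MULTILINE)
print(single_line)         # None
print(multi_line.group())  # data

# 3. re.DOTALL: let "." cross the newline between the two words.
message = "Intellipaat offers many courses.\nThank you for visiting."
without_dotall = re.match(r"Intellipaat.*Thank", message)
with_dotall = re.match(r"Intellipaat.*Thank", message, re.DOTALL)
print(without_dotall)             # None
print(repr(with_dotall.group()))  # the text from "Intellipaat" through "Thank"
```

The third case uses re.match() because the pattern begins at index 0, while the first two need re.search() because the match lies later in the string.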


Explanation:

  • In the first example, we wanted to find the word “great”, but because matching is case-sensitive by default, re.search() returned None. After applying the IGNORECASE flag, it gave the desired result.
  • In the second example, searching for ‘^data’ fails without the MULTILINE flag, since “data” is at the beginning of the second line rather than of the whole string. With the MULTILINE flag the match succeeds, because ^ is then allowed to match at the start of every line.
  • In the third example, we are trying to match all the text between the words “Intellipaat” and “Thank”, even though they are on different lines. By default, the dot (.) does not match newline characters, so the match fails. With the DOTALL flag, the dot matches newlines too, and the pattern matches successfully across multiple lines.


Performance and Best Practices

  • re.match() only tries to match at the start of the string, so it can return sooner than re.search() when there is no match. Use it only when the pattern must appear at the start, since it never looks further into the string.
  • In Python string literals, backslashes (\) introduce escape sequences, like \n for newline and \t for tab. Backslashes are also very common in regular expressions, for example \d for digits and \w for word characters. If you don’t use a raw string, Python interprets the backslash before the pattern reaches the regex engine, which can cause unexpected behavior or even errors.
  • When you run the same regex pattern many times, for example inside a loop, compile it once with re.compile() and reuse the compiled object. The re module caches recently used patterns, so the gain is usually modest, but a compiled pattern skips the per-call cache lookup and keeps the pattern definition in one place.

Example:
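A minimal sketch of compiling a date pattern once and reusing it in a loop, as described in the explanation that follows. The log entries are hypothetical, made up for illustration:

```python
import re

# Hypothetical log entries: the first two carry a date, the third does not.
logs = [
    "2024-01-15 Server started",
    "2024-01-16 Disk usage at 91%",
    "Connection lost, no timestamp recorded",
]

# Compile the YYYY-MM-DD pattern once, outside the loop.
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

matched = []
for entry in logs:
    # Reuse the compiled pattern object on every iteration.
    if date_pattern.search(entry):
        matched.append(entry)
        print(entry)
```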


Explanation: The code checks each log entry for a date in the YYYY-MM-DD format and prints the lines where a match is found, which are the first two lines. The third line is skipped as it has no date. Compiling r'\d{4}-\d{2}-\d{2}' once with re.compile() means the loop reuses a single pattern object instead of passing the pattern string to the re module on every iteration.

  • Avoid overly complex regex patterns, as they are harder for developers to read and debug. The longer a regex pattern becomes, the more prone it is to errors.

Common Mistakes While Using re.match() and re.search() in Python

Here are some common mistakes developers make while writing code with the re module.

  • Not using raw strings (r"") for regex patterns: Without a raw string, Python processes backslashes as string escape sequences before the pattern reaches the regex engine, so the engine may receive a different pattern from the one you wrote. A raw string preserves the backslashes.
  • Confusing re.match() and re.search(): Beginners often confuse the two functions. Remember that re.match() checks only the beginning of the string, whereas re.search() scans the whole string.
  • Forgetting to compile regular expressions in loops or repetitive tasks: When you call re.match() or re.search() many times with the same pattern string, Python has to look the pattern up in the re module’s internal cache on every call. Compiling once with re.compile() and reusing the compiled object avoids that overhead and keeps the pattern in one place.
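The raw-string mistake is easy to reproduce. The following sketch, with a made-up input string, shows how the same pattern behaves with and without the r prefix:

```python
import re

# Hypothetical input used only for this sketch.
text = "cat concat cat"

# Without r"", Python turns \b into a backspace character before re sees it.
non_raw = re.findall("\bcat\b", text)

# With r"", the regex engine receives \b and treats it as a word boundary.
raw = re.findall(r"\bcat\b", text)

print(non_raw)  # []
print(raw)      # ['cat', 'cat']
print(len("\b"), len(r"\b"))  # 1 2
```

The non-raw version fails silently: it raises no error, it just searches for backspace characters that are not in the text.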

Real-World Use Cases in Python

Pattern matching has many uses in real-world situations. Let us explore them one by one.

Case 1: Validating Form Inputs

We can use a regex to validate common, important fields such as phone numbers and email addresses, where mistakes are costly. Since the country code appears at the very start of a phone number, we can validate it with the re.match() function.

Example:
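A minimal sketch of start-anchored phone validation with re.match(). The accepted format ("+91", an optional space or hyphen, then ten digits), the helper name, and the sample inputs are all assumptions for illustration:

```python
import re

# Hypothetical rule: "+91", an optional space or hyphen, then exactly 10 digits.
PHONE_PATTERN = re.compile(r"\+91[ -]?\d{10}")

def starts_with_valid_phone(value):
    """Return True if value begins with a number in the assumed format."""
    return PHONE_PATTERN.match(value) is not None

samples = ["+91 9876543210", "+91-9876543210", "9876543210", "Call +91 9876543210"]
results = {sample: starts_with_valid_phone(sample) for sample in samples}
for sample, ok in results.items():
    print(f"{sample!r}: {ok}")

# re.match() anchors only the start, so trailing characters still pass.
trailing = "+91 9876543210 ext. 5"
print(starts_with_valid_phone(trailing))              # True
print(PHONE_PATTERN.fullmatch(trailing) is not None)  # False
```

When the whole field must conform to the pattern, re.fullmatch() is often the safer choice, since it also rejects extra characters after the number.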


Explanation: In this example, re.match() is used to check for the pattern at the start of the string, so an input that does not begin with the expected format is rejected without scanning the rest of it.

Case 2: Searching Logs or Text Files

When you want to analyze logs to find error messages or timestamps, re.search() is the best function to use, as the match might appear anywhere in the line. To make the search tolerant of inconsistent capitalization, we can look for a keyword with the IGNORECASE flag.

Example:
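A minimal sketch of a case-insensitive keyword search over log lines, using the ‘disk space’ keyword mentioned in the explanation that follows. The log lines themselves are hypothetical:

```python
import re

# Hypothetical log lines (assumed for illustration).
log_lines = [
    "2024-01-15 10:02:11 INFO Backup completed",
    "2024-01-15 10:05:42 ERROR Low Disk Space on /dev/sda1",
    "2024-01-15 10:07:03 WARNING disk space below 10%",
]

keyword = "disk space"

# re.escape() keeps the keyword literal; IGNORECASE tolerates any capitalization.
pattern = re.compile(re.escape(keyword), re.IGNORECASE)

# Keep every line in which the keyword appears anywhere.
hits = [line for line in log_lines if pattern.search(line)]
for line in hits:
    print(line)
```

Escaping the keyword matters if it ever contains characters such as "." or "+", which would otherwise be read as regex syntax.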


Explanation: We used the keyword ‘disk space’ to find matching lines in the log file with the re.search() function.


Conclusion

Both the re.match() and re.search() functions are useful for pattern matching in Python. While re.match() checks for a match only at the beginning of a string, re.search() looks for a match anywhere in the string. We also explored re.findall(), which helps find all matches in a string. In addition, we saw how different flags can make these functions more powerful and flexible. These tools are especially helpful when working with text data or building input validation features.


Difference Between re.match() and re.search() – FAQs

Q1. When should I use re.match() instead of re.search()?

Use re.match() when you expect the pattern to appear right at the beginning of the string. It can also be faster, since it does not scan the rest of the string.

Q2. What is the difference between match and search in regex?

The match function looks for matches at the start of the string, whereas the search function scans the whole string for a match.

Q3. What does re.compile() do, and why should I use it?

re.compile() compiles a regex pattern into a pattern object in advance, so the same object can be reused many times, particularly within loops, without passing the pattern string to the re module on each call.

Q4. Why is using raw strings (e.g., r"\d+") important in regex?

Raw strings make sure backslashes reach the regex engine unchanged. Without one, Python may convert a sequence such as \b into a backspace character instead of passing along the word-boundary token the regex needs.

Q5. Is re.match() faster than re.search()?

It can be, mainly when there is no match at the start of the string: re.match() gives up after trying index 0, whereas re.search() keeps scanning the rest of the string.

About the Author

Senior Consultant Analytics & Data Science, Eli Lilly and Company

Sahil Mattoo, a Senior Software Engineer at Eli Lilly and Company, is an accomplished professional with 14 years of experience in languages such as Java, Python, and JavaScript. Sahil has a strong foundation in system architecture, database management, and API integration. 
