Python RegEx - Regular Expressions

A RegEx, or regular expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check whether a string contains a specified search pattern.

RegEx Module

Python has a built-in module called re that can be used to work with regular expressions.

Import the re module:

import re

RegEx in Python

Once you have imported the re module, you can start using regular expressions:

Example :- Check if the string starts with "I" and ends with "am":

import re
txt = "I am what i am"
# ^ anchors the pattern to the start of the string, $ to the end
x = re.search("^I.*am$", txt)
if x:
  print("YES! We have a match!")
else:
  print("No match")

Output :-

YES! We have a match!

RegEx Functions

The re module provides a set of functions for searching a string for a match:

Function   Description
findall()  Returns a list containing all matches
search()   Returns a Match object if there is a match anywhere in the string
split()    Returns a list where the string has been split at each match
sub()      Replaces one or many matches with a string

Metacharacters

Metacharacters are characters with a special meaning:

Character  Description                                                                 Example
[]         A set of characters                                                         "[a-m]"
\          Signals a special sequence (can also be used to escape special characters)  "\d"
.          Any character (except newline character)                                    "he..o"
^          Starts with                                                                 "^hello"
$          Ends with                                                                   "world$"
*          Zero or more occurrences                                                    "aix*"
+          One or more occurrences                                                     "aix+"
{}         Exactly the specified number of occurrences                                 "al{2}"
|          Either or                                                                   "falls|stays"
()         Capture and group
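
The metacharacters in the table above can be tried out on a short string. The following is a minimal sketch; the sample sentence and the variable names are assumptions chosen for illustration, not part of the original examples.

```python
import re

# Sample sentence (an assumption for illustration)
txt = "The rain in Spain falls mainly in the plain"

dot_matches = re.findall(r"r..n", txt)            # "." stands for any one character
starts_with = bool(re.search(r"^The", txt))       # "^" anchors to the start of the string
ends_with = bool(re.search(r"plain$", txt))       # "$" anchors to the end of the string
star_matches = re.findall(r"al*", txt)            # "a" followed by zero or more "l"
plus_matches = re.findall(r"al+", txt)            # "a" followed by one or more "l"
exact_matches = re.findall(r"al{2}", txt)         # "a" followed by exactly two "l"
either_matches = re.findall(r"falls|stays", txt)  # either "falls" or "stays"
groups = re.search(r"(\w+) in (\w+)", txt).groups()  # "()" captures the two words

print(dot_matches)             # ['rain']
print(starts_with, ends_with)  # True True
print(star_matches)            # ['a', 'a', 'all', 'a', 'a']
print(plus_matches)            # ['all']
print(exact_matches)           # ['all']
print(either_matches)          # ['falls']
print(groups)                  # ('rain', 'Spain')
```

Comparing star_matches with plus_matches shows the difference between * and +: the first also accepts an "a" with no "l" after it, while the second needs at least one.
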

Special Sequences

A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:

Character  Description                                                                                  Example
\A         Matches if the specified characters are at the start of the string                           "\AThe"
\b         Matches where the specified characters are at the start or end of a word                     r"\bain"
\B         Matches where the specified characters are present, but NOT at the start (or end) of a word  r"\Bain" r"ain\B"
\d         Matches any digit (0-9)                                                                      "\d"
\D         Matches any character that is NOT a digit                                                    "\D"
\s         Matches any white-space character                                                            "\s"
\S         Matches any character that is NOT white space                                                "\S"
\w         Matches any word character (letters a to z and A to Z, digits 0 to 9, and the underscore)    "\w"
\W         Matches any character that is NOT a word character                                           "\W"
\Z         Matches if the specified characters are at the end of the string                             "Spain\Z"

Note: The "r" in front of a pattern such as r"\bain" marks it as a raw string, so Python passes \b to the regular expression engine instead of treating it as a backspace character.
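
To see a few of these special sequences at work, here is a small sketch. The sample sentence and variable names are assumptions for illustration. It also shows why the raw-string prefix matters for \b.

```python
import re

# Sample sentence (an assumption for illustration)
txt = "The rain in Spain costs 25 euros"

at_start = bool(re.search(r"\AThe", txt))   # "The" at the very start of the string
s_words = re.findall(r"\bS\w+", txt)        # words that begin with "S"
word_end = re.findall(r"ain\b", txt)        # "ain" at the end of a word
not_start = re.findall(r"\Bain", txt)       # "ain" that is NOT at the start of a word
digits = re.findall(r"\d", txt)             # every single digit
space_count = len(re.findall(r"\s", txt))   # number of white-space characters
at_end = bool(re.search(r"euros\Z", txt))   # "euros" at the very end of the string

print(at_start, at_end)  # True True
print(s_words)           # ['Spain']
print(word_end)          # ['ain', 'ain']
print(not_start)         # ['ain', 'ain']
print(digits)            # ['2', '5']
print(space_count)       # 6

# Without the r prefix, Python turns "\b" into a single backspace character
print("\b" == "\x08")    # True
print(len(r"\b"))        # 2
```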

Sets

A set is a group of characters inside a pair of square brackets [] that has a special meaning:

Set         Description
[arn]       Returns a match where one of the specified characters (a, r, or n) is present
[a-n]       Returns a match for any lower-case character alphabetically between a and n
[^arn]      Returns a match for any character EXCEPT a, r, and n
[0123]      Returns a match where any of the specified digits (0, 1, 2, or 3) is present
[0-9]       Returns a match for any digit between 0 and 9
[0-5][0-9]  Returns a match for any two-digit number from 00 to 59
[a-zA-Z]    Returns a match for any letter between a and z, lower case OR upper case
[+]         In sets, +, *, ., |, (), $, and {} have no special meaning, so [+] means: return a match for any + character in the string
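
The sets listed above can be checked with findall(). This is a minimal sketch; the sample strings and variable names are assumptions for illustration.

```python
import re

# Sample string (an assumption for illustration)
txt = "Room 42 opens at 08:45 + 5 min"

one_of = re.findall(r"[arn]", "rain")           # only a, r, or n
letters = re.findall(r"[a-zA-Z]", "A1b2")       # letters of either case
two_digit = re.findall(r"[0-5][0-9]", txt)      # two-digit numbers from 00 to 59
literal_plus = re.findall(r"[+]", txt)          # "+" has no special meaning in a set
everything_else = re.findall(r"[^a-z0-9 ]", txt)  # NOT lower case, digit, or space

print(one_of)           # ['r', 'a', 'n']
print(letters)          # ['A', 'b']
print(two_digit)        # ['42', '08', '45']
print(literal_plus)     # ['+']
print(everything_else)  # ['R', ':', '+']
```

Note that the lone "5" is not in two_digit, because [0-5][0-9] needs two digits next to each other.
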

The findall() Function

The findall() function returns a list containing all matches.

Example 1 :- Print a list of every occurrence of "am":

import re
txt = "I am what i am"
x = re.findall("am", txt)
print(x)

Output :-

['am', 'am']

The list includes the matches in the order in which they were found.

If no matches are found, an empty list is returned:

Example 2 :- Return an empty list when no match is found:

import re
txt = "I am what i am"
# Check if "india" is in the string:
x = re.findall("india", txt)
# An empty list is falsy, so the else branch runs
if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

Output :-

No match

The search() Function

The search() function searches the string for a match and returns a Match object if there is one.

If there is more than one match, only the first occurrence is returned:

Example 1 :- Search for the first white-space character in the string:

import re
txt = "I am what i am"
x = re.search(r"\s", txt)
print("The first white-space character is located in position:", x.start())

Output :-

The first white-space character is located in position: 1

If no matches are found, the value None is returned:

Example 2 :- Make a search that returns no match:

import re
txt = "I am what i am"
x = re.search("am i", txt)
print(x)

Output :-

None

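Because search() returns None when nothing matches, calling a method such as .start() on the result without checking it first raises an AttributeError. One way to guard against that is sketched below; the helper name first_match_start is hypothetical and only used for illustration.

```python
import re

def first_match_start(pattern, text):
    """Return the start index of the first match, or None if there is no match.

    Hypothetical helper for illustration; it is not part of the re module.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.start()

txt = "I am what i am"
found = first_match_start(r"\s", txt)      # a white-space character exists
missing = first_match_start(r"am i", txt)  # this pattern does not occur

print(found)    # 1
print(missing)  # None
```
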
The split() Function

The split() function returns a list where the string has been split at each match:

Example 1 :- Split the string at each white-space character:

import re
txt = "I am what i am"
x = re.split(r"\s", txt)
print(x)

Output :-

['I', 'am', 'what', 'i', 'am']

You can control the number of splits by specifying the maxsplit parameter:

Example 2 :- Split the string only at the first two white-space characters:

import re
txt = "I am what i am"
x = re.split(r"\s", txt, maxsplit=2)
print(x)

Output :-

['I', 'am', 'what i am']
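
When the text contains several white-space characters in a row, splitting on a single \s produces empty strings in the result. A common remedy is to combine \s with the + metacharacter. The sketch below assumes a sample string with a double space; the variable names are illustrative.

```python
import re

# Two spaces after "I" (sample string is an assumption for illustration)
txt = "I  am what"

single = re.split(r"\s", txt)                # splits at every single white-space character
runs = re.split(r"\s+", txt)                 # treats a run of white space as one separator
limited = re.split(r"\s+", txt, maxsplit=1)  # stops after the first split

print(single)   # ['I', '', 'am', 'what']
print(runs)     # ['I', 'am', 'what']
print(limited)  # ['I', 'am what']
```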

The sub() Function

The sub() function replaces the matches with the text of your choice:

Example 1 :- Replace every white-space character with "#":

import re
txt = "I am what i am"
x = re.sub(r"\s", "#", txt)
print(x)

Output :-

I#am#what#i#am

You can control the number of replacements by specifying the count parameter:

Example 2 :- Replace the first 2 occurrences of a white-space character with "#":

import re
txt = "I am what i am"
x = re.sub(r"\s", "#", txt, count=2)
print(x)

Output :-

I#am#what i am
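
As a further sketch of sub(), the example below collapses runs of white space into a single space and shows that the string comes back unchanged when the pattern does not match. The sample string and variable names are assumptions for illustration.

```python
import re

# Uneven spacing (sample string is an assumption for illustration)
txt = "I   am  what i am"

collapsed = re.sub(r"\s+", " ", txt)            # every run of white space becomes one space
first_only = re.sub(r"\s+", "#", txt, count=1)  # only the first run is replaced
unchanged = re.sub(r"\d", "#", txt)             # no digits, so nothing is replaced

print(collapsed)         # I am what i am
print(first_only)        # I#am  what i am
print(unchanged == txt)  # True
```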

Match Object

A Match object is an object containing information about the search and the result.

Note: If there is no match, the value None is returned instead of the Match object.

Example 1 :- Perform a search that returns a Match object:

import re
txt = "I am what i am"
x = re.search("am", txt)
print(x)  # prints the Match object itself

Output :-

<re.Match object; span=(2, 4), match='am'>

The Match object has properties and methods used to retrieve information about the search and the result:

.span() returns a tuple containing the start and end positions of the match.

.string returns the string that was passed into the function.

.group() returns the part of the string where there was a match.

Example 2 :- Print the position (start and end) of the first match.

The regular expression looks for a word that starts with an upper-case "W":

import re
txt = "I am What i am"
# The r prefix keeps \b as a word boundary instead of a backspace
x = re.search(r"\bW\w+", txt)
print(x.span())

Output :-

(5, 9)

Example 3 :- Print the string passed into the function:

import re
txt = "I am What i am"
x = re.search(r"\bW\w+", txt)
print(x.string)

Output :-

I am What i am

Example 4 :- Print the part of the string where there was a match.

The regular expression looks for a word that starts with an upper-case "W":

import re
txt = "I am What i am"
x = re.search(r"\bW\w+", txt)
print(x.group())

Output :-

What

Note: If there is no match, the value None is returned instead of the Match object.
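
The three members of the Match object are related: slicing .string with the positions from .span() gives the same text as .group(). A minimal sketch, with variable names chosen only for illustration:

```python
import re

txt = "I am What i am"
match = re.search(r"\bW\w+", txt)

# Check for None first, since search() returns None when nothing matches
if match is not None:
    start, end = match.span()
    sliced = match.string[start:end]  # cut the matched part out of the original string

    print(start, end)               # 5 9
    print(sliced)                   # What
    print(sliced == match.group())  # True
```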

