Python RegEx - Regular Expressions
A RegEx, or regular expression, is a sequence of characters that forms a search pattern.
RegEx can be used to check whether a string contains a specified search pattern.
RegEx Module
Python has a built-in module called re that can be used to work with regular expressions.
Import the re module :
import re
RegEx in Python
Once you have imported the re module, you can start using regular expressions :
Example :- Check if the string starts with "I" and ends with "am":
import re
txt = "I am what i am"
x = re.search("^I.*am$", txt)
if x:
    print("YES! We have a match!")
else:
    print("No match")
Output :-
YES! We have a match!
RegEx Functions
The re module provides a set of functions for searching a string for a match :
Function | Description |
---|---|
findall() | Returns a list containing all matches |
search() | Returns a Match object if there is a match anywhere in the string |
split() | Returns a list where the string has been split at each match |
sub() | Replaces one or many matches with a string |
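As a quick tour, the four functions in the table can be tried side by side. This is a minimal sketch; the sample string and the variable names are chosen only for illustration.

```python
import re

text = "The rain in Spain"  # sample string chosen for this sketch

# findall() returns every non-overlapping match as a list of strings.
all_ai = re.findall(r"ai", text)

# search() returns a Match object for the first match, or None.
first_ai = re.search(r"ai", text)

# split() cuts the string wherever the pattern matches.
words = re.split(r"\s", text)

# sub() replaces each match with the replacement text.
dashed = re.sub(r"\s", "-", text)

print(all_ai)            # ['ai', 'ai']
print(first_ai.start())  # 5
print(words)             # ['The', 'rain', 'in', 'Spain']
print(dashed)            # The-rain-in-Spain
```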
Metacharacters
Metacharacters are characters with a special meaning :
Character | Description | Example |
---|---|---|
[] | A set of characters | "[a-m]" |
\ | Signals a special sequence (can also be used to escape special characters) | "\d" |
. | Any character (except newline character) | "he..o" |
^ | Starts with | "^hello" |
$ | Ends with | "world$" |
* | Zero or more occurrences | "aix*" |
+ | One or more occurrences | "aix+" |
{} | Exactly the specified number of occurrences | "al{2}" |
| | Either or | "falls|stays" |
() | Capture and group |
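The rows of the table can be checked one at a time with findall() and search(). The sketch below uses made-up sample strings; the patterns are small variations on the ones in the table.

```python
import re

sample = "hello world, hello planet"  # sample string chosen for this sketch

# "[]" matches any one character from the set.
in_set = re.findall(r"[a-m]", "hello")       # ['h', 'e', 'l', 'l']

# "." matches any single character except a newline.
dots = re.findall(r"he..o", sample)          # ['hello', 'hello']

# "^" anchors the pattern at the start, "$" at the end of the string.
starts = re.search(r"^hello", sample) is not None   # True
ends = re.search(r"world$", sample) is not None     # False, it ends with "planet"

# "*" allows zero or more, "+" one or more, "{2}" exactly two occurrences.
star = re.findall(r"lo*", "l lo loo")        # ['l', 'lo', 'loo']
plus = re.findall(r"lo+", "l lo loo")        # ['lo', 'loo']
exact = re.findall(r"lo{2}", "l lo loo")     # ['loo']

# "|" matches either alternative.
either = re.findall(r"world|planet", sample) # ['world', 'planet']

# "()" captures part of the match so it can be read with group(1).
captured = re.search(r"hello (\w+)", sample).group(1)  # 'world'

print(in_set, dots, starts, ends, star, plus, exact, either, captured)
```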
Special Sequences
A special sequence is a \ followed by one of the characters in the list below, and it has a special meaning :
Character | Description | Example |
---|---|---|
\A | Returns a match if the specified characters are at the start of the string | "\AThe" |
\b | Returns a match where the specified characters are at the start or at the end of a word (the "r" prefix makes sure the string is treated as a "raw string") | r"\bain" r"ain\b" |
\B | Returns a match where the specified characters are present, but NOT at the start (or at the end) of a word (the "r" prefix makes sure the string is treated as a "raw string") | r"\Bain" r"ain\B" |
\d | Returns a match where the string contains a digit (0-9) | "\d" |
\D | Returns a match where the string contains a character that is NOT a digit | "\D" |
\s | Returns a match where the string contains a white-space character | "\s" |
\S | Returns a match where the string contains a character that is NOT white space | "\S" |
\w | Returns a match where the string contains a word character (letters a to z and A to Z, digits 0 to 9, and the underscore character) | "\w" |
\W | Returns a match where the string contains a character that is NOT a word character | "\W" |
\Z | Returns a match if the specified characters are at the end of the string | "Spain\Z" |
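A short sketch of the special sequences in action follows. The sample string is an assumption made for illustration, and every pattern is written as a raw string so that the backslash reaches the re module unchanged.

```python
import re

sample = "Room 42 in Spain"  # sample string chosen for this sketch

digits = re.findall(r"\d", sample)        # ['4', '2']
non_digits = re.findall(r"\D+", sample)   # ['Room ', ' in Spain']
words = re.findall(r"\w+", sample)        # ['Room', '42', 'in', 'Spain']
spaces = re.findall(r"\s", sample)        # [' ', ' ', ' ']

# \A and \Z anchor the pattern at the start and end of the string.
at_start = re.search(r"\ARoom", sample) is not None   # True
at_end = re.search(r"Spain\Z", sample) is not None    # True

# \b needs a word boundary, \B needs the absence of one.
ain_at_word_start = re.findall(r"\bain", sample)  # [] because "ain" follows "p"
ain_at_word_end = re.findall(r"ain\b", sample)    # ['ain']
ain_inside_word = re.findall(r"\Bain", sample)    # ['ain']

print(digits, non_digits, words, spaces)
print(at_start, at_end)
print(ain_at_word_start, ain_at_word_end, ain_inside_word)
```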
Sets
A set is a group of characters inside a pair of square brackets [] that has a special meaning :
Set | Description |
---|---|
[arn] | Returns a match where one of the specified characters (a , r , or n ) is present |
[a-n] | Returns a match for any lower case character, alphabetically between a and n |
[^arn] | Returns a match for any character EXCEPT a , r , and n |
[0123] | Returns a match where any of the specified digits (0 , 1 , 2 , or 3 ) are present |
[0-9] | Returns a match for any digit between 0 and 9 |
[0-5][0-9] | Returns a match for any two-digit number from 00 to 59 |
[a-zA-Z] | Returns a match for any character alphabetically between a and z , lower case OR upper case |
[+] | In sets, + , * , . , | , () , $ , {} have no special meaning, so [+] means: return a match for any + character in the string |
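The sets in the table can be tried directly with findall(). This is a minimal sketch; the sample strings are assumptions made for illustration.

```python
import re

word = "Spain"                  # sample strings chosen for this sketch
clock = "Call at 08:45 or 23:07"

any_of = re.findall(r"[arn]", word)         # ['a', 'n']
in_range = re.findall(r"[a-n]", word)       # ['a', 'i', 'n']
none_of = re.findall(r"[^arn]", word)       # ['S', 'p', 'i']
letters = re.findall(r"[a-zA-Z]", "a1B2")   # ['a', 'B']

# Two sets side by side match two consecutive characters.
two_digits = re.findall(r"[0-5][0-9]", clock)  # ['08', '45', '23', '07']

# Inside a set, "+" is an ordinary character.
plus_signs = re.findall(r"[+]", "1+1=2")    # ['+']

print(any_of, in_range, none_of, letters, two_digits, plus_signs)
```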
The findall() Function
The findall() function returns a list containing all matches.
Example 1 :- Print a list containing every occurrence of "am":
import re
txt = "I am what i am"
x = re.findall("am", txt)
print(x)
Output :-
['am', 'am']
The list includes the matches in the order in which they were found.
An empty list is returned if no matches are found :
Example 2 :- If no match was found, return an empty list :
import re
txt = "I am what i am"
# Check if "india" is in the string:
x = re.findall("india", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Output :-
[]
No match
The search() Function
The search() function searches the string for a match and returns a Match object if there is one.
If there is more than one match, only the first occurrence is returned :
Example :- Search for the first white-space character in the string :
import re
txt = "I am what i am"
x = re.search(r"\s", txt)
print("The first white-space character is located in position:", x.start())
Output :-
The first white-space character is located in position: 1
If no match is found, the value None is returned :
Example 2 :- Make a search that returns no match :
import re
txt = "I am what i am"
x = re.search("am i", txt)
print(x)
Output :-
None
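Because search() can return None, it is usually worth checking the result before calling methods on it. The sketch below wraps that check in a helper; the name first_match_position and the -1 return value are assumptions made for this illustration, not part of the re module.

```python
import re

def first_match_position(pattern, text):
    """Return the start index of the first match, or -1 if there is none.

    Hypothetical helper for illustration only.
    """
    match = re.search(pattern, text)
    if match is None:
        # No match: calling match.start() here would raise AttributeError.
        return -1
    return match.start()

txt = "I am what i am"
print(first_match_position(r"\s", txt))    # 1
print(first_match_position(r"am i", txt))  # -1
```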
The split() Function
The split() function returns a list where the string has been split at each match :
Example 1 :- Split the string at every white-space character :
import re
txt = "I am what i am"
x = re.split(r"\s", txt)
print(x)
Output :-
['I', 'am', 'what', 'i', 'am']
You can control the number of splits with the maxsplit parameter :
Example 2 :- Split the string only at the first two white-space characters :
import re
txt = "I am what i am"
x = re.split(r"\s", txt, maxsplit=2)
print(x)
Output :-
['I', 'am', 'what i am']
The sub() Function
The sub() function replaces the matches with the text of your choice :
Example 1 :- Replace all white-space characters with "#" :
import re
txt = "I am what i am"
x = re.sub(r"\s", "#", txt)
print(x)
Output :-
I#am#what#i#am
You can control the number of replacements with the count parameter :
Example 2 :- Replace the first 2 occurrences of a white-space character with "#" :
import re
txt = "I am what i am"
x = re.sub(r"\s", "#", txt, count=2)
print(x)
Output :-
I#am#what i am
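Both maxsplit for split() and count for sub() default to 0, which means "no limit". The sketch below loops over a few limits to show how each one changes the result; the dictionary names are chosen only for this illustration.

```python
import re

txt = "I am what i am"

# Map each limit to the result it produces; 0 means "no limit".
splits = {n: re.split(r"\s", txt, maxsplit=n) for n in (0, 1, 2)}
subs = {n: re.sub(r"\s", "#", txt, count=n) for n in (0, 1, 2)}

for n in (0, 1, 2):
    print(n, splits[n], subs[n])
# 0 ['I', 'am', 'what', 'i', 'am'] I#am#what#i#am
# 1 ['I', 'am what i am'] I#am what i am
# 2 ['I', 'am', 'what i am'] I#am#what i am
```

Passing the limits by keyword, as done here, keeps the call readable and avoids confusing them with the flags argument.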
Match Object
A Match object is an object containing information about the search and the result.
Note : If there is no match, the value None is returned instead of a Match object.
Example 1 :- The search() function returns a Match object :
import re
txt = "I am what i am"
x = re.search("am", txt)
print(x)
Output :-
<re.Match object; span=(2, 4), match='am'>
The Match object has properties and methods for retrieving information about the search and the result :
.span() returns a tuple containing the start and end positions of the match.
.string returns the string passed into the function.
.group() returns the part of the string where there was a match.
Example 2 :- Print the position (start and end) of the first match.
The regular expression looks for any word that begins with an upper-case "W" :
import re
txt = "I am What i am"
x = re.search(r"\bW\w+", txt)
print(x.span())
Output :-
(5, 9)
Example 3 :- Print the string passed into the function :
import re
txt = "I am What i am"
x = re.search(r"\bW\w+", txt)
print(x.string)
Output :-
I am What i am
Example 4 :- Print the part of the string where there was a match.
The regular expression looks for any word that begins with an upper-case "W" :
import re
txt = "I am What i am"
x = re.search(r"\bW\w+", txt)
print(x.group())
Output :-
What
Note : If there is no match, None is returned instead of a Match object, so calling .span(), .string, or .group() on the result raises an AttributeError.
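The three members are related: slicing .string with the positions from .span() gives the same text as .group(). A minimal sketch, with variable names chosen for illustration:

```python
import re

txt = "I am What i am"
match = re.search(r"\bW\w+", txt)

if match is not None:
    start, end = match.span()
    # Slicing the original string with the span reproduces the matched text.
    sliced = match.string[start:end]
    print(start, end)               # 5 9
    print(sliced)                   # What
    print(sliced == match.group())  # True
```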