SQL Regular Expression (RegEx)
SQL Regular Expression Case Sensitive
Example 1: We used the match operator "~" to match the regular expression "sqlguide" with the Email column, and the results show all rows with the pattern "sqlguide."
Match Regular Expression (Case Insensitive)
SELECT *
FROM Email
WHERE Email ~* 'SQLGUIDE';
Match Regular Expression (Case Sensitive)
SELECT *
FROM Email
WHERE Email ~ 'sqlguide';
Example 2: pattern matching is case sensitive:
regex=# SELECT 'similar' ~ 'Similar' as result;
Result
(1 row)
Example 3: If we run both queries in our first set to find all the results in our table that only had one alphabetic character in any range (A through Z) and that this character was only lower case, note how our results do not differ. Because the database I set up is not case sensitive, even if we just want to return messages with caps, we'll see lower case letters as well, and vice versa.
SELECT *
FROM alphareg
WHERE Alphabetic LIKE '[A-Z]'
SELECT *
FROM alphareg
WHERE Alphabetic LIKE '[a-z]'
SQL Regular Expression for Date
A regular expression for dates (YYYY-MM-DD) should look for four digits at the start, then a hyphen, a two-digit month between 01 and 12, then another hyphen, and lastly a two-digit day between 01 and 31.
Date validation must not be done with regular expressions. This will match date-like strings, which can subsequently be verified if necessary with a proper date library.
Still i am providing you the ValidationExpression.
For dd/mm/yyyy - ValidationExpression="(0[1-9]|[12][0-9]|3[01])[/](0[1-9]|1[012])[/]\d{4}"
For dd-mm-yyyy - ValidationExpression="(0[1-9]|[12][0-9]|3[01])[-](0[1-9]|1[012])[-]\d{4}"
(0[1-9]|[12][0-9]|3[01]) For Day
(0[1-9]|1[012]) For Month
\d{4} For Year
[-] For -
[/] For /
Example 1: Start with YYYYXMMXDD, where X is a separator of any sort, such as a space, dash, letter, or anything other than a number. We're also omitting dates that don't exist anymore or didn't exist before computers (for example, 999 A.D. isn't a legitimate date because computers didn't exist back then).
SELECT *
FROM tbBizExamples
WHERE BizVarchar LIKE '[1-2][0-9][0-9][0-9][^0-9][0-1][0-9][^0-9][0-3][0-9]%'
Notice some of the patterns we restricted based on the tools we’ve learned so far:
The YYYY year format must begin with either a 1 or a 2. This is because, at this point in human history, computers have only existed between these times (1900s, 2000s).
MM's month format must begin with either a 0 or a 1. Because there are twelve months in a year, the lowest month is 01 and the highest is 12.
The DD day format must begin with a 0 to 3 range. Because the lowest and highest days are 01 and 31, respectively.
We remove numbers ([^0-9]) because they are not valid separators in this case.
Finally, we may or may not be looking for a certain time, so we permit any character (percent) at the end so that we can get numbers that include the precise time.
We would alter or add to our query depending on the format we consider valid, as we may be looking for numerous formats. It's also possible that we wish to look at dates from the year 900 A.D., in which case we'd make the necessary alterations to include a format like YYY-DD-MM or similar derivative.
Example 2: Here’s the regex code that does all this:
/^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$/
This code works for most dates but does exclude invalid days of the month like 2021-02-29 (2021 is not a leap year) or 2021-04-31 (April has only 30 days). These checks must be performed using the tools in your specific programming language of choice.
SQL Regular Expression for Digits
Use the following regular expression(^[0-9]*$) in the where clause to get only digits with REGEXP.
LIKE in our WHERE clause allows any character that comes before or after the numerical value or range we're looking for. We can even specify an exact precision value, such as needing only two digits in our integer.
Note: \d is standard shorthand for [0-9].
It's worth noting that [0-9] signifies "match exactly one digit".
The * wildcard denotes a match of zero or more digits, so d* denotes a match of zero or more digits, which contains an empty output.
Example 1: We’ll return to our alphanumreg table and query each column looking for values that only have two digits:
SELECT *
FROM alphanumreg
WHERE NumData LIKE '[0-9][0-9]'
SELECT *
FROM alphanumreg
WHERE NumInt LIKE '[0-9][0-9]'
SELECT *
FROM alphanumreg
WHERE AlphabeticNum LIKE '[0-9][0-9]'
Example 2: If you want only those rows which have exactly 10 digits and all must be only digit, use the below regular expression.
SELECT *FROM yourTableName
WHERE yourColumnName REGEXP '^[0-9]{10}$';
Example 3:
SELECT dbo.Regex('^[0-9]*$','123')
or better yet:
SELECT dbo.Regex('^\d*$','123')
SQL Regular Expression for Email
Using REGEXP function user can validate the email in mysql.
Example 1:
SELECT * FROM `emails` WHERE `email`NOT REGEXP ‘[-a-z0-9~!$%^&*_=+}{\\\’?]+
(\\.[-a-z0-9~!$%^&*_=+}{\\\’?]+)*@([a-z0-9_][-a-z0-9_]*(\\.[-a-z0-9_]+)*\\.
(aero|arpa|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org|pro|travel|mobi|[a-z][a-z])|
([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}))(:[0-9]{1,5})?’;
SELECT email
FROM Employee
WHERE email LIKE ‘%_@__%.__%’
AND PATINDEX(‘%[^a-z,0-9,@,.,_]%’, REPLACE(email, ‘-‘, ‘a’)) = 0;
Example 2: you can make the hop to .Net and do the matching there. You can instantiate a VBScript.RegExp inside T-SQL using sp_OACreate and use that.
CREATE FUNCTION dbo.isValidEmailFormat
(
@Email varchar(100)
)
RETURNS bit
AS
BEGIN
DECLARE @pattern varchar(4000)
SET @pattern = '[a-zA-Z0-9_\-]+@([a-zA-Z0-9_\-]+\.)+(com|org|edu|nz|au)'
DECLARE @Result bit
DECLARE @objRegexExp INT
EXEC sp_OACreate 'VBScript.RegExp', @objRegexExp OUT
EXEC sp_OASetProperty @objRegexExp, 'Pattern', @pattern
EXEC sp_OASetProperty @objRegexExp, 'IgnoreCase', 1
EXEC sp_OASetProperty @objRegexExp, 'MultiLine', 0
EXEC sp_OASetProperty @objRegexExp, 'Global', false
EXEC sp_OASetProperty @objRegexExp, 'CultureInvariant', true
EXEC sp_OAMethod @objRegexExp, 'Test', @Result OUT, @Email
EXEC sp_OADestroy @objRegexExp
RETURN @Result
END
If you want to be a little less restrictive on what characters are permitted, take a look at Regex email verification error - using JavaScript.
Example 3: Select Only the valid email address:
SELECT EmailAddress AS ValidEmail
FROM Contacts
WHERE EmailAddress LIKE '%_@__%.__%'
AND PATINDEX('%[^a-z,0-9,@,.,_,\-]%', EmailAddress) = 0
GO
Example 4:
select * from t_superheros;
ID | NAME | PHONE | |
---|---|---|---|
1 | Superman | superman@kmail.com | (655) 981-4455 |
2 | Batman | batmankmail.com | 6a5-981-4455 |
3 | Spiderman | spiderman@com | 657-8154455 |
4 | Flash Gordon | flashgordon@com | (99) 981-4455 |
5 | Phantom | phantom.test@gmail.com | (999) 981-4455 |
Only Superman and Phantom had their email addresses formatted correctly, as you can see. The '@' and '.' are either missing or improperly placed in the others.
We need to write a regular expression to check that the email address is of the form:
text @ text . text
The character w stands for text. We add a + sign at the end to indicate that the text must be at least one character long: w+.
This matches alphanumeric and underscore (_) characters. We must also allow a match for dots (.) in the user name, ensuring that no two dots appear consecutively. The escape character "" is placed before "." to indicate that it must be recognized literally rather than as a metacharacter. When combined with the text after it, we get (\.\w+).
To specify zero or more matches, we follow (\.\w+) with *.
So our entire regex for the username is \w+(\.\w+)*+.
@ can be written into the regex as-is.
For the domain part i.e. the part after @, at least one “.” is mandatory sandwiched between two text pieces. And there could be longer extensions with more than one “.”, such as yahoo.co.uk. This can be matched with the regex \w+(\.\w+)+.
So the resulting regular expression for email address validation is ^\w+(\.\w+)*+@\w+(\.\w+)+$.
Putting it into a query:
select id,
name, email
from t_superheros
where regexp_like
(email
,'^\w+(\.\w+)*+@\w+(\.\w+)+$');
Output:
ID | NAME | |
---|---|---|
1 | Superman | superman@kmail.com |
5 | Phantom | phantom.test@gmail.com |
SQL Regular Expression for End of Line
Syntax:
The $ character matches "the end of the string" :
SELECT * FROM Table WHERE REGEXP_LIKE (COLUMN,'_[[:digit:]]$')
Example: find all the names ending with 'ok' −
SELECT name FROM person_tbl WHERE name REGEXP 'ok$';
SQL Regular Expression Escape Characters
The LIKE operator supports the ESCAPE clause to indicate the escape character. In the pattern string, escape characters are used to indicate that any wildcard character that appears after the escape character must be handled as a regular character.
Backslash is the default escape character (\).
One character can be specified as the escape character. This escape character can be used to indicate a wildcard character that should be treated as a regular character. This is accomplished by simply preceding the occurrence of the wildcard character with the escape character. When using LIKE to match expressions to a pattern, just one escape character can be given.
Just use a backslash before the character, as shown in the MySQL:
- \0 An ASCII NUL (0x00) character.
- \' A single quote ("'") character.
- \" A double quote (""") character.
- \b A backspace character.
- \n A newline (linefeed) character.
- \r A carriage return character.
- \t A tab character.
- \Z ASCII 26 (Control+Z). See note following the table.
- \\ A backslash ("\") character.
- \% A "%" character. See note following the table.
- \_ A "_" character. See note following the table.
Escaped Characters
- \A start of a string
- \b word boundary. The zero-length string between \w and \W or \W and \w.
- \B not at a word boundary
- \cX ASCII control character
- \d single digit [0-9]
- \D single character that is NOT a digit [^0-9]
- \E stop processing escaped characters
- \l match a single lowercase letter [a-z]
- \L single character that is not lowercase [^a-z]
- \Q ignore escaped characters until \E is found
- \s single whitespace character
- \S single character that is NOT white space
- \u single uppercase character [A-Z]
- \U single character that is not uppercase [^A-Z]
- \w word character [a-zA-Z0-9_]
- \W single character that is NOT a word character [^a-zA-Z0-9_]
- \x00-\xFF hexadecimal character
- \x{0000}-\x{FFFF} Unicode code point
- \Z end of a string before the line break
Example 1: Look for any records that contain the percentage character.
However, because the % character is a wildcard, we shall utilise the escape character /. The following is our question:
SELECT * FROM questions WHERE question LIKE "%/%%" ESCAPE "/";
Example 2:
SELECT * FROM emp WHERE ENAME LIKE 'J%$_%' ESCAPE '$';
This matches all records with names that start with letter ’J’ and have the ’_’ character in them.
SELECT * FROM emp WHERE ENAME LIKE 'JOE$_JOHN' ESCAPE '$';
This matches only records with name ’JOE_JOHN’.
SQL Regular Expression One or More Spaces
The space(), tab (\t), new line (\n), and carriage return (\r) (helpful in Windows systems) are the most frequent types of whitespace used with regular expressions, and these special characters satisfy each of their corresponding whitespaces. Furthermore, when interacting with raw input text, a whitespace special character will match any of the particular whitespaces listed above.
To match one of more whitespace characters (space, EOL, and tab all count as whitespace):
Set "Find what" to \s+
Warning: Using \s+ will match end of line and therefore join multiple lines together (separated by the "replace with" string)
To replace with a tab character:
Set "Replace with" to \t
Example 1:
[ ]{2, }
SPACE (2 or more)
You could also double-check that words appear before and after those spaces. (not new lines or other whitespace)
\w[ ]{2,}\w
same, but you can also select (capture) simply the areas for replacement activities.
\w([ ]{2,})\w
or look for anything other than word letters before and after spaces (except whitespace)
[^\s]([ ]{2,})[^\s]
Example 2: The content of each line is indented by some whitespace from the index of the line in the strings below (the number is a part of the text to match). Write a pattern to match each line with whitespace characters between the number and the text. Whitespace characters are treated the same as any other character, and special metacharacters like the star and plus can also be utilized.
\s{2,}
SQL Regular Expression for Phone Number
Example 1: regex phone number 10 digit:
^(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$
Example 2: To create a regular expression, you must use specific syntax—that is, special characters and construction rules. For example, the following is a simple regular expression that matches any 10-digit telephone number, in the pattern nnn-nnn-nnnn:
\d{3}-\d{3}-\d{4}
Output:
Test String |
---|
123-456-7890 |
333-333-4444 |
1234567890 |
123456789 |
123-456-7890 |
14157059247 |
SQL Regular Expression for Single Character
To match any single character, use the dot. character as a wildcard.
Match any specific character in a set
- Use square brackets [] to match any characters in a set.
- Use \w to match any single alphanumeric character: 0-9, a-z, A-Z, and _ (underscore).
- Use \d to match any single digit.
- Use \s to match any single whitespace character.
For Example: a.c matches aac, abc, azc, a2c, a$c etc.
Example 1 regex: a.c
abc // match
a c // match
azc // match
ac // no match
abbc // no match
Example 2 regex: a[bcd]c
abc // match
acc // match
adc // match
ac // no match
abbc // no match
Example 3: . matches any single character:
SELECT 'Maria' REGEXP 'Ma.ia';
Output:
'Maria' REGEXP 'Ma.ia' |
---|
1 |
SELECT 'Maria' REGEXP 'Ma..ia';
Output:
'Maria' REGEXP 'Ma.ia' |
---|
1 |
SQL Regular Expression Special Character
To create a pattern for use with the REGEXP operators, a MySQL regular expression can employ any of the following constructions and special characters.
The first issue appears to be the ^ and $ signs (Mike C summarised it quicker than I did why...)
But I see escaping problems too: all special characters that mean something in regexp should be escaped specially placed in the [], so [, ], ^, -
Conclusion detailed in the regex documentation:
A bracket expression is a sequence of characters enclosed in the brackets '[]'. In most cases, it matches any single character from the list (but see below).
If the list starts with '^', it matches any single character from the list (except as noted below).
When two characters in a list are separated by a "-," this is a shortcut for the entire range of characters in the collating sequence between those two (inclusive). For example, "[0-9]" in ASCII matches any decimal digit.
Two ranges sharing an endpoint, such as' a-c-e, 'is illegal (!) Ranges are largely dependent on the collating sequence, therefore portable programmes should avoid using them.
Make the first character (after a possible '^') a literal ']' in the list.
Put the first or final character, or the second endpoint of a range, if you wish to add a literal '-'.
To make a literal '-' the first endpoint of a range, encapsulate it with '[.' and '.]'. (see below).
All other special characters, including ", lose their unique value within a bracket expression, with the exception of these and several combinations employing '[' (see following paragraphs).
Some characters have special meanings within regexes these characters are:
Char | Description | Meaning |
---|---|---|
\ | Backslash | Used to escape a special character |
^ | Caret | Beginning of a string |
$ | Dollar sign | End of a string |
. | Period or dot | Match any character (including carriage return and newline characters). |
| | Vertical bar or pipe symbol | Matches previous OR next character/group |
? | Question mark | Match zero or one of the previous |
* | Asterisk or star | Match zero, one or more of the previous |
+ | Plus sign | Match one or more of the previous |
( ) | Opening and closing parenthesis | Group characters |
[ ] | Opening and closing square bracket | Matches a range of characters |
{ } | Opening and closing curly brace | Matches a specified number of occurrences of the previous |
The construct or special character is shown, followed by a description of each and what operations in performs within the pattern for the regular expression.
- a* : Match any sequence of zero or more a characters.
- a+ : Match any sequence of one or more a characters.
- a? : Match either zero or one a characters.
- de|abc : Match either of the character sequences, de or abc.
- (abc)* : Match zero or more instances of the character sequence abc.
- {1},{2,3} : Provides a more general way of writing regular expressions that match many occurences of the previous atom (or “piece”) of the pattern. i.e. a? can be written as a{0,1}.
- [a-dX],[^a-dX] : Matches any character that is (or is not, if ^ is used) either a, b, c, d, or X. A “-” character between two other characters forms a range that maches all characters from the first character to the second.
- [.characters.] : Within a bracket expression (using “[” and “]”), matches the sequence of characters of that collating element. i.e. the pattern [[.period.]] would match the ‘.’ (period) character.
- [=character_class=] : Within a bracket expression, represents an equivalence class. It matches all characters with the same collation value, including itself.
- [:character_class:] : Within a bracket expression, represents a character class that matches all characters belonging to that class. i.e. the pattern [[:alpha:]] would match against a string that is all aphabetic characters.
- [[:<:]],[[:>:]] : These markers stand for word boundaries, and as such they match the beginning and ending of words, respectively.
NOTE: MySQL interprets the “\” (backslash) character as an escape character. If you choose to use the “\” character as part of your pattern in a regular expression it will need to escaped with another backslash “\\”.
Example :
Finished\? matches “Finished?”
^http matches strings that begin with http
[^0-9] matches any character not 0-9
ing$ matches “exciting” but not “ingenious”
gr.y matches “gray“, “grey”
Red|Yellow matches “Red” or “Yellow”
colou?r matches colour and color
Ah? matches “Al” or “Ah”
Ah* matches “Ahhhhh” or “A”
Ah+ matches “Ah” or “Ahhh” but not “A”
[cbf]ar matches “car“, “bar“, or “far”
[a-zA-Z] matches ascii letters a-z (uppercase and lower case)
SQL Regular Expression in Like
REGEXP LIKE is identical to the LIKE condition, except that REGEXP LIKE matches regular expressions instead of the simple pattern matching that LIKE does. This condition analyzes strings using the input character set's characters.
If the string fits the regular expression, the REGEXP LIKE operation returns true. This function works similarly to the LIKE condition, but instead of using wildcard character matching, it uses regular expressions.
The method will return False if any of the following parameters are NULL:
Syntax:
REGEXP_LIKE( sourceString, patternString )
Example 1: Searches for text like them or this from the table mytab:
SELECT * FROM mytab WHERE text LIKE_REGEXPR ' them|this ';
Example 2: The following query finds all players whose name begins with Ste:
splice> SELECT DisplayName
FROM Players
WHERE REGEXP_LIKE(DisplayName, '^Ste.*');
Output:
DIAPLAYNAME |
---|
Steve Raster |
Steve Mossely |
Stephen Tuvesco |
Example 3: The following query returns the first and last names for those employees with a first name of Steven or Stephen (where first_name begins with Ste and ends with en and in between is either v or ph):
SELECT first_name, last_name
FROM employees
WHERE REGEXP_LIKE (first_name, '^Ste(v|ph)en$');
Output:
FIRST_NAME | LAST_NAME |
---|---|
Steven | King |
Steven | Markle |
Stephen | Stiles |
SQL Regular Expression in Replace
The REGEXP_REPLACE function is a REPLACE extension that lets you search a string for a regular expression pattern. By default, the function returns source char with replace string replacing every appearance of the regular expression pattern.
The original string in the SQL REGEXP_REPLACE() method represents a regular expression pattern. Regular expression pattern string replaced with original string. If no matches are found, the original string is returned.
It uses regular expression pattern comparing to replace a sequence of characters in a string with another set of characters.
Substitute another substring for all occurrences of a substring that match a regular expression. It's identical to the REPLACE function, only it selects the substring to be replaced using a regular expression.
Syntax:
REGEXP_REPLACE( string, target [, replacement [, position [, occurrence ... [, regexp_modifiers ] ] ] ] )
Example 1: Let's start by using the REGEXP_REPLACE function to replace the first word in a string.
SELECT REGEXP_REPLACE ('TechOnTheNet is a great resource', '^(\S*)', 'CheckYourMath')
FROM dual;
Result:
This example will return 'CheckYourMath is a great resource' because it will start the match at the beginning of the string as specified by ^ and then find the first word as specified by (\S*). The function will then replace this first word with 'CheckYourMath'.
Example 2: Find groups of "word characters" (letters, numbers and underscore) ending with "thy" in the string "healthy, wealthy, and wise" and replace them with nothing.
SELECT REGEXP_REPLACE('healthy, wealthy, and wise','\w+thy');
Output:
REGEXP_REPLACE |
---|
, , and wise |
Example 3: Consider following example is REGEXP_REPLACE function find 1 or more (.) character in original string replace to single blank space.
SELECT
REGEXP_REPLACE('10...taxo.....court,...near..tuff.......mountain','(\.){2,}', ' ') "DOT_REPLACE_TO_SINGLE_SPACE"
FROM DUAL;
Output:
DOT_REPLACE_TO_SINGLE_SPACE |
---|
10 taxo court, near tuff mountain |
Example 4: The following example examines country_name. Oracle puts a space after each non-null character in the string.
SELECT
REGEXP_REPLACE(country_name, '(.)', '\1 ') "REGEXP_REPLACE"
FROM countries;
Output:
REGEXP_REPLACE |
---|
A r g e n t i n a |
A u s t r a l i a |
B e l g i u m |
B r a z i l |
C a n a d a |
SQL Regular Expression in Substring
REGEXP SUBSTR() returns a substring from a string. This substring uses a regular expression pattern to search through the original string. It's similar to REGEXP INSTR, only it returns the substring itself rather than the position of the substring. If you need the information of a match string but not its position in the source string, this function comes in handy.
Inside a string, this function returns the substring that satisfies a regular expression. This function returns NULL if no matches are discovered. This is not the same as an empty string, which the function can return if the regular expression matches a string of zero length.
Syntax :
REGEXP_REPLACE(original_string, pattern [ , position [ , occurrence [ , match_param [ , sub_expression ] ] ] ] )
Example 1: Consider following example is REGEXP_SUBSTR function fetching 'taxo' substring from original string using 'ta[^ ]o' substring pattern.
SELECT REGEXP_SUBSTR('10 taxo court, near tuff mountain', 'ta[^ ]o') "FETCH_ONLY_taxo_WORD" FROM DUAL;
Output:
FETCH_ONLY_taxo_WORD |
---|
taxo |
Example 2: The following example searches the string for the first comma-delimited substring. Oracle Database looks for a comma followed by one or more non-comma characters and then another comma. The substring, as well as the leading and trailing commas, is returned by Oracle.
SELECT
REGEXP_SUBSTR('500 Oracle Parkway, Redwood Shores, CA',
',[^,]+,') "REGEXPR_SUBSTR"
FROM DUAL;
Output:
REGEXPR_SUBSTR |
---|
, Redwood Shores, |
Example 3: Select the first substring of letters that end with "thy."
SELECT REGEXP_SUBSTR('healthy, wealthy, and wise','\w+thy');
Output:
REGEXP_SUBSTR |
---|
healthy |
Example 4: Here is one example that uses POSIX regular expressions to extract any word that has 'ss' among its letters:
select substring('Learning SQL is essential.' from '\w*ss\w*');
Output:
substring |
---|
essential |
Example 5: substring() with SQL regular expressions involves three parameters: the string to search, the pattern to match, and a delimiter defined after the for keyword. In the following example we look for a three and then seven letter words that starts with an 'S' and ends with an 'L':
select substring('Learning SQL is essential.' from '%#"S_L#"%' for '#');
Output:
substring |
---|
SQL |
select substring('Do you pronounce it as SQL or SEQUEL?' from '%#"S____L#"%' for '#');
Output:
substring |
---|
SEQUEL |
SQL Regular Expression in Where
The REGEXP operator in MySQL allows you to match patterns.
Syntax:
The regular expression always using with where clause:
SELECT
column_list
FROM
table_name
WHERE
string_column REGEXP pattern;
Example 1: WHERE not regex in SQL
SELECT FirstName FROM intque.person_tbl WHERE FirstName NOT REGEXP '^[aeiou].*ok$';
Example 2: To find the product whose name contains exactly 10 characters, use ‘^' and ‘$ to match the beginning and end of the product name, and repeat {10} times of any character ‘.' in between as shown in the following query:
SELECT emp_name
FROM employees
WHERE
emp_name REGEXP '^.{10}$';
Example 3: Use REGEXP instead of LIKE
SELECT trecord FROM `tbl` WHERE (trecord REGEXP '^ALA[0-9]')