Note_Tech

All technological notes.


Project maintained by simonangel-fong Hosted on GitHub Pages — Theme by mattgraham

Python - RegEx Library



RegEx


RegEx Functions

| Function | Description |
| --- | --- |
| findall | Returns a list containing all matches |
| search | Returns a Match object if there is a match anywhere in the string |
| split | Returns a list where the string has been split at each match |
| sub | Replaces one or many matches with a string |
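
As a quick illustration of the four functions listed above, the sketch below runs each one against the same sample sentence. The sentence and the variable names are chosen only for this example.

```python
import re

txt = "The rain in Spain"

# findall: every non-overlapping match, returned as a list of strings
all_ai = re.findall(r"ai", txt)          # ['ai', 'ai']

# search: Match object for the first match, or None if nothing matches
first_space = re.search(r"\s", txt)
first_space_pos = first_space.start()    # 3

# split: cut the string at each match
words = re.split(r"\s", txt)             # ['The', 'rain', 'in', 'Spain']

# sub: replace each match with the given string
dashed = re.sub(r"\s", "-", txt)         # 'The-rain-in-Spain'

print(all_ai, first_space_pos, words, dashed)
```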

Raw String Notation: the r prefix, as in r"raw_string", keeps backslashes literal, so sequences such as \b reach the regex engine unchanged.
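
To see why the r prefix matters, the following sketch compares the same pattern written with and without it. The sample text and variable names are illustrative assumptions.

```python
import re

txt = "The rain in Spain"

# Without the prefix, Python turns "\b" into a single backspace character.
plain = "\bSpain"
# With the prefix, the backslash is kept and the regex engine sees \b (word boundary).
raw = r"\bSpain"

plain_len, raw_len = len(plain), len(raw)    # 6 and 7

plain_hits = re.findall(plain, txt)          # []  (the text has no backspace)
raw_hits = re.findall(raw, txt)              # ['Spain']

print(plain_len, raw_len, plain_hits, raw_hits)
```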


Metacharacters

| Character | Description | Example |
| --- | --- | --- |
| ^ | Caret - Starts with | "^hello" |
| $ | Dollar - Ends with | "planet$" |
| . | Dot - Any character (except the newline character) | "he..o" |
| [] | Brackets - A set of characters | "[a-m]" |
| ? | Question mark - Zero or one occurrence | "he.?o" |
| + | Plus - One or more occurrences | "he.+o" |
| * | Asterisk - Zero or more occurrences | "he.*o" |
| {} | Curly braces - Exactly the specified number of occurrences | "he.{2}o" |
| \ | Backslash - Signals a special sequence (can also be used to escape special characters) | "\d" |
| \| | Pipe - Either or | "falls\|stays" |
| () | Parentheses - Capture and group | |
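
The example patterns from the table can be tried directly. This is a minimal sketch; the sample strings are assumptions picked so that each pattern has something to match (or, for "he.?o", nothing).

```python
import re

txt = "hello planet"

starts_with = re.findall(r"^hello", txt)      # ['hello']
ends_with = re.findall(r"planet$", txt)       # ['planet']
any_two = re.findall(r"he..o", txt)           # ['hello']
zero_or_one = re.findall(r"he.?o", txt)       # []  ("ll" is two characters)
one_or_more = re.findall(r"he.+o", txt)       # ['hello']
zero_or_more = re.findall(r"he.*o", txt)      # ['hello']
exactly_two = re.findall(r"he.{2}o", txt)     # ['hello']
in_set = re.findall(r"[a-m]", "hello")        # ['h', 'e', 'l', 'l']

# Alternation: no spaces around the pipe, they would become part of the pattern
either = re.findall(r"falls|stays", "The rain falls, the snow stays")

# Parentheses capture parts of the match for later use
pair = re.search(r"(\w+) (\w+)", txt).groups()

print(either)   # ['falls', 'stays']
print(pair)     # ('hello', 'planet')
```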

Special Sequences

| Character | Matches | Example |
| --- | --- | --- |
| \d | any digit character ([0-9] for ASCII text) | "\d" |
| \D | any non-digit character ([^0-9] for ASCII text) | "\D" |
| \s | any whitespace character | "\s" |
| \S | any character except whitespace | "\S" |
| \w | any word character: letters, digits, and the underscore ([a-zA-Z0-9_] for ASCII text) | "\w" |
| \W | any non-word character ([^a-zA-Z0-9_] for ASCII text) | "\W" |
| \A | the pattern only at the start of the string | "\AThe" |
| \b | the pattern at the beginning or at the end of a word | r"\bain" / r"ain\b" |
| \B | the opposite of \b: the pattern NOT at the beginning, or NOT at the end, of a word | r"\Bain" / r"ain\B" |
| \Z | the pattern only at the end of the string | "Spain\Z" |
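
The sketch below exercises several of these sequences. The sample strings are assumptions made for illustration; note how \b and \B give complementary results for "ain".

```python
import re

txt = "The rain in Spain 2024"

digits = re.findall(r"\d+", txt)              # ['2024']
spaces = re.findall(r"\s", txt)               # four single spaces

# \w also matches the underscore
words = re.findall(r"\w+", "snake_case id42") # ['snake_case', 'id42']

# Word boundaries: "ain" never starts a word here, but ends two of them
starts_word = re.findall(r"\bain", txt)       # []
ends_word = re.findall(r"ain\b", txt)         # ['ain', 'ain']
not_at_start = re.findall(r"\Bain", txt)      # ['ain', 'ain']
not_at_end = re.findall(r"ain\B", txt)        # []

# Anchors for the whole string
at_start = re.findall(r"\AThe", txt)          # ['The']
at_end = re.findall(r"Spain\Z", "The rain in Spain")  # ['Spain']

print(digits, len(spaces), words)
print(starts_word, ends_word, not_at_start, not_at_end)
print(at_start, at_end)
```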

Sets

| Set | Matches |
| --- | --- |
| [arn] | any one of the specified characters (a, r, or n) |
| [a-n] | any lowercase character alphabetically between a and n |
| [^arn] | any character EXCEPT a, r, and n |
| [0123] | any one of the specified digits (0, 1, 2, or 3) |
| [0-9] | any digit between 0 and 9 |
| [0-5][0-9] | any two-digit number from 00 to 59 |
| [a-zA-Z] | any character alphabetically between a and z, lower case OR upper case |
| [+] | a literal + character (inside a set, characters such as +, *, ., and $ have no special meaning) |
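
A few of the sets above, applied to short sample strings. The inputs are assumptions chosen so the results are easy to check by eye.

```python
import re

any_of = re.findall(r"[arn]", "The rain in Spain")   # ['r', 'a', 'n', 'n', 'a', 'n']
none_of = re.findall(r"[^arn]", "rain")              # ['i']
letters = re.findall(r"[a-zA-Z]", "a1B2")            # ['a', 'B']
two_digit = re.findall(r"[0-5][0-9]", "08:59")       # ['08', '59']

# Inside brackets, + is an ordinary character and needs no escaping
plus_sign = re.findall(r"[+]", "1 + 1 = 2")          # ['+']

print(any_of, none_of, letters, two_digit, plus_sign)
```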

Flags

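| Flag | Description |
| --- | --- |
| re.A / re.ASCII | ASCII-only matching for \w, \W, \b, \B, \d, \D, \s, and \S |
| re.I / re.IGNORECASE | case-insensitive matching |
| re.M / re.MULTILINE | ^ and $ also match at the start and end of each line, not only of the whole string |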
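
Flags are passed as an extra argument to the re functions. The sketch below shows the effect of each flag by running the same pattern with and without it; the sample strings are assumptions for illustration.

```python
import re

txt = "The rain in Spain"

# re.I: ignore case
ignore_case = re.findall(r"spain", txt, re.I)         # ['Spain']

# re.M: ^ and $ also match at the start and end of each line
lines = "first line\nsecond line"
without_m = re.findall(r"^\w+", lines)                # ['first']
with_m = re.findall(r"^\w+", lines, re.M)             # ['first', 'second']

# re.A: \w, \d, \s and \b match ASCII characters only
word = "caf\u00e9"                                    # "cafe" with an accented e
unicode_match = re.findall(r"\w+", word)              # the whole word
ascii_match = re.findall(r"\w+", word, re.A)          # ['caf']

# Flags can be combined with |
combined = re.findall(r"^S\w+", lines, re.I | re.M)   # ['second']

print(ignore_case, without_m, with_m, ascii_match, combined)
```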

Match Object


```python
import re

print("\n--------Match Object--------\n")

txt = "The rain in Spain"
# Find the first word that starts with an uppercase "S"
xMatch = re.search(r"\bS\w+", txt)

# Position and text of the match
print("span():\t\t", xMatch.span())         # span():          (12, 17)
print("start():\t", xMatch.start())         # start():         12
print("end():\t\t", xMatch.end())           # end():           17
print("group():\t", xMatch.group())         # group():         Spain

# The compiled pattern and the string that was searched
print("re:\t\t", xMatch.re)                 # re:              re.compile('\\bS\\w+')
print("string:\t\t", xMatch.string)         # string:          The rain in Spain
```
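
Two details are worth a short sketch: search returns None when nothing matches, so calling a method on the result would raise AttributeError, and a pattern with parentheses exposes each captured group by number. The pattern and the "Portugal" probe below are assumptions made for illustration.

```python
import re

txt = "The rain in Spain"

# Two capture groups around the word "in"
m = re.search(r"(\w+) in (\w+)", txt)
if m is not None:
    whole = m.group()          # 'rain in Spain'
    first = m.group(1)         # 'rain'
    second = m.group(2)        # 'Spain'
    both = m.groups()          # ('rain', 'Spain')
    second_span = m.span(2)    # (12, 17)
    print(whole, first, second, both, second_span)

# No match: the result is None, so check before using it
missing = re.search(r"Portugal", txt)
print(missing is None)         # True
```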

