Regular Expressions in Python
Regular expressions, or regex, are sequences of characters that form search patterns. Python provides
the re module for working with regex, allowing tasks like searching, matching, and manipulating text.
This guide introduces the basics of regex in Python with detailed explanations and numerous coding
examples.
1. Basics of Regular Expressions
A regex defines a pattern that strings are matched against. Patterns are built from two kinds of characters:
• Literal Characters: Match exact characters (e.g., cat matches "cat").
• Metacharacters: Special symbols for patterns (e.g., ., *, +).
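As a minimal sketch of the distinction above, the snippet below compares a purely literal pattern with one that uses the . metacharacter, and then shows a backslash turning . back into a literal dot. The sample strings are made up for illustration.

```python
import re

# Literal pattern: every character must appear exactly as written.
literal_hits = re.findall(r'cat', "cat cot cut")

# Metacharacter pattern: '.' stands for any single character except a newline.
wildcard_hits = re.findall(r'c.t', "cat cot cut")

# Escaped metacharacter: '\.' matches only a real dot.
escaped_hits = re.findall(r'3\.14', "3.14 3x14")

print(literal_hits)   # ['cat']
print(wildcard_hits)  # ['cat', 'cot', 'cut']
print(escaped_hits)   # ['3.14']
```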
2. The re Module
To use regex, first import the re module:
import re
Common Functions in the re Module:
1. re.match(): Matches a pattern at the start of a string.
2. re.search(): Searches for a pattern anywhere in the string.
3. re.findall(): Returns all non-overlapping occurrences of a pattern as a list.
4. re.sub(): Replaces occurrences of a pattern with a specified string.
5. re.compile(): Compiles a regex pattern into a reusable object.
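The five functions listed above differ mainly in what they return. The following sketch, using an invented sample string, calls each one with the same pattern so the return values can be compared side by side.

```python
import re

text = "order 42 shipped, order 7 pending"

at_start = re.match(r'\d+', text)    # None: the string does not start with a digit
first = re.search(r'\d+', text)      # Match object for the first hit
all_hits = re.findall(r'\d+', text)  # list of matched strings
masked = re.sub(r'\d+', 'N', text)   # new string with replacements applied
number = re.compile(r'\d+')          # reusable Pattern object

print(at_start)                      # None
print(first.group(), first.span())   # 42 (6, 8)
print(all_hits)                      # ['42', '7']
print(masked)                        # order N shipped, order N pending
print(number.findall(text))          # ['42', '7']
```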
3. Regex Syntax
a) Metacharacters
Metacharacter | Description | Example
--- | --- | ---
. | Matches any character except newline | a.c matches "abc", "adc"
^ | Matches start of string | ^Hello matches "Hello" at the start of "Hello world"
$ | Matches end of string | world$ matches "world" at the end of "Hello world"
* | Matches 0 or more occurrences | ca*t matches "ct", "cat", "caat"
+ | Matches 1 or more occurrences | ca+t matches "cat", "caat" but not "ct"
? | Matches 0 or 1 occurrence | ca?t matches "ct", "cat"
{} | Matches a specific number of repetitions | a{2,3} matches "aa", "aaa"
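To see how the quantifiers differ, the sketch below runs several patterns against the same list of candidate strings. The helper name accepted and the candidate list are illustrative assumptions. re.fullmatch() is used so that a pattern must cover the whole string.

```python
import re

candidates = ["ct", "cat", "caat", "caaat", "caaaat"]

def accepted(pattern):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

print(accepted(r'ca*t'))      # ['ct', 'cat', 'caat', 'caaat', 'caaaat']
print(accepted(r'ca+t'))      # ['cat', 'caat', 'caaat', 'caaaat']
print(accepted(r'ca?t'))      # ['ct', 'cat']
print(accepted(r'ca{2,3}t'))  # ['caat', 'caaat']
```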
b) Special Sequences
Sequence | Description | Example
--- | --- | ---
\d | Matches any digit | \d+ matches "123"
\D | Matches any non-digit | \D+ matches "abc"
\w | Matches any word character (letters, digits, and underscore) | \w+ matches "word123"
\W | Matches any non-word character | \W+ matches "@#$"
\s | Matches whitespace (spaces, tabs, newlines) | \s+ matches spaces
\S | Matches non-whitespace | \S+ matches "word"
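The special sequences can be compared by running them over one string. The sample text below is invented for illustration. Note that \w treats the underscore as a word character, so "user_42" stays in one piece.

```python
import re

sample = "user_42 paid $19.99 on 2024-01-05"

digits = re.findall(r'\d+', sample)
words = re.findall(r'\w+', sample)
gaps = re.findall(r'\s+', sample)
tokens = re.findall(r'\S+', sample)

print(digits)  # ['42', '19', '99', '2024', '01', '05']
print(words)   # ['user_42', 'paid', '19', '99', 'on', '2024', '01', '05']
print(gaps)    # [' ', ' ', ' ', ' ']
print(tokens)  # ['user_42', 'paid', '$19.99', 'on', '2024-01-05']
```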
4. Examples
a) re.match()
import re
# Example 1: Match at the start
text = "hello world"
result = re.match(r'hello', text)
print(result.group() if result else "No match") # Output: hello
# Example 2: Fails to match in the middle
result = re.match(r'world', text)
print(result) # Output: None
b) re.search()
# Example 1: Search anywhere in the string
result = re.search(r'world', text)
print(result.group() if result else "Not found") # Output: world
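Besides the matched text, the object returned by re.search() also records where the match was found. A short sketch using the same "hello world" string:

```python
import re

text = "hello world"
result = re.search(r'world', text)

if result:
    print(result.start(), result.end())       # 6 11
    print(text[result.start():result.end()])  # world

# re.search() stops at the first match, even if later ones exist.
first_o = re.search(r'o', text)
print(first_o.start())                        # 4
```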
c) re.findall()
# Example 1: Find all occurrences
text = "abc 123 def 456"
result = re.findall(r'\d+', text)
print(result) # Output: ['123', '456']
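One detail worth knowing is that re.findall() changes its return shape when the pattern contains capturing groups. The sketch below, with an invented key=value string, shows both cases.

```python
import re

text = "width=640, height=480"

# No groups: each item is the whole match.
whole = re.findall(r'\w+=\d+', text)
print(whole)        # ['width=640', 'height=480']

# Two capturing groups: each item is a (key, value) tuple.
pairs = re.findall(r'(\w+)=(\d+)', text)
print(pairs)        # [('width', '640'), ('height', '480')]
print(dict(pairs))  # {'width': '640', 'height': '480'}
```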
d) re.sub()
# Example 1: Replace digits with '#'
text = "abc 123 def 456"
result = re.sub(r'\d', '#', text)
print(result) # Output: abc ### def ###
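A few variations on re.sub() are sketched below with the same input string: replacing whole runs of digits, limiting the number of replacements with count, and reusing the matched text through a backreference.

```python
import re

text = "abc 123 def 456"

# \d+ replaces each run of digits once, rather than digit by digit.
runs = re.sub(r'\d+', '#', text)
print(runs)      # abc # def #

# count=1 limits the substitution to the first match.
once = re.sub(r'\d', '#', text, count=1)
print(once)      # abc #23 def 456

# \1 in the replacement refers back to the first capturing group.
wrapped = re.sub(r'(\d+)', r'<\1>', text)
print(wrapped)   # abc <123> def <456>
```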
e) Regex with Special Characters
# Matching email
email = "example@example.com"
pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
result = re.match(pattern, email)
print(result.group() if result else "Invalid email") # Output: example@example.com
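Because re.match() anchors only the start of the string, it accepts input that begins with a valid address and continues with other text. When the goal is validation, re.fullmatch() may be the safer choice. The sketch below reuses the same pattern; the constant name EMAIL and the test strings are illustrative assumptions, and the pattern remains a simplification of real email syntax.

```python
import re

EMAIL = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

candidate = "example@example.com and more"

# re.match() only anchors the start, so trailing text is ignored.
prefix_ok = bool(re.match(EMAIL, candidate))
print(prefix_ok)   # True

# re.fullmatch() requires the pattern to cover the entire string.
whole_ok = bool(re.fullmatch(EMAIL, candidate))
print(whole_ok)    # False

clean_ok = bool(re.fullmatch(EMAIL, "example@example.com"))
print(clean_ok)    # True
```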
5. Practical Examples
a) Validate a Phone Number
phone = "123-456-7890"
pattern = r'^\d{3}-\d{3}-\d{4}$'
if re.match(pattern, phone):
    print("Valid phone number")
else:
    print("Invalid phone number")
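The same check can be wrapped in a small function and tried on several inputs. This is a sketch under assumptions: the helper name is_valid_phone and the sample values are invented. It uses fullmatch() instead of the ^ and $ anchors, since $ also matches just before a trailing newline.

```python
import re

PHONE = re.compile(r'\d{3}-\d{3}-\d{4}')

def is_valid_phone(value):
    """Hypothetical helper: True if value has the form 123-456-7890."""
    return PHONE.fullmatch(value) is not None

samples = ["123-456-7890", "1234567890", "123-456-78901", "123-456-7890\n"]
results = {sample: is_valid_phone(sample) for sample in samples}

for sample, ok in results.items():
    print(repr(sample), ok)
```

Only the first sample is accepted. The last one, which ends with a newline, is rejected here but would pass the ^...$ version used with re.match().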
b) Extract URLs from Text
text = "Visit https://example.com and http://test.com."
# Require dot-separated host labels so a sentence-ending period is not captured
pattern = r'https?://[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+'
urls = re.findall(pattern, text)
print(urls) # Output: ['https://example.com', 'http://test.com']
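When the position of each URL matters as well as its text, re.finditer() yields Match objects instead of plain strings. The sketch below is one possible approach. The pattern is an assumption that requires dot-separated host labels, so the period ending the sentence is not swallowed, and it ignores paths and query strings.

```python
import re

text = "Visit https://example.com and http://test.com."
URL = re.compile(r'https?://[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+')

# Collect (url, (start, end)) pairs for every match in the text.
found = [(m.group(), m.span()) for m in URL.finditer(text)]

for url, span in found:
    print(url, span)
# https://example.com (6, 25)
# http://test.com (30, 45)
```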
c) Password Validation
password = "StrongP@ssw0rd"
pattern = r'^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$'
if re.match(pattern, password):
    print("Strong password")
else:
    print("Weak password")
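Each (?=...) lookahead in the pattern above asserts one requirement without consuming characters, which is why all four can be tested from the start of the string. As an alternative sketch, the same rules can be checked one at a time so that a failure can be reported by name. The RULES table and the missing_rules helper are hypothetical names introduced for this example.

```python
import re

# Each rule is checked on its own so a failure can be reported by name.
RULES = {
    "uppercase letter": r'[A-Z]',
    "lowercase letter": r'[a-z]',
    "digit": r'\d',
    "special character": r'[@$!%*?&]',
    "at least 8 allowed characters": r'^[A-Za-z\d@$!%*?&]{8,}$',
}

def missing_rules(password):
    """Return the names of the rules that the password does not satisfy."""
    return [name for name, pattern in RULES.items()
            if not re.search(pattern, password)]

print(missing_rules("StrongP@ssw0rd"))  # []
print(missing_rules("weakpass"))        # ['uppercase letter', 'digit', 'special character']
```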
6. Compiling Patterns for Efficiency
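If a pattern is reused many times, compile it once with re.compile() and call methods on the resulting pattern object. The re module already caches recently used patterns, so the speed gain is usually modest, but compiling avoids the repeated cache lookup and gives the pattern a name: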
compiled_pattern = re.compile(r'\d+')
text = "Numbers: 123, 456, 789"
matches = compiled_pattern.findall(text)
print(matches) # Output: ['123', '456', '789']
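A compiled pattern can also carry flags and be shared across many calls. The sketch below, using invented sample lines, compiles a case-insensitive pattern once and reuses it for both searching and substitution.

```python
import re

# Compile once, with a flag, and reuse the object below.
WORD = re.compile(r'python', re.IGNORECASE)

lines = ["Python is fun", "I like PYTHON", "no match here"]

# The same compiled object is used for every line.
hits = [line for line in lines if WORD.search(line)]
print(hits)       # ['Python is fun', 'I like PYTHON']

# Pattern objects expose the same operations as the module-level functions.
shortened = WORD.sub('Py', "python and Python")
print(shortened)  # Py and Py
```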