search, match, findall¶
Overview¶
Python's re module provides several functions for finding patterns in text. The three most commonly used are search(), match(), and findall(), each with distinct behavior.
| Function | Searches Where | Returns | Use Case |
|---|---|---|---|
| re.search() | Anywhere in string | First Match or None | Find first occurrence |
| re.match() | Beginning of string only | Match or None | Validate string start |
| re.fullmatch() | Entire string | Match or None | Validate entire string |
| re.findall() | Entire string | List of strings/tuples | Extract all occurrences |
| re.finditer() | Entire string | Iterator of Match objects | Process matches one by one |
re.search()¶
re.search() scans through the string and returns a Match object for the first location where the pattern matches:
import re
text = "Error 404: Page not found at 14:30"
# Finds the first sequence of digits
match = re.search(r'\d+', text)
print(match.group()) # '404'
print(match.span()) # (6, 9)
If no match exists, it returns None:
match = re.search(r'\d+', 'no numbers here')
print(match) # None
Common Pattern: Guard with if¶
import re
text = "Temperature: 72.5°F"
match = re.search(r'([\d.]+)°([FC])', text)
if match:
    value = float(match.group(1))
    unit = match.group(2)
    print(f"{value} degrees {unit}") # 72.5 degrees F
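The same guard can be wrapped in a small function so callers get None instead of an AttributeError when nothing matches. This is a minimal sketch: parse_temperature is a hypothetical helper, not part of the re module, and it uses a slightly stricter number pattern than the example above so that float() cannot be handed a bare ".".

```python
import re

# Hypothetical helper for illustration; not part of the re module.
def parse_temperature(text):
    """Return (value, unit) for the first temperature found, or None."""
    match = re.search(r'(\d+(?:\.\d+)?)°([FC])', text)
    if match is None:
        # No match: return None rather than calling .group() on None.
        return None
    return float(match.group(1)), match.group(2)

print(parse_temperature("Temperature: 72.5°F"))   # (72.5, 'F')
print(parse_temperature("Temperature: unknown"))  # None
```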
Walrus Operator (Python 3.8+)¶
The walrus operator := combines the search and check in one expression:
import re
text = "Price: $42.99"
if m := re.search(r'\$(\d+\.\d{2})', text):
    print(f"Found price: {m.group(1)}") # Found price: 42.99
re.match()¶
re.match() checks for a match only at the beginning of the string:
import re
# Matches — pattern is at the start
re.match(r'\d+', '123abc')
# <re.Match object; span=(0, 3), match='123'>
# No match — digits are not at the start
re.match(r'\d+', 'abc123')
# None
match() vs search() with ^¶
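Without the re.MULTILINE flag, re.match() behaves like re.search() with a ^ anchor. In MULTILINE mode, ^ also matches after every newline, while re.match() still matches only at the very start of the string (the equivalent anchor is \A):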
import re
text = "hello world"
# These are equivalent
re.match(r'hello', text) # Match
re.search(r'^hello', text) # Match
# These differ
re.match(r'world', text) # None — not at start
re.search(r'world', text) # Match — found in string
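The comparison with ^ holds only in the default mode. The sketch below, using an illustrative two-line string, shows how re.MULTILINE separates the two, and that \A mirrors re.match() regardless of flags.

```python
import re

text = "first line\nsecond line"

# re.match() only ever tries the very start of the string, whatever the flags.
print(re.match(r'second', text, re.MULTILINE))     # None

# With re.MULTILINE, ^ also matches right after each newline.
m = re.search(r'^second', text, re.MULTILINE)
print(m.span())                                    # (11, 17)

# \A always means "start of string", so this behaves like re.match().
print(re.search(r'\Asecond', text, re.MULTILINE))  # None
```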
When to Use match() vs search()
Use re.match() when you specifically need to validate the beginning of a string. Use re.search() for general-purpose pattern finding anywhere in the string. In practice, re.search() is more commonly used.
re.fullmatch()¶
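re.fullmatch() requires the pattern to match the entire string. This is close to anchoring with ^...$, except that $ also matches just before a trailing newline while fullmatch() does not accept one (the exact equivalent is \A(?:...)\Z):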
import re
# Validate that the entire string is a date
re.fullmatch(r'\d{4}-\d{2}-\d{2}', '2024-01-15')
# <re.Match object; span=(0, 10), match='2024-01-15'>
re.fullmatch(r'\d{4}-\d{2}-\d{2}', '2024-01-15 extra')
# None — extra text after the date
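One edge case worth knowing when comparing fullmatch() to a ^...$ pattern is a trailing newline. The sketch below uses an illustrative input ending in "\n" to show where the two differ.

```python
import re

value = "2024-01-15\n"  # note the trailing newline
pattern = r'\d{4}-\d{2}-\d{2}'

# $ can match just before a trailing newline, so this still succeeds.
anchored = re.search(r'^' + pattern + r'$', value)
print(anchored is not None)          # True

# fullmatch() must consume every character, including the newline.
print(re.fullmatch(pattern, value))  # None

# \A and \Z match only at the true start and end of the string.
strict = re.search(r'\A' + pattern + r'\Z', value)
print(strict)                        # None
```

For validating user input, this is one reason to prefer fullmatch() over hand-written anchors.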
fullmatch() is ideal for input validation:
import re
def is_valid_email_simple(email):
    """Basic email format check (not production-grade)."""
    return bool(re.fullmatch(r'[\w.+-]+@[\w-]+\.[\w.]+', email))
print(is_valid_email_simple("user@example.com")) # True
print(is_valid_email_simple("not an email")) # False
print(is_valid_email_simple("user@example.com foo")) # False
re.findall()¶
re.findall() returns a list of all non-overlapping matches:
import re
text = "Prices: $10, $25, $100, and $3.50"
# No groups — returns list of full matches
re.findall(r'\$[\d.]+', text)
# ['$10', '$25', '$100', '$3.50']
# One group — returns list of group contents
re.findall(r'\$([\d.]+)', text)
# ['10', '25', '100', '3.50']
# Multiple groups — returns list of tuples
re.findall(r'\$(\d+)\.?(\d*)', text)
# [('10', ''), ('25', ''), ('100', ''), ('3', '50')]
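Because any capturing group changes what findall() returns, a group added only for repetition or alternation can produce surprising results. A minimal sketch, using an illustrative version-number string, of how a non-capturing group (?:...) avoids this:

```python
import re

text = "Versions: 1.2.3, 10.0.1, and 2.5.0"

# A capturing group used only for repetition: findall() reports just the
# group, and a repeated group holds only its last repetition.
print(re.findall(r'(\d+\.){2}\d+', text))    # ['2.', '0.', '5.']

# A non-capturing group keeps the grouping but returns the full matches.
print(re.findall(r'(?:\d+\.){2}\d+', text))  # ['1.2.3', '10.0.1', '2.5.0']
```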
findall() with No Match¶
If no matches are found, findall() returns an empty list (not None):
result = re.findall(r'\d+', 'no numbers')
print(result) # []
print(len(result)) # 0
print(bool(result)) # False
re.finditer()¶
re.finditer() returns an iterator of Match objects, giving you access to all match metadata (position, groups):
import re
text = "Alice: 85, Bob: 92, Carol: 78"
for match in re.finditer(r'(\w+): (\d+)', text):
    name = match.group(1)
    score = int(match.group(2))
    pos = match.span()
    print(f"{name} scored {score} (at position {pos})")
# Alice scored 85 (at position (0, 9))
# Bob scored 92 (at position (11, 18))
# Carol scored 78 (at position (20, 29))
finditer() vs findall()¶
Use finditer() when you need:
- The position of each match (.start(), .end(), .span())
- Named groups (.groupdict())
- Memory efficiency with large texts (lazy iteration)
- Both the full match and group contents
import re
text = "2024-01-15 and 2024-12-31"
# findall — only group contents
re.findall(r'(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})', text)
# [('2024', '01', '15'), ('2024', '12', '31')]
# finditer — full Match objects
for m in re.finditer(r'(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})', text):
    print(m.group(0), m.groupdict())
# 2024-01-15 {'y': '2024', 'm': '01', 'd': '15'}
# 2024-12-31 {'y': '2024', 'm': '12', 'd': '31'}
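The lazy-iteration point can be illustrated by taking only the first few matches from a large input. This is a sketch with generated data; the input string and the choice of itertools.islice are illustrative assumptions, not something the examples above require.

```python
import re
from itertools import islice

# Illustrative data: a long string containing many numbers.
text = " ".join(str(n) for n in range(100_000))

# findall() builds the complete list before any element can be used.
all_numbers = re.findall(r'\d+', text)
print(len(all_numbers))  # 100000

# finditer() yields matches on demand, so islice() stops after three.
first_three = [m.group() for m in islice(re.finditer(r'\d+', text), 3)]
print(first_three)       # ['0', '1', '2']

# next() with a default gives the first Match, or None if there is none.
first = next(re.finditer(r'\d+', text), None)
print(first.span())      # (0, 1)
```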
Comparison Table¶
import re
text = "cat bat hat"
pattern = r'[cbh]at'
# search — first match only
re.search(pattern, text).group() # 'cat'
# findall — all matches as list
re.findall(pattern, text) # ['cat', 'bat', 'hat']
# finditer — all matches as Match objects
[m.group() for m in re.finditer(pattern, text)] # ['cat', 'bat', 'hat']
# match — beginning of string only
re.match(pattern, text).group() # 'cat'
# fullmatch — entire string
re.fullmatch(pattern, text) # None (text has spaces)
re.fullmatch(pattern, 'cat') # <re.Match ...>
Summary¶
| Function | Scope | Returns | Best For |
|---|---|---|---|
| search() | First match anywhere | Match / None | Finding first occurrence |
| match() | Start of string | Match / None | Validating beginning |
| fullmatch() | Entire string | Match / None | Input validation |
| findall() | All matches | list | Extracting all occurrences |
| finditer() | All matches | Iterator of Match | Position-aware extraction |