Character Classes

Mental Model

A character class is a single-character menu: [aeiou] means "match exactly one character, and it must be one of these." Ranges like [a-z] and shorthands like \d (digits) are just compact ways to define the menu. Negation ([^...]) flips the menu into "match anything except these."
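To make the "one character from the menu" idea concrete, here is a minimal sketch using `re.fullmatch`. The sample strings and variable names are made up for illustration.

```python
import re

# A class consumes exactly one character, so a two-letter string cannot fully match.
one_vowel = re.fullmatch(r'[aeiou]', 'a')
two_vowels = re.fullmatch(r'[aeiou]', 'ae')

# A range and a shorthand are two ways to write the same kind of menu.
range_digit = re.fullmatch(r'[0-9]', '7')
shorthand_digit = re.fullmatch(r'\d', '7')

# Negation still consumes one character; it only flips which ones qualify.
not_vowel = re.fullmatch(r'[^aeiou]', 'x')

print(bool(one_vowel), bool(two_vowels))          # True False
print(bool(range_digit), bool(shorthand_digit))   # True True
print(bool(not_vowel))                            # True
```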

What Is a Character Class?

A character class (also called a character set) matches one character from a defined set. Character classes are enclosed in square brackets [...].

```python
import re

# Match any vowel
re.findall(r'[aeiou]', 'Hello World')
# ['e', 'o', 'o']

# Match any digit
re.findall(r'[0123456789]', 'Room 404')
# ['4', '0', '4']
```

Ranges

Use a hyphen - inside brackets to specify a range of characters:

```python
import re

text = "Agent 007 has clearance Level-A3"

# Digit range (equivalent to \d for ASCII text)
re.findall(r'[0-9]', text)
# ['0', '0', '7', '3']

# Lowercase letters
re.findall(r'[a-z]+', text)
# ['gent', 'has', 'clearance', 'evel']

# Uppercase letters
re.findall(r'[A-Z]+', text)
# ['A', 'L', 'A']

# Letters and digits combined
re.findall(r'[a-zA-Z0-9]+', text)
# ['Agent', '007', 'has', 'clearance', 'Level', 'A3']
```

Multiple ranges can be combined in a single class:

```python
import re

# Hexadecimal digits
re.findall(r'[0-9a-fA-F]+', '0xFF 0x1A 255 0xGG')
# ['0', 'FF', '0', '1A', '255', '0']
# 'x' and 'G' are not hex digits, so each 0x prefix splits and 'GG' never matches.
```
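A range follows character code order, so its endpoints matter. The sketch below (with a made-up sample string) shows a common slip: writing `[A-z]` when `[A-Za-z]` was intended.

```python
import re

sample = 'a_Z^'  # made-up sample: two letters and two symbols

# [A-z] spans code points 65..122, which also covers [ \ ] ^ _ ` between 'Z' and 'a'.
too_wide = re.findall(r'[A-z]', sample)

# Two separate ranges cover only the ASCII letters.
letters_only = re.findall(r'[A-Za-z]', sample)

print(too_wide)      # ['a', '_', 'Z', '^']
print(letters_only)  # ['a', 'Z']
```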

Negated Character Classes

A caret ^ at the beginning of a character class negates it — matching any character not in the set:

```python
import re

# Match non-digits
re.findall(r'[^0-9]+', 'Room 404 is on Floor 4')
# ['Room ', ' is on Floor ']

# Match non-vowels
re.findall(r'[^aeiouAEIOU]+', 'Hello World')
# ['H', 'll', ' W', 'rld']

# Match non-whitespace (similar to \S)
re.findall(r'[^ \t\n]+', 'hello world')
# ['hello', 'world']
```

Caret Position Matters

The ^ only negates when it appears as the first character inside [...]. Elsewhere, it matches a literal caret: [a^b] matches a, ^, or b.
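A small sketch of the two caret positions side by side, using a made-up sample string:

```python
import re

sample = 'a^b c'

# Caret first: match anything except 'a' or 'b'.
negated = re.findall(r'[^ab]', sample)

# Caret later in the class: match 'a', a literal '^', or 'b'.
literal = re.findall(r'[a^b]', sample)

print(negated)  # ['^', ' ', 'c']
print(literal)  # ['a', '^', 'b']
```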

Special Characters Inside Classes

Most metacharacters lose their special meaning inside character classes. Only a few remain special:

| Character | Special inside `[...]`? | How to use literally |
|---|---|---|
| `]` | Yes, closes the class | `\]` or place first: `[]abc]` |
| `\` | Yes, escape character | `\\` |
| `^` | Yes, negation (only if first) | Place after first position: `[a^b]` |
| `-` | Yes, range operator | `\-` or place first/last: `[-abc]` or `[abc-]` |

```python
import re

# Match literal square brackets (escape both inside the class)
re.findall(r'[\[\]]', 'array[0] = list[1]')
# ['[', ']', '[', ']']

# Hyphen at end: matches a literal hyphen
re.findall(r'[a-z-]+', 'well-known self-driving')
# ['well-known', 'self-driving']

# Dot inside class: just a literal dot
re.findall(r'[.]', 'version 3.14')
# ['.']
```
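When the characters for a class come from a variable rather than being typed by hand, one option is to let `re.escape` add the backslashes. This is a sketch, not the only approach; the `literals` string and the sample text are invented for the example.

```python
import re

# Hypothetical set of characters we want to match literally.
literals = '-]^\\'

# re.escape backslash-escapes characters that can be special in a pattern,
# so the result can be placed between the brackets as-is.
pattern = '[' + re.escape(literals) + ']'

found = re.findall(pattern, r'a-b]c^d\e')
print(found)  # ['-', ']', '^', '\\']
```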

Shorthand Classes vs Bracket Notation

The shorthand classes \d, \w, \s and their negations can be used inside character classes:

```python
import re

# Digits or hyphens (for phone numbers)
re.findall(r'[\d-]+', 'Call 555-123-4567 today')
# ['555-123-4567']

# Word characters or dots (for filenames)
re.findall(r'[\w.]+', 'file_v2.py and data.csv')
# ['file_v2.py', 'and', 'data.csv']

# Digits and whitespace
re.findall(r'[\d\s]+', 'score: 42 out of 50')
# [' 42 ', ' ', ' 50']  (the lone space between 'out' and 'of' also matches)
```
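Negated shorthands become especially useful inside a negated class. The sketch below uses the double negative `[^\W\d_]` to keep word characters that are neither digits nor underscores, which is roughly "letters only". The sample string is made up, and for Unicode text the result may include a few alphanumeric characters that are not strictly letters.

```python
import re

sample = 'abc_123 d\u00e9f'  # 'abc_123 déf'

# "Not (non-word, digit, or underscore)" leaves the letters.
letters = re.findall(r'[^\W\d_]+', sample)
print(letters)  # ['abc', 'déf']
```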

POSIX-like Classes (Unicode)

Python's \d, \w, and \s match Unicode characters by default. Use the re.ASCII flag to restrict to ASCII:

```python
import re

# \d matches Unicode digits by default
re.findall(r'\d+', '123 ١٢٣ ୧୨୩')
# ['123', '١٢٣', '୧୨୩']

# Restrict to ASCII digits
re.findall(r'\d+', '123 ١٢٣ ୧୨୩', flags=re.ASCII)
# ['123']
```
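The same flag changes what `\w` accepts. A short sketch with a made-up sample containing accented letters:

```python
import re

sample = 'na\u00efve caf\u00e9_1'  # 'naïve café_1'

# Default (Unicode) mode: accented letters count as word characters.
unicode_words = re.findall(r'\w+', sample)

# ASCII mode: \w is limited to [a-zA-Z0-9_], so accented letters split the words.
ascii_words = re.findall(r'\w+', sample, flags=re.ASCII)

print(unicode_words)  # ['naïve', 'café_1']
print(ascii_words)    # ['na', 've', 'caf', '_1']
```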

Practical Examples

Matching Identifiers

A valid Python identifier starts with a letter or underscore, followed by letters, digits, or underscores:

```python
import re

text = "x = 42; _name = 'hello'; 3bad = True"
re.findall(r'[a-zA-Z_]\w*', text)
# ['x', '_name', 'hello', 'bad', 'True']
# 'bad' is picked out of '3bad' because nothing anchors the match to a word start.
```
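To validate a whole string instead of extracting candidates, the same class can be paired with `fullmatch`. The following is an ASCII-only sketch: `looks_like_identifier` is a hypothetical helper, and the built-in `str.isidentifier()` is printed alongside for comparison because real Python identifiers also allow many Unicode letters.

```python
import re

# ASCII-only version of the rule: a letter or underscore, then letters, digits, or underscores.
IDENTIFIER = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')

def looks_like_identifier(name):
    """Return True if the whole string fits the letter-or-underscore-first rule."""
    return IDENTIFIER.fullmatch(name) is not None

for candidate in ['x', '_name', '3bad', 'snake_case2', 'has-dash']:
    # Columns: candidate, regex verdict, built-in verdict
    print(candidate, looks_like_identifier(candidate), candidate.isidentifier())
```

Neither check rejects reserved words such as `for`; the standard `keyword.iskeyword()` function covers that case.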

Extracting Vowels and Consonants

```python
import re

word = "Mississippi"
vowels = re.findall(r'[aeiouAEIOU]', word)
consonants = re.findall(r'[^aeiouAEIOU]', word)

print(f"Vowels: {vowels}")          # ['i', 'i', 'i', 'i']
print(f"Consonants: {consonants}")  # ['M', 's', 's', 's', 's', 'p', 'p']
```

Matching Hex Color Codes

```python
import re

css = "color: #FF5733; background: #0a0; border: #12ab"

# Any run of 3 to 6 hex digits (too loose: also accepts 4- and 5-digit codes)
re.findall(r'#[0-9a-fA-F]{3,6}\b', css)
# ['#FF5733', '#0a0', '#12ab']

# Strictly 6-digit or 3-digit
re.findall(r'#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b', css)
# ['#FF5733', '#0a0']
```

Summary

| Concept | Key Takeaway |
|---|---|
| `[abc]` | Matches one character: `a`, `b`, or `c` |
| `[a-z]` | Matches one character in the range `a` to `z` |
| `[^abc]` | Matches one character not in the set |
| `-` in class | Range operator; literal if first or last |
| `^` in class | Negation only if first character |
| Metacharacters | Most lose special meaning inside `[...]` |
| `\d \w \s` | Can be used inside character classes |

Runnable Example: `character_classes_tutorial.py`

```python
r"""
Python Regular Expressions - Tutorial 02: Character Classes
===========================================================

LEARNING OBJECTIVES:
- Understand character classes and their syntax
- Use predefined character classes (\d, \w, \s, etc.)
- Create custom character classes with [...]
- Use character ranges [a-z], [0-9]
- Understand negated character classes [^...]
- Combine character classes for complex matching

PREREQUISITES:
- Tutorial 01: Regex Basics
- Understanding of re.match(), re.search(), re.findall()

DIFFICULTY: BEGINNER
"""

import re

# ==============================================================================
# SECTION 1: INTRODUCTION TO CHARACTER CLASSES
# ==============================================================================

# The tutorial runs top to bottom when executed as a script.

"""
CHARACTER CLASSES allow you to match one character from a set of characters.
Instead of matching exact text, you can match "any digit" or "any letter".

Basic Syntax:
- [abc]   : Matches 'a', 'b', OR 'c' (any single character from the set)
- [^abc]  : Matches any character EXCEPT 'a', 'b', or 'c'
- [a-z]   : Matches any lowercase letter from 'a' to 'z'
- [0-9]   : Matches any digit from '0' to '9'

This is much more powerful than literal matching!
"""

print("="*70)
print("SECTION 1: BASIC CHARACTER CLASSES")
print("="*70)

# Example 1: Simple character class
# ---------------------------------
# Match a single vowel
pattern1 = r"[aeiou]"  # Matches any single vowel
text1 = "hello world"

vowels = re.findall(pattern1, text1)
print(f"Text: '{text1}'")
print(f"Pattern: '{pattern1}' (matches any vowel)")
print(f"Matches found: {vowels}")
print(f"Total vowels: {len(vowels)}")

print()

# Example 2: Matching specific characters
# ---------------------------------------
# Match only the letters 'c', 'a', 't'
pattern2 = r"[cat]"
text2 = "the cat sat on the mat"

matches = re.findall(pattern2, text2)
print(f"Text: '{text2}'")
print(f"Pattern: '{pattern2}' (matches 'c', 'a', or 't')")
print(f"Matches: {matches}")
print(f"Count: {len(matches)}")

print()

# ==============================================================================
# SECTION 2: CHARACTER RANGES
# ==============================================================================

"""
RANGES allow you to specify a sequence of characters without listing them all.

Common ranges:
- [a-z]   : All lowercase letters
- [A-Z]   : All uppercase letters
- [0-9]   : All digits
- [a-zA-Z]: All letters (upper and lower)
- [a-z0-9]: All lowercase letters and digits
"""

print("="*70)
print("SECTION 2: CHARACTER RANGES")
print("="*70)

# Example 3: Matching lowercase letters
# -------------------------------------
pattern3 = r"[a-z]"
text3 = "Hello World 123"

lowercase = re.findall(pattern3, text3)
print(f"Text: '{text3}'")
print(f"Pattern: '{pattern3}' (any lowercase letter)")
print(f"Lowercase letters found: {lowercase}")

print()

# Example 4: Matching digits
# --------------------------
pattern4 = r"[0-9]"
text4 = "Room 101, Floor 5, Building A"

digits = re.findall(pattern4, text4)
print(f"Text: '{text4}'")
print(f"Pattern: '{pattern4}' (any digit)")
print(f"Digits found: {digits}")

print()

# Example 5: Combining ranges
# ---------------------------
# Match any alphanumeric character (letter or digit)
pattern5 = r"[a-zA-Z0-9]"
text5 = "User123! #Password456"

alphanum = re.findall(pattern5, text5)
print(f"Text: '{text5}'")
print(f"Pattern: '{pattern5}' (any letter or digit)")
print(f"Alphanumeric characters: {alphanum}")
print(f"Total: {len(alphanum)}")

print()

# ==============================================================================
# SECTION 3: PREDEFINED CHARACTER CLASSES
# ==============================================================================

r"""
Python regex provides SHORTHAND notations for common character classes.
The bracket forms shown are the ASCII equivalents; with str patterns the
shorthands also match Unicode digits, letters, and whitespace unless
re.ASCII is set.

\d  : Digit [0-9]
\D  : Non-digit [^0-9]
\w  : Word character [a-zA-Z0-9_] (letters, digits, underscore)
\W  : Non-word character [^a-zA-Z0-9_]
\s  : Whitespace [ \t\n\r\f\v] (space, tab, newline, etc.)
\S  : Non-whitespace [^ \t\n\r\f\v]

These are very commonly used and make patterns more readable.
"""

print("="*70)
print("SECTION 3: PREDEFINED CHARACTER CLASSES")
print("="*70)

# Example 6: Using \d for digits
# ------------------------------
pattern6 = r"\d"  # Equivalent to [0-9]
text6 = "I have 3 cats and 2 dogs"

digits = re.findall(pattern6, text6)
print(f"Text: '{text6}'")
print(f"Pattern: '\\d' (any digit)")
print(f"Digits: {digits}")

print()

# Example 7: Using \w for word characters
# ---------------------------------------
pattern7 = r"\w"  # Matches letters, digits, and underscore
text7 = "hello_world123!@#"

word_chars = re.findall(pattern7, text7)
print(f"Text: '{text7}'")
print(f"Pattern: '\\w' (word characters)")
print(f"Word characters: {word_chars}")

print()

# Example 8: Using \s for whitespace
# ----------------------------------
pattern8 = r"\s"  # Matches spaces, tabs, newlines
text8 = "hello\tworld\ntest"

spaces = re.findall(pattern8, text8)
print(f"Text: 'hello\\tworld\\ntest'")
print(f"Pattern: '\\s' (whitespace)")
print(f"Whitespace characters found: {len(spaces)}")
print(f"Types: {repr(spaces)}")  # repr() shows special characters

print()

# Example 9: Using \D, \W, \S (negated versions)
# ----------------------------------------------
text9 = "hello123"

# \D matches anything that's NOT a digit
non_digits = re.findall(r"\D", text9)
print(f"Text: '{text9}'")
print(f"Pattern: '\\D' (non-digits)")
print(f"Non-digit characters: {non_digits}")

# \W matches anything that's NOT a word character
text10 = "hello-world!"
non_word = re.findall(r"\W", text10)
print(f"\nText: '{text10}'")
print(f"Pattern: '\\W' (non-word chars)")
print(f"Non-word characters: {non_word}")

# \S matches anything that's NOT whitespace
text11 = "a b c"
non_space = re.findall(r"\S", text11)
print(f"\nText: '{text11}'")
print(f"Pattern: '\\S' (non-whitespace)")
print(f"Non-whitespace characters: {non_space}")

print()

# ==============================================================================
# SECTION 4: NEGATED CHARACTER CLASSES
# ==============================================================================

"""
NEGATED CHARACTER CLASSES match any character EXCEPT those in the class.
Syntax: [^characters]

The ^ symbol at the START of a character class means "not".
Note: This is different from ^ as an anchor (which we'll learn later).
"""

print("="*70)
print("SECTION 4: NEGATED CHARACTER CLASSES")
print("="*70)

# Example 10: Matching non-vowels
# -------------------------------
pattern10 = r"[^aeiou]"  # Matches anything that's NOT a vowel
text10 = "hello"

non_vowels = re.findall(pattern10, text10)
print(f"Text: '{text10}'")
print(f"Pattern: '[^aeiou]' (not a vowel)")
print(f"Non-vowels: {non_vowels}")

print()

# Example 11: Matching non-digits
# -------------------------------
pattern11 = r"[^0-9]"  # Same as \D
text11 = "Room 404"

non_digits_custom = re.findall(pattern11, text11)
print(f"Text: '{text11}'")
print(f"Pattern: '[^0-9]' (not a digit)")
print(f"Non-digits: {non_digits_custom}")

print()

# Example 12: Excluding specific characters
# -----------------------------------------
# Match any character except spaces and punctuation
pattern12 = r"[^., ]"  # Not period, comma, or space
text12 = "Hello, World. Test."

chars = re.findall(pattern12, text12)
print(f"Text: '{text12}'")
print(f"Pattern: '[^., ]' (not period, comma, or space)")
print(f"Characters: {chars}")

print()

# ==============================================================================
# SECTION 5: THE DOT (.) METACHARACTER
# ==============================================================================

r"""
The DOT (.) is a special metacharacter that matches ANY character except newline.
It's like a wildcard - use it carefully!

. : Matches any single character (except \n by default)
"""

print("="*70)
print("SECTION 5: THE DOT METACHARACTER")
print("="*70)

# Example 13: Using dot to match any character
# --------------------------------------------
pattern13 = r"c.t"  # 'c', followed by ANY character, followed by 't'
text13 = "cat cut cot c t c9t"

matches = re.findall(pattern13, text13)
print(f"Text: '{text13}'")
print(f"Pattern: 'c.t' ('c' + any char + 't')")
print(f"Matches: {matches}")

print()

# Example 14: Dot doesn't match newline by default
# ------------------------------------------------
text14 = "hello\nworld"
pattern14 = r"hello.world"

match = re.search(pattern14, text14)
print(f"Text: 'hello\\nworld'")
print(f"Pattern: 'hello.world'")
print(f"Match found: {match is not None}")
print("(The dot doesn't match the newline by default)")

print()

# Example 15: Matching a literal dot
# ----------------------------------
# To match an actual period, escape it with backslash
pattern15 = r"\."  # Matches a literal period
text15 = "3.14 is pi"

periods = re.findall(pattern15, text15)
print(f"Text: '{text15}'")
print(f"Pattern: '\\.' (literal period)")
print(f"Periods found: {periods}")

print()

# ==============================================================================
# SECTION 6: PRACTICAL EXAMPLES
# ==============================================================================

print("="*70)
print("SECTION 6: PRACTICAL EXAMPLES")
print("="*70)

# Example 16: Extracting all words from text
# ------------------------------------------
text16 = "Hello, World! This is a test-123."

# Match sequences of word characters
words = re.findall(r"\w+", text16)  # \w+ means one or more word characters
print(f"Text: '{text16}'")
print(f"Words extracted: {words}")

print()

# Example 17: Finding phone number digits
# ---------------------------------------
text17 = "Call me at 555-1234 or 555-5678"

# Extract all digit sequences
numbers = re.findall(r"\d+", text17)  # \d+ means one or more digits
print(f"Text: '{text17}'")
print(f"Number sequences: {numbers}")

print()

# Example 18: Identifying non-alphanumeric characters
# ---------------------------------------------------
text18 = "email@domain.com"

# Find all characters that are not letters or digits
special_chars = re.findall(r"[^a-zA-Z0-9]", text18)
print(f"Text: '{text18}'")
print(f"Special characters: {special_chars}")

print()

# Example 19: Matching hex digits
# -------------------------------
# Hex digits are 0-9 and A-F (or a-f)
pattern19 = r"[0-9A-Fa-f]"
text19 = "Color: #FF5733, #00AAFF"

hex_digits = re.findall(pattern19, text19)
print(f"Text: '{text19}'")
print(f"Hex digits: {hex_digits}")
print(f"Total: {len(hex_digits)}")

print()

# Example 20: Validating single character input
# ---------------------------------------------
def validate_grade(grade):
    """
    Check if input is a valid letter grade (A, B, C, D, F).
    """
    # ^[ABCDF]$ would check if ENTIRE string is one of these letters
    # But we'll use match for simplicity here
    pattern = r"[ABCDF]"
    match = re.match(pattern, grade)
    return match is not None and len(grade) == 1

# Test the function
test_grades = ["A", "B", "C", "D", "F", "E", "Z", "AB"]
print("Grade validation:")
for grade in test_grades:
    result = "Valid" if validate_grade(grade) else "Invalid"
    print(f"  '{grade}': {result}")

print()

# ==============================================================================
# SECTION 7: COMBINING CHARACTER CLASSES
# ==============================================================================

print("="*70)
print("SECTION 7: COMBINING CHARACTER CLASSES")
print("="*70)

# Example 21: Complex character class
# -----------------------------------
# Match letters, digits, and specific symbols
pattern21 = r"[a-zA-Z0-9_\-.]"  # Letters, digits, underscore, hyphen, period
text21 = "user_name-123@domain.com"

valid_chars = re.findall(pattern21, text21)
print(f"Text: '{text21}'")
print(f"Pattern: '[a-zA-Z0-9_\\-.]'")
print(f"Valid characters: {valid_chars}")

print()

# Example 22: Using multiple character classes in one pattern
# ----------------------------------------------------------
# Match: digit, followed by any letter, followed by digit
pattern22 = r"\d[a-zA-Z]\d"
text22 = "Room 3A5, 4B7, and 2Z9"

matches = re.findall(pattern22, text22)
print(f"Text: '{text22}'")
print(f"Pattern: '\\d[a-zA-Z]\\d' (digit-letter-digit)")
print(f"Matches: {matches}")

print()

# ==============================================================================
# SECTION 8: COMMON MISTAKES TO AVOID
# ==============================================================================

print("="*70)
print("SECTION 8: COMMON MISTAKES TO AVOID")
print("="*70)

# Mistake 1: Forgetting to escape special characters in character class
# --------------------------------------------------------------------
print("Mistake 1: Special characters in character classes")

# If you want to match a literal hyphen in a character class,
# put it at the start or end, or escape it
pattern_wrong = r"[a-z-0-9]"  # Ambiguous: Python reads the middle '-' as a literal, but it looks like a range
pattern_right = r"[a-z0-9\-]"  # Escaped hyphen
pattern_right2 = r"[-a-z0-9]"  # Hyphen at start

text = "hello-world123"
print(f"Text: '{text}'")
print(f"Pattern '[a-z0-9\\-]': {re.findall(pattern_right, text)}")

print()

# Mistake 2: Confusing [^...] with \^
# -----------------------------------
print("Mistake 2: Understanding negation")
print("  [^abc] means: match anything EXCEPT a, b, or c")
print("  \\^ means: match a literal ^ character")

text = "^hello"
pattern_neg = r"[^h]"  # Matches anything except 'h'
pattern_literal = r"\^"  # Matches literal ^

print(f"Text: '{text}'")
print(f"[^h]: {re.findall(pattern_neg, text)}")
print(f"\\^: {re.findall(pattern_literal, text)}")

print()

# Mistake 3: Thinking . matches newline
# -------------------------------------
print("Mistake 3: The dot doesn't match newline by default")
text_multiline = "line1\nline2"
print(f"Text: 'line1\\nline2'")
print(f"Pattern '.+': {re.findall(r'.+', text_multiline)}")
print("(Each line matched separately, newline not included)")

print()

# ==============================================================================
# SECTION 9: SUMMARY AND CHEAT SHEET
# ==============================================================================

print("="*70)
print("SUMMARY: CHARACTER CLASS CHEAT SHEET")
print("="*70)

cheat_sheet = """
BASIC CHARACTER CLASSES:
  [abc]         Match 'a', 'b', or 'c'
  [a-z]         Match any lowercase letter
  [A-Z]         Match any uppercase letter
  [0-9]         Match any digit
  [a-zA-Z]      Match any letter
  [a-zA-Z0-9]   Match any letter or digit

NEGATED CLASSES:
  [^abc]        Match anything EXCEPT 'a', 'b', or 'c'
  [^0-9]        Match anything that's not a digit

PREDEFINED CLASSES:
  \\d           Digit [0-9]
  \\D           Non-digit [^0-9]
  \\w           Word character [a-zA-Z0-9_]
  \\W           Non-word character
  \\s           Whitespace (space, tab, newline, etc.)
  \\S           Non-whitespace
  .            Any character except newline

SPECIAL NOTES:
  - Always use raw strings: r"pattern"
  - Only ] \\ ^ and - can be special inside [...]; a dot needs no escape: [.]
  - Put hyphen at start or end to match literally: [-abc] or [abc-]
  - ^ at START of class means negation: [^abc]
  - ^ elsewhere is literal: [a^bc]
"""

print(cheat_sheet)

# ==============================================================================
# PRACTICE EXERCISES
# ==============================================================================

print("="*70)
print("PRACTICE CHALLENGES")
print("="*70)

"""
Try these exercises:

1. Write a pattern to match all vowels (both upper and lowercase)
2. Extract all punctuation marks from a string
3. Find all hexadecimal numbers (0-9, A-F, a-f)
4. Match all characters that are NOT spaces or punctuation
5. Create a pattern to match DNA sequences (only A, T, G, C)
6. Extract all word characters followed by a digit
7. Match all characters except vowels

Solutions in exercises_01_basics.py
"""

# ==============================================================================
# END OF TUTORIAL 02
# ==============================================================================

print("\n" + "="*70)
print("END OF TUTORIAL - Character Classes mastered!")
print("Next: Tutorial 03 - Quantifiers")
print("="*70)

```

Exercises

Exercise 1. Write a regex pattern that matches any string containing only lowercase letters and digits (no spaces, uppercase, or special characters). Test it against "hello123", "Hello", "test!", and "abc".

Solution to Exercise 1

```python
import re

pattern = r'^[a-z0-9]+$'

tests = ["hello123", "Hello", "test!", "abc"]
for t in tests:
    match = re.fullmatch(pattern, t)
    print(f"'{t}': {'Match' if match else 'No match'}")

# 'hello123': Match
# 'Hello': No match
# 'test!': No match
# 'abc': Match
```

Exercise 2. Write a regex pattern using a negated character class to find all characters in a string that are NOT alphanumeric or whitespace. For example, in "Hello, World! #2024", it should find [',', '!', '#'].

Solution to Exercise 2

```python
import re

text = "Hello, World! #2024"
non_alnum = re.findall(r'[^\w\s]', text)
print(non_alnum)  # [',', '!', '#']
```

Exercise 3. Write a regex pattern that matches a valid hex color code: # followed by exactly 6 hexadecimal characters (digits and letters a-f, case-insensitive). Test against "#FF5733", "#abc", "#12345G", and "#aabbcc".

Solution to Exercise 3

```python
import re

pattern = r'^#[0-9a-fA-F]{6}$'

tests = ["#FF5733", "#abc", "#12345G", "#aabbcc"]
for t in tests:
    match = re.fullmatch(pattern, t)
    print(f"'{t}': {'Valid' if match else 'Invalid'}")

# '#FF5733': Valid
# '#abc': Invalid (only 3 chars)
# '#12345G': Invalid (G not hex)
# '#aabbcc': Valid
```
