Unicode Comparison Strategies¶

Python provides three levels of string comparison, each appropriate for different situations.

Level 1: Exact Comparison¶

Use == when both strings are known to be in the same Unicode form and case matters:

s1 = "café"
s2 = "café"

print(s1 == s2)  # True

This is the fastest approach but fails silently if the strings come from different sources with different internal representations.

Level 2: Unicode-Aware Case-Sensitive Comparison¶

Use normalize() when strings may have different internal representations but case still matters:

from unicodedata import normalize

def nfc_equal(s1, s2):
    return normalize("NFC", s1) == normalize("NFC", s2)

print(nfc_equal("café", "cafe\u0301"))  # True
print(nfc_equal("Café", "café"))        # False  (case-sensitive)

Typical use cases: comparing database keys, filenames, identifiers.

Level 3: Unicode-Aware Case-Insensitive Comparison¶

Use normalize() combined with casefold() when strings may differ in both representation and case:

def fold_equal(s1, s2):
    return normalize("NFC", s1).casefold() == normalize("NFC", s2).casefold()

print(fold_equal("Café", "CAFÉ"))         # True
print(fold_equal("café", "cafe\u0301"))  # True
print(fold_equal("ß", "SS"))              # True

Typical use cases: user authentication, search queries, international names.

Summary¶

Level	Approach	Use When
1	`s1 == s2`	Strings are ASCII or already normalized
2	`nfc_equal(s1, s2)`	Case matters, but representation may vary
3	`fold_equal(s1, s2)`	User-facing input, case and representation may vary

Each level adds robustness at a small performance cost. For user-facing comparison, always prefer Level 3.

Unicode Comparison Strategies¶

Level 1: Exact Comparison¶

Level 2: Unicode-Aware Case-Sensitive Comparison¶

Level 3: Unicode-Aware Case-Insensitive Comparison¶

Summary¶

See Also¶