Skip to content

Unicode Comparison Strategies

Python provides three levels of string comparison, each appropriate for different situations.


Level 1: Exact Comparison

Use == when both strings are known to be in the same Unicode form and case matters:

s1 = "café"
s2 = "café"

print(s1 == s2)  # True

This is the fastest approach but fails silently if the strings come from different sources with different internal representations.


Level 2: Unicode-Aware Case-Sensitive Comparison

Use normalize() when strings may have different internal representations but case still matters:

from unicodedata import normalize

def nfc_equal(s1, s2):
    return normalize("NFC", s1) == normalize("NFC", s2)

print(nfc_equal("café", "cafe\u0301"))  # True
print(nfc_equal("Café", "café"))        # False  (case-sensitive)

Typical use cases: comparing database keys, filenames, identifiers.


Level 3: Unicode-Aware Case-Insensitive Comparison

Use normalize() combined with casefold() when strings may differ in both representation and case:

def fold_equal(s1, s2):
    return normalize("NFC", s1).casefold() == normalize("NFC", s2).casefold()

print(fold_equal("Café", "CAFÉ"))         # True
print(fold_equal("café", "cafe\u0301"))  # True
print(fold_equal("ß", "SS"))              # True

Typical use cases: user authentication, search queries, international names.


Summary

Level Approach Use When
1 s1 == s2 Strings are ASCII or already normalized
2 nfc_equal(s1, s2) Case matters, but representation may vary
3 fold_equal(s1, s2) User-facing input, case and representation may vary

Each level adds robustness at a small performance cost. For user-facing comparison, always prefer Level 3.


See Also