Ordered Categoricals¶

Ordered categoricals support inequality comparisons (<, >, <=, >=) between categories. This is essential when categories have a natural hierarchy, such as ratings, sizes, or priority levels.

Why Ordered Matters¶

Without ordering, inequality comparisons between categories are undefined, so pandas rejects them:

import pandas as pd

# Unordered categorical
sizes = pd.Series(['medium', 'small', 'large'], dtype='category')

# Inequality comparisons are not defined without an order
try:
    result = sizes > 'small'
    print(result)  # Not reached: pandas raises TypeError
except TypeError as e:
    print(f"Error: {e}")

With ordering, comparisons reflect the logical hierarchy:

# Ordered categorical
sizes = pd.Categorical(
    ['medium', 'small', 'large'],
    categories=['small', 'medium', 'large'],
    ordered=True
)
sizes = pd.Series(sizes)

# Now comparison is meaningful
print(sizes > 'small')

0     True   # medium > small
1    False   # small > small (False)
2     True   # large > small
dtype: bool
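
One way to see why the ordered comparison works: each value is stored as an integer code giving the position of its category in `categories`, and inequality comparisons on an ordered categorical follow those positions. A minimal sketch (the names `threshold`, `mask_from_labels`, and `mask_from_codes` are illustrative, not part of pandas):

```python
import pandas as pd

sizes = pd.Series(pd.Categorical(
    ['medium', 'small', 'large'],
    categories=['small', 'medium', 'large'],
    ordered=True,
))

# Each value is stored as the position of its category in `categories`
codes = sizes.cat.codes.tolist()          # small=0, medium=1, large=2
threshold = list(sizes.cat.categories).index('small')

# Comparing the codes gives the same mask as comparing the labels
mask_from_labels = (sizes > 'small').tolist()
mask_from_codes = [code > threshold for code in codes]
print(codes, mask_from_labels, mask_from_codes)
```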

Creating Ordered Categoricals¶

Method 1: pd.Categorical with ordered=True¶

cat = pd.Categorical(
    ['low', 'medium', 'high', 'low'],
    categories=['low', 'medium', 'high'],
    ordered=True
)
s = pd.Series(cat)
print(s)

0       low
1    medium
2      high
3       low
dtype: category
Categories (3, object): ['low' < 'medium' < 'high']

The < symbols in the Categories line show the order, from lowest to highest.

Method 2: CategoricalDtype with ordered=True¶

rating_dtype = pd.CategoricalDtype(
    categories=['D', 'C', 'B', 'A', 'S'],
    ordered=True
)

grades = pd.Series(['B', 'A', 'C', 'S', 'A']).astype(rating_dtype)
print(grades)
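
One thing to watch when casting with a CategoricalDtype: values that are not among the declared categories are converted to missing values without an error. A small sketch, where 'F' is a hypothetical bad input:

```python
import pandas as pd

rating_dtype = pd.CategoricalDtype(
    categories=['D', 'C', 'B', 'A', 'S'],
    ordered=True,
)

# 'F' is not one of the declared categories (hypothetical bad input)
raw = pd.Series(['B', 'A', 'F', 'S'])
grades = raw.astype(rating_dtype)

print(grades.isna().tolist())   # the 'F' entry is now missing
print(grades.cat.ordered)
```

Checking `isna()` right after the cast is a cheap way to catch typos in the raw labels.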

Method 3: Using cat.as_ordered()¶

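Convert an existing unordered categorical to ordered. as_ordered() keeps the current category order; use reorder_categories() when the order itself must change:

s = pd.Series(['a', 'b', 'c'], dtype='category')
print(f"Before: ordered={s.cat.ordered}")

# Keep the existing order (a < b < c) and mark it as ordered
s = s.cat.as_ordered()
print(f"After: ordered={s.cat.ordered}")

# Equivalent here, but also lets you choose a different order:
# s = s.cat.reorder_categories(['a', 'b', 'c'], ordered=True)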
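
A caution when relying on as_ordered(): it only flips the ordered flag, so whatever order the categories already have becomes the ranking. Categories inferred from string data are sorted lexically, which is rarely the intended hierarchy. The sketch below uses illustrative names (`levels`, `naive`, `fixed`):

```python
import pandas as pd

# Categories inferred from the data are sorted lexically: high, low, medium
levels = pd.Series(['low', 'medium', 'high'], dtype='category')

# as_ordered() only flips the flag; the lexical order becomes the ranking
naive = levels.cat.as_ordered()
print(list(naive.cat.categories))

# Passing the intended order explicitly gives low < medium < high
fixed = levels.cat.reorder_categories(['low', 'medium', 'high'], ordered=True)
print(list(fixed.cat.categories))
print(naive.max(), fixed.max())
```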

Comparison Operations¶

Basic Comparisons¶

priorities = pd.Categorical(
    ['medium', 'high', 'low', 'critical', 'medium'],
    categories=['low', 'medium', 'high', 'critical'],
    ordered=True
)
priorities = pd.Series(priorities, name='priority')

# Greater than
print(priorities > 'medium')
# False, True, False, True, False

# Greater than or equal
print(priorities >= 'high')
# False, True, False, True, False

# Equal
print(priorities == 'medium')
# True, False, False, False, True

# Not equal
print(priorities != 'low')
# True, True, False, True, True
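
Comparisons against a label that is not one of the categories behave differently for equality and inequality. As far as I know, equality simply returns all False, while an inequality raises TypeError. A sketch, with 'urgent' as a hypothetical label outside the category list:

```python
import pandas as pd

priorities = pd.Series(pd.Categorical(
    ['medium', 'high', 'low'],
    categories=['low', 'medium', 'high', 'critical'],
    ordered=True,
))

# 'urgent' is a hypothetical label that is not one of the categories
equal_mask = (priorities == 'urgent').tolist()   # equality is allowed

try:
    priorities > 'urgent'
    raised = False
except TypeError:
    raised = True   # inequality against an unknown label is rejected

print(equal_mask, raised)
```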

Filtering with Comparisons¶

df = pd.DataFrame({
    'task': ['Task A', 'Task B', 'Task C', 'Task D', 'Task E'],
    'priority': pd.Categorical(
        ['medium', 'high', 'low', 'critical', 'medium'],
        categories=['low', 'medium', 'high', 'critical'],
        ordered=True
    )
})

# High priority and above
urgent = df[df['priority'] >= 'high']
print(urgent)

     task  priority
1  Task B      high
3  Task D  critical
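
Comparison masks can be combined to select a band of categories rather than an open-ended range. A minimal sketch on the same data (`in_range` and `mid_band` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'task': ['Task A', 'Task B', 'Task C', 'Task D', 'Task E'],
    'priority': pd.Categorical(
        ['medium', 'high', 'low', 'critical', 'medium'],
        categories=['low', 'medium', 'high', 'critical'],
        ordered=True,
    ),
})

# Keep tasks from 'medium' up to and including 'high'
in_range = (df['priority'] >= 'medium') & (df['priority'] <= 'high')
mid_band = df[in_range]
print(mid_band['task'].tolist())
```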

Min, Max on Ordered Categoricals¶

print(f"Minimum: {df['priority'].min()}")  # low
print(f"Maximum: {df['priority'].max()}")  # critical
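
These reductions depend on the ordered flag. On an unordered categorical, min() and max() raise TypeError instead of falling back to alphabetical order. A short sketch with illustrative variable names:

```python
import pandas as pd

unordered = pd.Series(['medium', 'high', 'low'], dtype='category')

try:
    unordered.max()
    raised = False
except TypeError:
    raised = True   # no ranking has been declared

# Declaring the order makes min() and max() well defined
ordered = unordered.cat.reorder_categories(
    ['low', 'medium', 'high'], ordered=True
)
print(raised, ordered.min(), ordered.max())
```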

Practical Examples¶

Credit Ratings¶

# Credit rating scale (best to worst)
rating_categories = ['AAA', 'AA', 'A', 'BBB', 'BB', 'B', 'CCC', 'CC', 'C', 'D']

# Note: reverse the list (worst to best) so AAA is "greater than" lower ratings
rating_dtype = pd.CategoricalDtype(
    categories=rating_categories[::-1],
    ordered=True
)

bonds = pd.DataFrame({
    'issuer': ['Company A', 'Company B', 'Company C', 'Company D', 'Company E'],
    'rating': ['AA', 'BBB', 'A', 'BB', 'AAA']
})

bonds['rating'] = bonds['rating'].astype(rating_dtype)

# Investment grade: BBB and above
investment_grade = bonds[bonds['rating'] >= 'BBB']
print("Investment Grade Bonds:")
print(investment_grade)

# High yield (junk): below BBB
high_yield = bonds[bonds['rating'] < 'BBB']
print("\nHigh Yield Bonds:")
print(high_yield)
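
Because the codes are positions on the scale, they can also express how far a rating sits from a cutoff. The sketch below assumes the scale is listed worst to best, so a higher position means a better rating. The names `scale`, `cutoff`, and `notches` are illustrative:

```python
import pandas as pd

# Worst to best, so that a higher position means a better rating
scale = ['D', 'C', 'CC', 'CCC', 'B', 'BB', 'BBB', 'A', 'AA', 'AAA']
rating_dtype = pd.CategoricalDtype(categories=scale, ordered=True)

ratings = pd.Series(['AA', 'BBB', 'A', 'BB', 'AAA']).astype(rating_dtype)

# Signed distance, in scale positions, from the BBB cutoff
cutoff = scale.index('BBB')
notches = (ratings.cat.codes - cutoff).tolist()
print(notches)
```

Positive values are above the cutoff, zero is exactly BBB, and negative values are below it.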

Survey Likert Scale¶

likert_categories = [
    'Strongly Disagree',
    'Disagree',
    'Neutral',
    'Agree',
    'Strongly Agree'
]

survey = pd.DataFrame({
    'respondent': [1, 2, 3, 4, 5, 6, 7, 8],
    'satisfaction': pd.Categorical(
        ['Agree', 'Neutral', 'Strongly Agree', 'Disagree',
         'Agree', 'Strongly Disagree', 'Neutral', 'Agree'],
        categories=likert_categories,
        ordered=True
    )
})

# Positive responses (Agree and above)
positive = survey[survey['satisfaction'] >= 'Agree']
print(f"Positive responses: {len(positive)} ({len(positive)/len(survey)*100:.0f}%)")

# Negative responses (Disagree and below)
negative = survey[survey['satisfaction'] <= 'Disagree']
print(f"Negative responses: {len(negative)} ({len(negative)/len(survey)*100:.0f}%)")
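
For a survey summary it is often useful to tabulate responses in scale order rather than by frequency. One possible approach, sketched here, is to sort the result of value_counts() by its index, which follows the category order:

```python
import pandas as pd

likert_categories = [
    'Strongly Disagree', 'Disagree', 'Neutral', 'Agree', 'Strongly Agree',
]
satisfaction = pd.Series(pd.Categorical(
    ['Agree', 'Neutral', 'Strongly Agree', 'Disagree',
     'Agree', 'Strongly Disagree', 'Neutral', 'Agree'],
    categories=likert_categories,
    ordered=True,
))

# Counts listed from 'Strongly Disagree' up to 'Strongly Agree'
counts = satisfaction.value_counts().sort_index()
print(counts)
```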

Size/Tier Classification¶

# Customer tiers
tier_dtype = pd.CategoricalDtype(
    categories=['Bronze', 'Silver', 'Gold', 'Platinum', 'Diamond'],
    ordered=True
)

customers = pd.DataFrame({
    'customer_id': [1, 2, 3, 4, 5],
    'tier': ['Gold', 'Bronze', 'Platinum', 'Silver', 'Gold'],
    'spend': [5000, 500, 15000, 2000, 4500]
})

customers['tier'] = customers['tier'].astype(tier_dtype)

# Premium customers (Gold and above)
premium = customers[customers['tier'] >= 'Gold']
print("Premium customers:")
print(premium)

# Potential upgrades (Silver - one tier below Gold)
potential_upgrades = customers[customers['tier'] == 'Silver']
print("\nPotential upgrade candidates:")
print(potential_upgrades)
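
The "one tier below" rule can be written without hard-coding the label by working with category positions. This is only a sketch: `target`, `target_code`, and `one_below` are hypothetical names, and it assumes the target is not the lowest tier.

```python
import pandas as pd

tier_dtype = pd.CategoricalDtype(
    categories=['Bronze', 'Silver', 'Gold', 'Platinum', 'Diamond'],
    ordered=True,
)
tiers = pd.Series(
    ['Gold', 'Bronze', 'Platinum', 'Silver', 'Gold']
).astype(tier_dtype)

# Position of the target tier, then select the tier exactly one step below it
target = 'Gold'
target_code = tier_dtype.categories.get_loc(target)
one_below = tiers[tiers.cat.codes == target_code - 1]
print(one_below.tolist())
```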

Risk Levels¶

risk_dtype = pd.CategoricalDtype(
    categories=['Minimal', 'Low', 'Medium', 'High', 'Critical'],
    ordered=True
)

incidents = pd.DataFrame({
    'incident_id': [1, 2, 3, 4, 5],
    'risk_level': ['Medium', 'Critical', 'Low', 'High', 'Medium'],
    'description': ['Issue A', 'Issue B', 'Issue C', 'Issue D', 'Issue E']
})

incidents['risk_level'] = incidents['risk_level'].astype(risk_dtype)

# Escalation required (High and above)
escalate = incidents[incidents['risk_level'] >= 'High']
print("Requires escalation:")
print(escalate)

Sorting with Ordered Categoricals¶

Categoricals sort by category order, not alphabetically. With an ordered categorical, that order is the logical ranking you declared:

sizes = pd.Categorical(
    ['medium', 'small', 'large', 'medium', 'small'],
    categories=['small', 'medium', 'large'],
    ordered=True
)
df = pd.DataFrame({'size': sizes, 'value': [1, 2, 3, 4, 5]})

# Sort by size (logical order, not alphabetical)
df_sorted = df.sort_values('size')
print(df_sorted)

     size  value
1   small      2
4   small      5
0  medium      1
3  medium      4
2   large      3
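
Sorting does not actually require ordered=True. A categorical with an explicit category list sorts by that list either way, and ascending=False reverses it. A brief sketch:

```python
import pandas as pd

# Unordered, but with an explicit (non-alphabetical) category list
sizes = pd.Categorical(
    ['medium', 'small', 'large'],
    categories=['small', 'medium', 'large'],
)
df = pd.DataFrame({'size': sizes, 'value': [1, 2, 3]})

# Both directions follow the category list, not the alphabet
ascending = df.sort_values('size')['size'].tolist()
descending = df.sort_values('size', ascending=False)['size'].tolist()
print(ascending, descending)
```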

Changing Order¶

reorder_categories()¶

s = pd.Series(['a', 'b', 'c'], dtype='category')

# Set new order
s = s.cat.reorder_categories(['c', 'b', 'a'], ordered=True)
print(s.cat.categories)  # Index(['c', 'b', 'a'], dtype='object')

# Now 'c' < 'b' < 'a'
print(s > 'b')  # True, False, False
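
reorder_categories() expects exactly the existing set of categories in a new order. If the set itself needs to change, set_categories() is the more permissive call, at the cost of turning dropped labels into missing values. A sketch with illustrative names:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'], dtype='category')

try:
    s.cat.reorder_categories(['c', 'b'], ordered=True)   # 'a' is missing
    raised = False
except ValueError:
    raised = True

# set_categories() accepts a different set; dropped labels become NaN
trimmed = s.cat.set_categories(['c', 'b'], ordered=True)
print(raised, trimmed.isna().tolist())
```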

Reversing Order¶

# Original: small < medium < large
sizes = pd.Categorical(
    ['small', 'medium', 'large'],
    categories=['small', 'medium', 'large'],
    ordered=True
)

# Reverse: large < medium < small
sizes_reversed = sizes.reorder_categories(
    sizes.categories[::-1]
)
print(sizes_reversed.categories)  # ['large', 'medium', 'small']

Caveats¶

Cannot Compare Different Category Orders¶

cat1 = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'], ordered=True)
cat2 = pd.Categorical(['a', 'b'], categories=['c', 'b', 'a'], ordered=True)

# cat1 == cat2  # TypeError: Categoricals can only be compared if 'categories' are the same
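
A possible workaround is to bring both categoricals onto the same category order before comparing. The sketch below catches the exception, which is a TypeError in the pandas versions I am aware of, and then realigns `cat2` (the name `aligned` is illustrative):

```python
import pandas as pd

cat1 = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'], ordered=True)
cat2 = pd.Categorical(['a', 'b'], categories=['c', 'b', 'a'], ordered=True)

try:
    cat1 == cat2
    raised = False
except TypeError:
    raised = True

# Re-express cat2 with cat1's category order, then compare
aligned = cat2.reorder_categories(cat1.categories)
equal = (cat1 == aligned).tolist()
print(raised, equal)
```

Realigning silently changes what "greater than" means for `cat2`, so it only makes sense when one of the two orders is known to be the intended one.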

Unordered to Ordered Requires Explicit Ordering¶

s = pd.Series(['a', 'b', 'c'], dtype='category')
# s > 'a'  # TypeError for unordered

# Must set order first
s = s.cat.reorder_categories(['a', 'b', 'c'], ordered=True)
# Now comparisons work

Summary¶

Operation	Requires Ordered
`==`, `!=`	No
`<`, `>`, `<=`, `>=`	Yes
`min()`, `max()`	Yes (raises TypeError if unordered)
`sort_values()`	No (always sorts by category order)
Filtering with `<`, `>`, `<=`, `>=`	Yes