Ordered Categoricals¶
Ordered categoricals allow logical comparisons between categories. This is essential when categories have a natural hierarchy, such as ratings, sizes, or priority levels.
Why Ordered Matters¶
Without ordering, comparisons between categories are meaningless:
import pandas as pd
# Unordered categorical
sizes = pd.Series(['medium', 'small', 'large'], dtype='category')
# This comparison doesn't make sense
try:
result = sizes > 'small'
print(result) # May work but results are arbitrary
except TypeError as e:
print(f"Error: {e}")
With ordering, comparisons reflect the logical hierarchy:
# Ordered categorical
sizes = pd.Categorical(
['medium', 'small', 'large'],
categories=['small', 'medium', 'large'],
ordered=True
)
sizes = pd.Series(sizes)
# Now comparison is meaningful
print(sizes > 'small')
0 True # medium > small
1 False # small > small (False)
2 True # large > small
dtype: bool
Creating Ordered Categoricals¶
Method 1: pd.Categorical with ordered=True¶
cat = pd.Categorical(
['low', 'medium', 'high', 'low'],
categories=['low', 'medium', 'high'],
ordered=True
)
s = pd.Series(cat)
print(s)
0 low
1 medium
2 high
3 low
dtype: category
Categories (3, object): ['low' < 'medium' < 'high']
The < symbols indicate the order.
Method 2: CategoricalDtype with ordered=True¶
rating_dtype = pd.CategoricalDtype(
categories=['D', 'C', 'B', 'A', 'S'],
ordered=True
)
grades = pd.Series(['B', 'A', 'C', 'S', 'A']).astype(rating_dtype)
print(grades)
Method 3: Using cat.as_ordered()¶
Convert an existing unordered categorical to ordered:
s = pd.Series(['a', 'b', 'c'], dtype='category')
print(f"Before: ordered={s.cat.ordered}")
# First set category order, then make ordered
s = s.cat.reorder_categories(['a', 'b', 'c'], ordered=True)
print(f"After: ordered={s.cat.ordered}")
Comparison Operations¶
Basic Comparisons¶
priorities = pd.Categorical(
['medium', 'high', 'low', 'critical', 'medium'],
categories=['low', 'medium', 'high', 'critical'],
ordered=True
)
priorities = pd.Series(priorities, name='priority')
# Greater than
print(priorities > 'medium')
# False, True, False, True, False
# Greater than or equal
print(priorities >= 'high')
# False, True, False, True, False
# Equal
print(priorities == 'medium')
# True, False, False, False, True
# Not equal
print(priorities != 'low')
# True, True, False, True, True
Filtering with Comparisons¶
df = pd.DataFrame({
'task': ['Task A', 'Task B', 'Task C', 'Task D', 'Task E'],
'priority': pd.Categorical(
['medium', 'high', 'low', 'critical', 'medium'],
categories=['low', 'medium', 'high', 'critical'],
ordered=True
)
})
# High priority and above
urgent = df[df['priority'] >= 'high']
print(urgent)
task priority
1 Task B high
3 Task D critical
Min, Max on Ordered Categoricals¶
print(f"Minimum: {df['priority'].min()}") # low
print(f"Maximum: {df['priority'].max()}") # critical
Practical Examples¶
Credit Ratings¶
# Credit rating scale (best to worst)
rating_categories = ['AAA', 'AA', 'A', 'BBB', 'BB', 'B', 'CCC', 'CC', 'C', 'D']
# Note: reverse order so AAA is "greater than" lower ratings
rating_dtype = pd.CategoricalDtype(
categories=rating_categories,
ordered=True
)
bonds = pd.DataFrame({
'issuer': ['Company A', 'Company B', 'Company C', 'Company D', 'Company E'],
'rating': ['AA', 'BBB', 'A', 'BB', 'AAA']
})
bonds['rating'] = bonds['rating'].astype(rating_dtype)
# Investment grade: BBB and above
investment_grade = bonds[bonds['rating'] >= 'BBB']
print("Investment Grade Bonds:")
print(investment_grade)
# High yield (junk): below BBB
high_yield = bonds[bonds['rating'] < 'BBB']
print("\nHigh Yield Bonds:")
print(high_yield)
Survey Likert Scale¶
likert_categories = [
'Strongly Disagree',
'Disagree',
'Neutral',
'Agree',
'Strongly Agree'
]
survey = pd.DataFrame({
'respondent': [1, 2, 3, 4, 5, 6, 7, 8],
'satisfaction': pd.Categorical(
['Agree', 'Neutral', 'Strongly Agree', 'Disagree',
'Agree', 'Strongly Disagree', 'Neutral', 'Agree'],
categories=likert_categories,
ordered=True
)
})
# Positive responses (Agree and above)
positive = survey[survey['satisfaction'] >= 'Agree']
print(f"Positive responses: {len(positive)} ({len(positive)/len(survey)*100:.0f}%)")
# Negative responses (Disagree and below)
negative = survey[survey['satisfaction'] <= 'Disagree']
print(f"Negative responses: {len(negative)} ({len(negative)/len(survey)*100:.0f}%)")
Size/Tier Classification¶
# Customer tiers
tier_dtype = pd.CategoricalDtype(
categories=['Bronze', 'Silver', 'Gold', 'Platinum', 'Diamond'],
ordered=True
)
customers = pd.DataFrame({
'customer_id': [1, 2, 3, 4, 5],
'tier': ['Gold', 'Bronze', 'Platinum', 'Silver', 'Gold'],
'spend': [5000, 500, 15000, 2000, 4500]
})
customers['tier'] = customers['tier'].astype(tier_dtype)
# Premium customers (Gold and above)
premium = customers[customers['tier'] >= 'Gold']
print("Premium customers:")
print(premium)
# Potential upgrades (Silver - one tier below Gold)
potential_upgrades = customers[customers['tier'] == 'Silver']
print("\nPotential upgrade candidates:")
print(potential_upgrades)
Risk Levels¶
risk_dtype = pd.CategoricalDtype(
categories=['Minimal', 'Low', 'Medium', 'High', 'Critical'],
ordered=True
)
incidents = pd.DataFrame({
'incident_id': [1, 2, 3, 4, 5],
'risk_level': ['Medium', 'Critical', 'Low', 'High', 'Medium'],
'description': ['Issue A', 'Issue B', 'Issue C', 'Issue D', 'Issue E']
})
incidents['risk_level'] = incidents['risk_level'].astype(risk_dtype)
# Escalation required (High and above)
escalate = incidents[incidents['risk_level'] >= 'High']
print("Requires escalation:")
print(escalate)
Sorting with Ordered Categoricals¶
Ordered categoricals sort according to category order, not alphabetically:
sizes = pd.Categorical(
['medium', 'small', 'large', 'medium', 'small'],
categories=['small', 'medium', 'large'],
ordered=True
)
df = pd.DataFrame({'size': sizes, 'value': [1, 2, 3, 4, 5]})
# Sort by size (logical order, not alphabetical)
df_sorted = df.sort_values('size')
print(df_sorted)
size value
1 small 2
4 small 5
0 medium 1
3 medium 4
2 large 3
Changing Order¶
reorder_categories()¶
s = pd.Series(['a', 'b', 'c'], dtype='category')
# Set new order
s = s.cat.reorder_categories(['c', 'b', 'a'], ordered=True)
print(s.cat.categories) # Index(['c', 'b', 'a'], dtype='object')
# Now 'c' < 'b' < 'a'
print(s > 'b') # False, False, True
Reversing Order¶
# Original: small < medium < large
sizes = pd.Categorical(
['small', 'medium', 'large'],
categories=['small', 'medium', 'large'],
ordered=True
)
# Reverse: large < medium < small
sizes_reversed = sizes.reorder_categories(
sizes.categories[::-1]
)
print(sizes_reversed.categories) # ['large', 'medium', 'small']
Caveats¶
Cannot Compare Different Category Orders¶
cat1 = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'], ordered=True)
cat2 = pd.Categorical(['a', 'b'], categories=['c', 'b', 'a'], ordered=True)
# cat1 == cat2 # ValueError: Categoricals can only be compared if categories are the same
Unordered to Ordered Requires Explicit Ordering¶
s = pd.Series(['a', 'b', 'c'], dtype='category')
# s > 'a' # TypeError for unordered
# Must set order first
s = s.cat.reorder_categories(['a', 'b', 'c'], ordered=True)
# Now comparisons work
Summary¶
| Operation | Requires Ordered |
|---|---|
==, != |
No |
<, >, <=, >= |
Yes |
min(), max() |
Yes (for meaningful results) |
sort_values() |
No (but uses category order if ordered) |
| Filtering with comparison | Yes |