Skip to content

Ordered Categoricals

Ordered categoricals allow logical comparisons between categories. This is essential when categories have a natural hierarchy, such as ratings, sizes, or priority levels.

Mental Model

An unordered categorical is like a set of labels with no ranking -- "red" is not greater than "blue." Setting ordered=True and specifying an order turns the labels into a ranked scale, so comparisons like "medium" < "large" become meaningful. This is essential for any column that represents a natural hierarchy.

Why Ordered Matters

Without ordering, comparisons between categories are meaningless:

```python import pandas as pd

Unordered categorical

sizes = pd.Series(['medium', 'small', 'large'], dtype='category')

This comparison doesn't make sense

try: result = sizes > 'small' print(result) # May work but results are arbitrary except TypeError as e: print(f"Error: {e}") ```

With ordering, comparisons reflect the logical hierarchy:

```python

Ordered categorical

sizes = pd.Categorical( ['medium', 'small', 'large'], categories=['small', 'medium', 'large'], ordered=True ) sizes = pd.Series(sizes)

Now comparison is meaningful

print(sizes > 'small') ```

0 True # medium > small 1 False # small > small (False) 2 True # large > small dtype: bool

Creating Ordered Categoricals

Method 1: pd.Categorical with ordered=True

python cat = pd.Categorical( ['low', 'medium', 'high', 'low'], categories=['low', 'medium', 'high'], ordered=True ) s = pd.Series(cat) print(s)

0 low 1 medium 2 high 3 low dtype: category Categories (3, object): ['low' < 'medium' < 'high']

The < symbols indicate the order.

Method 2: CategoricalDtype with ordered=True

```python rating_dtype = pd.CategoricalDtype( categories=['D', 'C', 'B', 'A', 'S'], ordered=True )

grades = pd.Series(['B', 'A', 'C', 'S', 'A']).astype(rating_dtype) print(grades) ```

Method 3: Using cat.as_ordered()

Convert an existing unordered categorical to ordered:

```python s = pd.Series(['a', 'b', 'c'], dtype='category') print(f"Before: ordered={s.cat.ordered}")

First set category order, then make ordered

s = s.cat.reorder_categories(['a', 'b', 'c'], ordered=True) print(f"After: ordered={s.cat.ordered}") ```

Comparison Operations

Basic Comparisons

```python priorities = pd.Categorical( ['medium', 'high', 'low', 'critical', 'medium'], categories=['low', 'medium', 'high', 'critical'], ordered=True ) priorities = pd.Series(priorities, name='priority')

Greater than

print(priorities > 'medium')

False, True, False, True, False

Greater than or equal

print(priorities >= 'high')

False, True, False, True, False

Equal

print(priorities == 'medium')

True, False, False, False, True

Not equal

print(priorities != 'low')

True, True, False, True, True

```

Filtering with Comparisons

```python df = pd.DataFrame({ 'task': ['Task A', 'Task B', 'Task C', 'Task D', 'Task E'], 'priority': pd.Categorical( ['medium', 'high', 'low', 'critical', 'medium'], categories=['low', 'medium', 'high', 'critical'], ordered=True ) })

High priority and above

urgent = df[df['priority'] >= 'high'] print(urgent) ```

task priority 1 Task B high 3 Task D critical

Min, Max on Ordered Categoricals

python print(f"Minimum: {df['priority'].min()}") # low print(f"Maximum: {df['priority'].max()}") # critical

Practical Examples

Credit Ratings

```python

Credit rating scale (best to worst)

rating_categories = ['AAA', 'AA', 'A', 'BBB', 'BB', 'B', 'CCC', 'CC', 'C', 'D']

Note: reverse order so AAA is "greater than" lower ratings

rating_dtype = pd.CategoricalDtype( categories=rating_categories, ordered=True )

bonds = pd.DataFrame({ 'issuer': ['Company A', 'Company B', 'Company C', 'Company D', 'Company E'], 'rating': ['AA', 'BBB', 'A', 'BB', 'AAA'] })

bonds['rating'] = bonds['rating'].astype(rating_dtype)

Investment grade: BBB and above

investment_grade = bonds[bonds['rating'] >= 'BBB'] print("Investment Grade Bonds:") print(investment_grade)

High yield (junk): below BBB

high_yield = bonds[bonds['rating'] < 'BBB'] print("\nHigh Yield Bonds:") print(high_yield) ```

Survey Likert Scale

```python likert_categories = [ 'Strongly Disagree', 'Disagree', 'Neutral', 'Agree', 'Strongly Agree' ]

survey = pd.DataFrame({ 'respondent': [1, 2, 3, 4, 5, 6, 7, 8], 'satisfaction': pd.Categorical( ['Agree', 'Neutral', 'Strongly Agree', 'Disagree', 'Agree', 'Strongly Disagree', 'Neutral', 'Agree'], categories=likert_categories, ordered=True ) })

Positive responses (Agree and above)

positive = survey[survey['satisfaction'] >= 'Agree'] print(f"Positive responses: {len(positive)} ({len(positive)/len(survey)*100:.0f}%)")

Negative responses (Disagree and below)

negative = survey[survey['satisfaction'] <= 'Disagree'] print(f"Negative responses: {len(negative)} ({len(negative)/len(survey)*100:.0f}%)") ```

Size/Tier Classification

```python

Customer tiers

tier_dtype = pd.CategoricalDtype( categories=['Bronze', 'Silver', 'Gold', 'Platinum', 'Diamond'], ordered=True )

customers = pd.DataFrame({ 'customer_id': [1, 2, 3, 4, 5], 'tier': ['Gold', 'Bronze', 'Platinum', 'Silver', 'Gold'], 'spend': [5000, 500, 15000, 2000, 4500] })

customers['tier'] = customers['tier'].astype(tier_dtype)

Premium customers (Gold and above)

premium = customers[customers['tier'] >= 'Gold'] print("Premium customers:") print(premium)

Potential upgrades (Silver - one tier below Gold)

potential_upgrades = customers[customers['tier'] == 'Silver'] print("\nPotential upgrade candidates:") print(potential_upgrades) ```

Risk Levels

```python risk_dtype = pd.CategoricalDtype( categories=['Minimal', 'Low', 'Medium', 'High', 'Critical'], ordered=True )

incidents = pd.DataFrame({ 'incident_id': [1, 2, 3, 4, 5], 'risk_level': ['Medium', 'Critical', 'Low', 'High', 'Medium'], 'description': ['Issue A', 'Issue B', 'Issue C', 'Issue D', 'Issue E'] })

incidents['risk_level'] = incidents['risk_level'].astype(risk_dtype)

Escalation required (High and above)

escalate = incidents[incidents['risk_level'] >= 'High'] print("Requires escalation:") print(escalate) ```

Sorting with Ordered Categoricals

Ordered categoricals sort according to category order, not alphabetically:

```python sizes = pd.Categorical( ['medium', 'small', 'large', 'medium', 'small'], categories=['small', 'medium', 'large'], ordered=True ) df = pd.DataFrame({'size': sizes, 'value': [1, 2, 3, 4, 5]})

Sort by size (logical order, not alphabetical)

df_sorted = df.sort_values('size') print(df_sorted) ```

size value 1 small 2 4 small 5 0 medium 1 3 medium 4 2 large 3

Changing Order

reorder_categories()

```python s = pd.Series(['a', 'b', 'c'], dtype='category')

Set new order

s = s.cat.reorder_categories(['c', 'b', 'a'], ordered=True) print(s.cat.categories) # Index(['c', 'b', 'a'], dtype='object')

Now 'c' < 'b' < 'a'

print(s > 'b') # False, False, True ```

Reversing Order

```python

Original: small < medium < large

sizes = pd.Categorical( ['small', 'medium', 'large'], categories=['small', 'medium', 'large'], ordered=True )

Reverse: large < medium < small

sizes_reversed = sizes.reorder_categories( sizes.categories[::-1] ) print(sizes_reversed.categories) # ['large', 'medium', 'small'] ```

Caveats

Cannot Compare Different Category Orders

```python cat1 = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'], ordered=True) cat2 = pd.Categorical(['a', 'b'], categories=['c', 'b', 'a'], ordered=True)

cat1 == cat2 # ValueError: Categoricals can only be compared if categories are the same

```

Unordered to Ordered Requires Explicit Ordering

```python s = pd.Series(['a', 'b', 'c'], dtype='category')

s > 'a' # TypeError for unordered

Must set order first

s = s.cat.reorder_categories(['a', 'b', 'c'], ordered=True)

Now comparisons work

```

Summary

Operation Requires Ordered
==, != No
<, >, <=, >= Yes
min(), max() Yes (for meaningful results)
sort_values() No (but uses category order if ordered)
Filtering with comparison Yes

Exercises

Exercise 1. Create an ordered categorical with levels ['small', 'medium', 'large'] and demonstrate that comparison operators (<, >) work correctly.

Solution to Exercise 1

```python import pandas as pd

See page content for relevant API details

s = pd.Series(['a', 'b', 'c', 'a', 'b'], dtype='category') print(s) print(s.cat.categories) print(s.cat.codes) ```


Exercise 2. Explain the difference between ordered and unordered categoricals. What operations are only available for ordered categoricals?

Solution to Exercise 2

See the explanation in the main content of this page. The key concept involves understanding the categorical data type and its internal representation in Pandas.


Exercise 3. Write code that creates a Series of t-shirt sizes, converts it to an ordered categorical, and filters for all sizes greater than 'medium'.

Solution to Exercise 3

```python import pandas as pd import numpy as np

np.random.seed(42) df = pd.DataFrame({'col': np.random.choice(['A', 'B', 'C'], 1000)}) df['col'] = df['col'].astype('category') print(df.dtypes) print(df['col'].value_counts()) ```


Exercise 4. Create an ordered categorical and use .cat.set_categories() to reorder the levels. Show that the comparison behavior changes accordingly.

Solution to Exercise 4

```python import pandas as pd

s = pd.Categorical(['low', 'medium', 'high', 'low'], categories=['low', 'medium', 'high'], ordered=True) print(s) print(s > 'low') ```