Categorical Accessor (cat)¶
The cat accessor in pandas exposes the properties and methods of categorical data on a Series. Use it to inspect the categories and their integer codes, and to add, remove, rename, or reorder categories.
Overview¶
import pandas as pd
s = pd.Series(['low', 'medium', 'high', 'low'], dtype='category')
# Access categorical methods via .cat accessor
print(s.cat.categories)
Index(['high', 'low', 'medium'], dtype='object')
Prerequisites¶
The cat accessor is available only on a Series with categorical dtype. On any other dtype, accessing it raises AttributeError:
# String column - cat accessor NOT available
s = pd.Series(['a', 'b', 'c'])
# s.cat.categories # AttributeError
# Convert to categorical first
s = s.astype('category')
print(s.cat.categories) # Now works
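When the dtype of a column is not known in advance, it can be checked before touching the accessor. The sketch below is one way to do that; the helper name categories_or_none is hypothetical and exists only for illustration.

```python
import pandas as pd

def categories_or_none(s: pd.Series):
    """Return the categories of `s` as a list, or None if `s` is not categorical.

    Hypothetical helper, not part of pandas.
    """
    # Checking the dtype first avoids the AttributeError raised by .cat
    if isinstance(s.dtype, pd.CategoricalDtype):
        return list(s.cat.categories)
    return None

plain = pd.Series(['a', 'b', 'c'])
print(categories_or_none(plain))                     # None
print(categories_or_none(plain.astype('category')))  # ['a', 'b', 'c']
```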
Properties¶
categories¶
Returns the categories as an Index. When the categories are inferred from the data, they are sorted if the values are sortable.
s = pd.Series(['apple', 'banana', 'apple', 'cherry'], dtype='category')
print(s.cat.categories)
Index(['apple', 'banana', 'cherry'], dtype='object')
codes¶
Returns the integer codes representing each category.
s = pd.Series(['apple', 'banana', 'apple', 'cherry'], dtype='category')
print(s.cat.codes)
0 0
1 1
2 0
3 2
dtype: int8
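Each code is the position of the value's category in the categories index; missing values get the code -1. Because every value is stored as a small integer (int8 here) and each category label is stored only once, categorical data uses far less memory than repeated strings.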
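The relationship between codes and categories can be illustrated by rebuilding the original values by hand. This is a minimal sketch with made-up data; it is meant to show the lookup, not how pandas implements it internally.

```python
import pandas as pd

s = pd.Series(['apple', 'banana', None, 'cherry', 'apple'], dtype='category')
codes = s.cat.codes.to_numpy()
cats = s.cat.categories.to_numpy()

# The missing value is stored as code -1, not as a category
print(codes.tolist())  # [0, 1, -1, 2, 0]

# Look up each code in the categories array, mapping -1 back to None
rebuilt = [cats[c] if c != -1 else None for c in codes]
print(rebuilt)  # ['apple', 'banana', None, 'cherry', 'apple']
```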
ordered¶
Returns True if the categories have a meaningful order, which is what enables comparisons and min/max on the values.
s = pd.Series(['low', 'medium', 'high'], dtype='category')
print(s.cat.ordered) # False
# Create ordered categorical
s_ordered = pd.Categorical(['low', 'medium', 'high'],
                           categories=['low', 'medium', 'high'],
                           ordered=True)
print(pd.Series(s_ordered).cat.ordered) # True
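An ordered categorical can also be built by converting an existing Series with a CategoricalDtype, which keeps the order definition reusable across columns. The sketch below assumes made-up data and a variable name (level_type) chosen for illustration.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Reusable dtype that carries both the categories and their order
level_type = CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)

s = pd.Series(['medium', 'low', 'high', 'low']).astype(level_type)
print(s.cat.ordered)            # True
print((s > 'low').tolist())     # [True, False, True, False]
```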
Category Management Methods¶
add_categories()¶
Append new categories after the existing ones. The new categories must not already be present.
s = pd.Series(['a', 'b', 'a'], dtype='category')
print(s.cat.categories) # ['a', 'b']
s = s.cat.add_categories(['c', 'd'])
print(s.cat.categories) # ['a', 'b', 'c', 'd']
Note: Adding categories doesn't add data values—it just expands the allowed categories.
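The practical reason to add a category is that a categorical Series rejects values outside its categories. The sketch below shows an assignment failing first and succeeding once the category exists; the exception type has differed between pandas versions, so both TypeError and ValueError are caught here as an assumption.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a'], dtype='category')

# 'c' is not an allowed category yet, so the assignment is rejected
try:
    s.iloc[0] = 'c'
    assigned = True
except (TypeError, ValueError):
    assigned = False
print(assigned)  # False

# After adding the category, the same assignment is accepted
s = s.cat.add_categories(['c'])
s.iloc[0] = 'c'
print(s.tolist())  # ['c', 'b', 'a']
```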
remove_categories()¶
Remove categories (values become NaN).
s = pd.Series(['a', 'b', 'c', 'a'], dtype='category')
s = s.cat.remove_categories(['c'])
print(s)
0 a
1 b
2 NaN
3 a
dtype: category
Categories (2, object): ['a', 'b']
⚠️ Warning: Removing a category doesn't remove rows—it converts those values to NaN.
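If the goal is to drop the rows as well, the removal can be followed by dropna(). A minimal sketch, assuming the Series has no other missing values that should be kept:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'a'], dtype='category')

# Remove the category, then drop the rows that became NaN
trimmed = s.cat.remove_categories(['c']).dropna()
print(trimmed.tolist())      # ['a', 'b', 'a']
print(list(trimmed.index))   # [0, 1, 3]
```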
remove_unused_categories()¶
Remove categories that don't appear in the data.
s = pd.Series(['a', 'b', 'a'], dtype='category')
s = s.cat.add_categories(['c', 'd']) # Add unused categories
print(s.cat.categories) # ['a', 'b', 'c', 'd']
s = s.cat.remove_unused_categories()
print(s.cat.categories) # ['a', 'b']
set_categories()¶
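Replace the full set of categories with a new list. Values whose category is absent from the new list become NaN; entries that are new are added as unused categories.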
s = pd.Series(['a', 'b', 'c'], dtype='category')
s = s.cat.set_categories(['a', 'b', 'c', 'd', 'e'])
print(s.cat.categories)
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
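set_categories() can also narrow the categories, in which case values outside the new list are lost. The following sketch uses made-up data to show that effect together with the ordered parameter.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'], dtype='category')

# 'c' is not in the new list, so that value becomes NaN
narrowed = s.cat.set_categories(['b', 'a'], ordered=True)
print(narrowed.isna().tolist())        # [False, False, True]
print(list(narrowed.cat.categories))   # ['b', 'a']
print(narrowed.cat.ordered)            # True
```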
rename_categories()¶
Rename existing categories.
s = pd.Series(['a', 'b', 'c', 'a'], dtype='category')
# Using a dictionary
s = s.cat.rename_categories({'a': 'alpha', 'b': 'beta', 'c': 'gamma'})
print(s)
0 alpha
1 beta
2 gamma
3 alpha
dtype: category
Categories (3, object): ['alpha', 'beta', 'gamma']
# Using a function
s = pd.Series(['a', 'b', 'c'], dtype='category')
s = s.cat.rename_categories(lambda x: x.upper())
print(s.cat.categories) # ['A', 'B', 'C']
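Renaming only changes the labels; the underlying codes stay the same. The sketch below also shows that a dictionary may cover only some categories, with the rest passed through unchanged.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'a'], dtype='category')

# Only 'a' is mapped; 'b' and 'c' keep their names
renamed = s.cat.rename_categories({'a': 'alpha'})
print(list(renamed.cat.categories))            # ['alpha', 'b', 'c']

# The integer codes are untouched by the rename
print(s.cat.codes.equals(renamed.cat.codes))   # True
```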
reorder_categories()¶
Reorder the categories. The new list must contain exactly the existing categories; pass ordered=True to also mark the categorical as ordered.
s = pd.Series(['low', 'medium', 'high'], dtype='category')
s = s.cat.reorder_categories(['low', 'medium', 'high'], ordered=True)
print(s.cat.categories)
Index(['low', 'medium', 'high'], dtype='object')
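One visible effect of the category order is sorting, which follows the categories rather than the alphabet. The sketch below shows this on made-up data, and also that a list which does not match the existing categories is rejected.

```python
import pandas as pd

s = pd.Series(['medium', 'high', 'low'], dtype='category')

# Inferred categories are alphabetical, so sorting follows that order
print(s.sort_values().tolist())    # ['high', 'low', 'medium']

# After reordering, sorting follows the new category order
s2 = s.cat.reorder_categories(['low', 'medium', 'high'])
print(s2.sort_values().tolist())   # ['low', 'medium', 'high']

# Leaving out an existing category is an error
try:
    s.cat.reorder_categories(['low', 'high'])
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```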
Ordering Methods¶
as_ordered()¶
Mark the categorical as ordered, using the current order of its categories.
s = pd.Series(['low', 'medium', 'high'], dtype='category')
print(s.cat.ordered) # False
s = s.cat.as_ordered()
print(s.cat.ordered) # True
as_unordered()¶
Make the categorical unordered.
s = s.cat.as_unordered()
print(s.cat.ordered) # False
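Because as_ordered() keeps whatever category order already exists, the result may not match the intended ranking when the categories were inferred alphabetically. The sketch below illustrates this with min/max, which are only defined for ordered categoricals.

```python
import pandas as pd

s = pd.Series(['low', 'medium', 'high'], dtype='category')

# min/max are not defined on an unordered categorical
try:
    s.min()
    unordered_min_ok = True
except TypeError:
    unordered_min_ok = False
print(unordered_min_ok)  # False

# as_ordered() keeps the inferred alphabetical order: high < low < medium
ordered = s.cat.as_ordered()
print(ordered.max())  # medium

# Setting the order explicitly gives the intended ranking
fixed = s.cat.reorder_categories(['low', 'medium', 'high'], ordered=True)
print(fixed.max())  # high
```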
Practical Examples¶
Stock Sector Analysis¶
import pandas as pd
import numpy as np
# Create stock data
np.random.seed(42)
sectors = ['Technology', 'Finance', 'Healthcare', 'Energy', 'Consumer']
df = pd.DataFrame({
    'ticker': [f'STOCK_{i}' for i in range(1000)],
    'sector': np.random.choice(sectors, 1000),
    'returns': np.random.randn(1000) * 0.02
})
# Convert to categorical
df['sector'] = df['sector'].astype('category')
# Check categories
print(df['sector'].cat.categories)
# Reorder for logical grouping (this also sets the order of the groupby output)
df['sector'] = df['sector'].cat.reorder_categories(
    ['Technology', 'Healthcare', 'Finance', 'Consumer', 'Energy']
)
# Group by the categorical column; observed=True keeps only sectors present in the data
sector_returns = df.groupby('sector', observed=True)['returns'].mean()
print(sector_returns)
Credit Rating Analysis¶
# Credit ratings have natural order
ratings = pd.Series(['BBB', 'AA', 'AAA', 'BB', 'A', 'BBB', 'AA'])
# Convert to ordered categorical
rating_order = ['BB', 'BBB', 'A', 'AA', 'AAA']
ratings = pd.Categorical(ratings, categories=rating_order, ordered=True)
ratings = pd.Series(ratings)
# Now comparisons work
print(ratings > 'BBB')
0 False
1 True
2 True
3 False
4 True
5 False
6 True
dtype: bool
# Filter investment grade (BBB and above)
investment_grade = ratings[ratings >= 'BBB']
print(investment_grade)
Survey Response Analysis¶
# Survey responses with natural order
responses = pd.Series([
    'Strongly Disagree', 'Disagree', 'Neutral',
    'Agree', 'Strongly Agree', 'Agree', 'Neutral'
])
# Define order
response_order = [
    'Strongly Disagree', 'Disagree', 'Neutral',
    'Agree', 'Strongly Agree'
]
responses = pd.Categorical(responses, categories=response_order, ordered=True)
responses = pd.Series(responses)
# Find positive responses
positive = responses[responses > 'Neutral']
print(positive)
Memory Comparison¶
import pandas as pd
import numpy as np
# Create large dataset
n = 1_000_000
categories = ['Cat_A', 'Cat_B', 'Cat_C', 'Cat_D', 'Cat_E']
data = np.random.choice(categories, n)
# As string (object)
s_string = pd.Series(data)
print(f"String memory: {s_string.memory_usage(deep=True) / 1e6:.2f} MB")
# As categorical
s_cat = pd.Series(data, dtype='category')
print(f"Categorical memory: {s_cat.memory_usage(deep=True) / 1e6:.2f} MB")
# Ratio
ratio = s_string.memory_usage(deep=True) / s_cat.memory_usage(deep=True)
print(f"Memory reduction: {ratio:.1f}x")
Example output (exact figures vary with the Python and pandas versions):
String memory: 57.00 MB
Categorical memory: 1.00 MB
Memory reduction: 57.0x
Summary of cat Methods¶
| Method/Property | Description |
|---|---|
| cat.categories | Get the categories (an Index) |
| cat.codes | Integer codes for values |
| cat.ordered | Check if ordered |
| cat.add_categories() | Add new categories |
| cat.remove_categories() | Remove categories |
| cat.remove_unused_categories() | Remove unused categories |
| cat.set_categories() | Replace all categories |
| cat.rename_categories() | Rename categories |
| cat.reorder_categories() | Reorder categories |
| cat.as_ordered() | Make ordered |
| cat.as_unordered() | Make unordered |