Categorical Accessor (cat)

The cat accessor in pandas provides methods and properties for working with categorical data. It lets you inspect, add, remove, rename, and reorder the categories of a Series with category dtype.

Overview

import pandas as pd

s = pd.Series(['low', 'medium', 'high', 'low'], dtype='category')

# Access categorical methods via .cat accessor
print(s.cat.categories)

Index(['high', 'low', 'medium'], dtype='object')

Prerequisites

The cat accessor works only on a Series with category dtype. On any other dtype, accessing .cat raises an AttributeError:

# String column - cat accessor NOT available
s = pd.Series(['a', 'b', 'c'])
# s.cat.categories  # AttributeError

# Convert to categorical first
s = s.astype('category')
print(s.cat.categories)  # Now works
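Code that may receive either plain or categorical data can check the dtype before using .cat. The following is a minimal sketch. `ensure_categorical` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def ensure_categorical(s: pd.Series) -> pd.Series:
    """Hypothetical helper: return s with category dtype, converting only if needed."""
    if isinstance(s.dtype, pd.CategoricalDtype):
        return s  # already categorical, so .cat is available
    return s.astype("category")

plain = pd.Series(["a", "b", "c"])
# Accessing .cat on non-categorical data raises AttributeError, so hasattr is False
print(hasattr(plain, "cat"))  # False

cat = ensure_categorical(plain)
print(list(cat.cat.categories))  # ['a', 'b', 'c']
```

Returning the input unchanged when it is already categorical avoids a needless copy and keeps any custom category order intact.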

Properties

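categories

Returns the categories as an Index. When the categories are inferred from the data, they are the unique values in sorted order where possible. This is why 'high' comes before 'low' in the overview example.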

s = pd.Series(['apple', 'banana', 'apple', 'cherry'], dtype='category')
print(s.cat.categories)

Index(['apple', 'banana', 'cherry'], dtype='object')
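Because inferred categories are sorted, any other order has to be given explicitly. The sketch below passes the categories up front through pd.CategoricalDtype. The values are illustrative.

```python
import pandas as pd

values = ["low", "medium", "high", "low"]

# Inferred categories: the unique values, sorted
inferred = pd.Series(values, dtype="category")
print(list(inferred.cat.categories))  # ['high', 'low', 'medium']

# Explicit categories: kept in the order given
dtype = pd.CategoricalDtype(categories=["low", "medium", "high"])
explicit = pd.Series(values, dtype=dtype)
print(list(explicit.cat.categories))  # ['low', 'medium', 'high']
```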

codes

Returns the integer code of each value, that is, its position in categories.

s = pd.Series(['apple', 'banana', 'apple', 'cherry'], dtype='category')
print(s.cat.codes)

0    0
1    1
2    0
3    2
dtype: int8

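Each code is a position in the categories Index, and missing values are stored as -1. Categorical data saves memory by storing one small integer per row (int8 here) plus a single copy of each category, instead of a full string per row.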
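The following small sketch shows how codes map back to values and how a missing value is encoded. The data is illustrative.

```python
import pandas as pd

s = pd.Series(["apple", "banana", None, "cherry"], dtype="category")
codes = s.cat.codes
print(codes.tolist())  # [0, 1, -1, 2]  (-1 marks the missing value)

# Decode: look up each non-missing code in the categories Index
valid = codes[codes != -1]
decoded = s.cat.categories.take(valid.to_numpy())
print(list(decoded))  # ['apple', 'banana', 'cherry']
```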

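ordered

Returns True if the categories have a meaningful order, which enables comparisons such as < and > as well as min() and max(). Otherwise it returns False.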

s = pd.Series(['low', 'medium', 'high'], dtype='category')
print(s.cat.ordered)  # False

# Create ordered categorical
s_ordered = pd.Categorical(['low', 'medium', 'high'], 
                           categories=['low', 'medium', 'high'],
                           ordered=True)
print(pd.Series(s_ordered).cat.ordered)  # True
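The practical difference between the two shows up in comparisons and reductions. This is a minimal sketch with illustrative data. The `max_raised` flag exists only to record the outcome.

```python
import pandas as pd

levels = ["low", "medium", "high"]
data = ["medium", "low", "high"]
unordered = pd.Series(pd.Categorical(data, categories=levels))
ordered = pd.Series(pd.Categorical(data, categories=levels, ordered=True))

# Ordered: comparisons and min/max follow the category order
print((ordered > "low").tolist())  # [True, False, True]
print(ordered.max())               # high

# Unordered: ordering operations such as max() raise TypeError
try:
    unordered.max()
    max_raised = False
except TypeError as exc:
    max_raised = True
    print("TypeError:", exc)
```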

Category Management Methods

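add_categories()

Add new categories. Like the other category-management methods below, it returns a new Series rather than modifying the original, so assign the result back.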

s = pd.Series(['a', 'b', 'a'], dtype='category')
print(s.cat.categories)  # ['a', 'b']

s = s.cat.add_categories(['c', 'd'])
print(s.cat.categories)  # ['a', 'b', 'c', 'd']

Note: Adding categories doesn't add data values—it just expands the allowed categories.
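A common reason to add a category is that assigning a value outside the current categories is rejected. The sketch below demonstrates this. Recent pandas raises TypeError here, and the code also catches ValueError in case older releases behave differently. The `rejected` flag only records the outcome.

```python
import pandas as pd

s = pd.Series(["a", "b", "a"], dtype="category")

# Assigning a value that is not a category is rejected
try:
    s[0] = "c"
    rejected = False
except (TypeError, ValueError):  # TypeError in recent pandas
    rejected = True

# After adding 'c' as a category, the same assignment succeeds
s = s.cat.add_categories(["c"])
s[0] = "c"
print(s.tolist())  # ['c', 'b', 'a']
```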

remove_categories()

Remove categories (values become NaN).

s = pd.Series(['a', 'b', 'c', 'a'], dtype='category')
s = s.cat.remove_categories(['c'])
print(s)

0      a
1      b
2    NaN
3      a
dtype: category
Categories (2, object): ['a', 'b']

⚠️ Warning: Removing a category doesn't remove rows—it converts those values to NaN.
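If the goal is to drop those rows entirely, one option is to filter them out first and then drop the category that is left empty. This is a sketch of that approach, not the only one. Calling dropna() after remove_categories() also works.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Keep rows whose value is not 'c', then drop the now-empty category
kept = s[s != "c"].cat.remove_unused_categories()
print(kept.tolist())              # ['a', 'b', 'a']
print(list(kept.cat.categories))  # ['a', 'b']
```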

remove_unused_categories()

Remove categories that don't appear in the data.

s = pd.Series(['a', 'b', 'a'], dtype='category')
s = s.cat.add_categories(['c', 'd'])  # Add unused categories
print(s.cat.categories)  # ['a', 'b', 'c', 'd']

s = s.cat.remove_unused_categories()
print(s.cat.categories)  # ['a', 'b']
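Unused categories matter because summaries of a categorical report every category, including empty ones. The sketch below uses value_counts() with illustrative data. sort_index() is applied only to make the order deterministic, and it sorts by category order.

```python
import pandas as pd

s = pd.Series(["a", "b", "a"], dtype="category").cat.add_categories(["c", "d"])

# Empty categories still appear, with a count of 0
before = s.value_counts().sort_index().to_dict()
print(before)  # {'a': 2, 'b': 1, 'c': 0, 'd': 0}

trimmed = s.cat.remove_unused_categories()
after = trimmed.value_counts().sort_index().to_dict()
print(after)   # {'a': 2, 'b': 1}
```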

set_categories()

Replace the categories with a new list. Values not in the new list become NaN, and the list also sets the category order.

s = pd.Series(['a', 'b', 'c'], dtype='category')
s = s.cat.set_categories(['a', 'b', 'c', 'd', 'e'])
print(s.cat.categories)

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
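The example above only adds categories. The sketch below also drops one and changes the order in a single call. The data is illustrative.

```python
import pandas as pd

s = pd.Series(["a", "b", "c"], dtype="category")

# 'a' is not in the new list, so it becomes NaN; the list also fixes the order
s2 = s.cat.set_categories(["c", "b"])
print(list(s2.cat.categories))  # ['c', 'b']
print(s2.isna().tolist())       # [True, False, False]
```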

rename_categories()

Rename existing categories.

s = pd.Series(['a', 'b', 'c', 'a'], dtype='category')

# Using a dictionary
s = s.cat.rename_categories({'a': 'alpha', 'b': 'beta', 'c': 'gamma'})
print(s)

0    alpha
1     beta
2    gamma
3    alpha
dtype: category
Categories (3, object): ['alpha', 'beta', 'gamma']

# Using a function
s = pd.Series(['a', 'b', 'c'], dtype='category')
s = s.cat.rename_categories(lambda x: x.upper())
print(s.cat.categories)  # ['A', 'B', 'C']
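rename_categories() also accepts a list, which renames categories by position. The sketch below shows that only the labels change while the underlying codes stay the same. The data is illustrative.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# A list renames by position: 'a' -> 'x', 'b' -> 'y', 'c' -> 'z'
renamed = s.cat.rename_categories(["x", "y", "z"])
print(renamed.tolist())  # ['x', 'y', 'z', 'x']

# The codes are untouched; only the category labels differ
print((renamed.cat.codes == s.cat.codes).all())  # True
```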

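reorder_categories()

Reorder the existing categories. The new list must contain exactly the same categories as before. Pass ordered=True to make the order meaningful for comparisons and sorting.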

s = pd.Series(['low', 'medium', 'high'], dtype='category')
s = s.cat.reorder_categories(['low', 'medium', 'high'], ordered=True)
print(s.cat.categories)

Index(['low', 'medium', 'high'], dtype='object')
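The following sketch shows two consequences of reordering. Sorting follows the new category order, and a list that does not match the existing categories raises ValueError. The `mismatch_raised` flag only records the outcome.

```python
import pandas as pd

s = pd.Series(["medium", "high", "low"], dtype="category")
s = s.cat.reorder_categories(["low", "medium", "high"], ordered=True)

# Sorting follows the category order, not alphabetical order
print(s.sort_values().tolist())  # ['low', 'medium', 'high']

# The new list must contain exactly the existing categories
try:
    s.cat.reorder_categories(["low", "high"])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```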

Ordering Methods

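as_ordered()

Make the categorical ordered, keeping the current order of the categories. For categories inferred from the data, that order is sorted (here alphabetical), so the result has 'high' < 'low' < 'medium'.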

s = pd.Series(['low', 'medium', 'high'], dtype='category')
print(s.cat.ordered)  # False

s = s.cat.as_ordered()
print(s.cat.ordered)  # True

as_unordered()

Make the categorical unordered.

s = s.cat.as_unordered()
print(s.cat.ordered)  # False
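Because as_ordered() keeps whatever order the categories already have, it can silently produce an alphabetical ranking. This sketch contrasts that with setting the intended order explicitly.

```python
import pandas as pd

s = pd.Series(["low", "medium", "high"], dtype="category")

# as_ordered() keeps the inferred (alphabetical) category order
alpha = s.cat.as_ordered()
print(list(alpha.cat.categories))  # ['high', 'low', 'medium']
print(alpha.min())                 # high

# Setting the intended order explicitly gives the expected ranking
ranked = s.cat.reorder_categories(["low", "medium", "high"], ordered=True)
print(ranked.min())                # low
```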

Practical Examples

Stock Sector Analysis

import pandas as pd
import numpy as np

# Create stock data
np.random.seed(42)
sectors = ['Technology', 'Finance', 'Healthcare', 'Energy', 'Consumer']
df = pd.DataFrame({
    'ticker': [f'STOCK_{i}' for i in range(1000)],
    'sector': np.random.choice(sectors, 1000),
    'returns': np.random.randn(1000) * 0.02
})

# Convert to categorical
df['sector'] = df['sector'].astype('category')

# Check categories
print(df['sector'].cat.categories)

# Reorder for logical grouping
df['sector'] = df['sector'].cat.reorder_categories(
    ['Technology', 'Healthcare', 'Finance', 'Consumer', 'Energy']
)

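# Groupby results follow the category order set above; observed=True keeps
# only sectors present in the data and avoids the FutureWarning that
# pandas 2.1+ emits when observed is left at its default
sector_returns = df.groupby('sector', observed=True)['returns'].mean()
print(sector_returns)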
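The observed argument only changes the result when some categories have no rows. The sketch below uses a tiny, made-up dataset in which 'Energy' is an empty category.

```python
import pandas as pd

df = pd.DataFrame({
    "sector": pd.Categorical(["Tech", "Finance", "Tech"],
                             categories=["Tech", "Finance", "Energy"]),
    "returns": [0.01, -0.02, 0.03],
})

# observed=True: only categories that occur in the data
present = df.groupby("sector", observed=True)["returns"].mean()
print(present.index.tolist())  # ['Tech', 'Finance']

# observed=False: every category, with NaN for the empty 'Energy' group
all_groups = df.groupby("sector", observed=False)["returns"].mean()
print(all_groups.index.tolist())   # ['Tech', 'Finance', 'Energy']
print(all_groups.isna().tolist())  # [False, False, True]
```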

Credit Rating Analysis

# Credit ratings have natural order
ratings = pd.Series(['BBB', 'AA', 'AAA', 'BB', 'A', 'BBB', 'AA'])

# Convert to ordered categorical
rating_order = ['BB', 'BBB', 'A', 'AA', 'AAA']
ratings = pd.Categorical(ratings, categories=rating_order, ordered=True)
ratings = pd.Series(ratings)

# Now comparisons work
print(ratings > 'BBB')

0    False
1     True
2     True
3    False
4     True
5    False
6     True
dtype: bool

# Filter investment grade (BBB and above)
investment_grade = ratings[ratings >= 'BBB']
print(investment_grade)
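When the same ordering is needed for several columns or batches, one option is to define it once as a pd.CategoricalDtype and reuse it with astype(). This is a sketch with an illustrative batch. Note that values outside the listed categories become NaN rather than raising, and NaN compares as False.

```python
import pandas as pd

# One reusable dtype for every column that holds ratings
rating_dtype = pd.CategoricalDtype(["BB", "BBB", "A", "AA", "AAA"], ordered=True)

new_batch = pd.Series(["A", "BBB", "CCC"]).astype(rating_dtype)
print(new_batch.isna().tolist())      # [False, False, True]  ('CCC' is not a listed category)
print((new_batch >= "BBB").tolist())  # [True, True, False]
```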

Survey Response Analysis

# Survey responses with natural order
responses = pd.Series([
    'Strongly Disagree', 'Disagree', 'Neutral', 
    'Agree', 'Strongly Agree', 'Agree', 'Neutral'
])

# Define order
response_order = [
    'Strongly Disagree', 'Disagree', 'Neutral', 
    'Agree', 'Strongly Agree'
]

responses = pd.Categorical(responses, categories=response_order, ordered=True)
responses = pd.Series(responses)

# Find positive responses
positive = responses[responses > 'Neutral']
print(positive)
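For a response distribution, value_counts(sort=False) returns counts in category order and still lists levels nobody chose. This is a minimal sketch with made-up responses.

```python
import pandas as pd

order = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]
responses = pd.Series(
    pd.Categorical(["Neutral", "Agree", "Strongly Agree", "Agree", "Neutral"],
                   categories=order, ordered=True)
)

# sort=False keeps the category order and includes empty levels
counts = responses.value_counts(sort=False)
print(counts.tolist())  # [0, 0, 2, 2, 1]
```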

Memory Comparison

import pandas as pd
import numpy as np

# Create large dataset
n = 1_000_000
categories = ['Cat_A', 'Cat_B', 'Cat_C', 'Cat_D', 'Cat_E']
data = np.random.choice(categories, n)

# As string (object)
s_string = pd.Series(data)
print(f"String memory: {s_string.memory_usage(deep=True) / 1e6:.2f} MB")

# As categorical
s_cat = pd.Series(data, dtype='category')
print(f"Categorical memory: {s_cat.memory_usage(deep=True) / 1e6:.2f} MB")

# Ratio
ratio = s_string.memory_usage(deep=True) / s_cat.memory_usage(deep=True)
print(f"Memory reduction: {ratio:.1f}x")

Example output (exact figures depend on the Python and pandas versions):

String memory: 57.00 MB
Categorical memory: 1.00 MB
Memory reduction: 57.0x

Summary of cat Methods

Method/Property	Description
`cat.categories`	Get categories (as an Index)
`cat.codes`	Integer codes for values
`cat.ordered`	Check if ordered
`cat.add_categories()`	Add new categories
`cat.remove_categories()`	Remove categories
`cat.remove_unused_categories()`	Remove unused categories
`cat.set_categories()`	Replace all categories
`cat.rename_categories()`	Rename categories
`cat.reorder_categories()`	Reorder categories
`cat.as_ordered()`	Make ordered
`cat.as_unordered()`	Make unordered