Categorical Accessor (cat)

The cat accessor in pandas provides methods and properties for working with categorical data. It allows you to inspect, modify, and manipulate the categories of a Categorical Series.

Mental Model

Think of .cat as a control panel attached to any categorical Series. Just as a TV remote only works with a TV, .cat only activates on categorical dtype -- but once you have it, you can rename, reorder, add, or remove the category labels that define the column's vocabulary.

Overview

```python
import pandas as pd

s = pd.Series(['low', 'medium', 'high', 'low'], dtype='category')

# Access categorical methods via the .cat accessor
print(s.cat.categories)
```

Index(['high', 'low', 'medium'], dtype='object')

Prerequisites

The cat accessor only works with categorical dtype:

```python
# String column: the .cat accessor is NOT available
s = pd.Series(['a', 'b', 'c'])

try:
    s.cat.categories
except AttributeError as exc:
    print(f"AttributeError: {exc}")

# Convert to categorical first
s = s.astype('category')
print(s.cat.categories)  # Now works
```
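When a function receives a Series of unknown dtype, it can check the dtype before touching .cat. The sketch below is one way to do that; `ensure_categorical` is a hypothetical helper written for this example, not part of pandas.

```python
import pandas as pd


def ensure_categorical(s: pd.Series) -> pd.Series:
    """Return s unchanged if it is already categorical, else convert it.

    Hypothetical helper for illustration; not a pandas function.
    """
    if isinstance(s.dtype, pd.CategoricalDtype):
        return s
    return s.astype('category')


raw = pd.Series(['a', 'b', 'a'])
safe = ensure_categorical(raw)
print(safe.cat.categories.tolist())  # ['a', 'b']
```

Calling the helper on an already categorical Series returns the same object, so it is safe to apply more than once.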

Properties

categories

Returns the categories of the categorical.

```python
s = pd.Series(['apple', 'banana', 'apple', 'cherry'], dtype='category')
print(s.cat.categories)
```

Index(['apple', 'banana', 'cherry'], dtype='object')

codes

Returns the integer codes representing each category.

```python
s = pd.Series(['apple', 'banana', 'apple', 'cherry'], dtype='category')
print(s.cat.codes)
```

```
0    0
1    1
2    0
3    2
dtype: int8
```

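The codes are integer positions into the categories array: here 0 means 'apple', 1 means 'banana', and 2 means 'cherry'. A missing value gets the code -1. Storing one small integer per row instead of a full string is how categorical data achieves its memory efficiency.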
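To make the relationship between codes and categories concrete, the following sketch rebuilds the original values by hand. It assumes a small example Series with one missing value; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series(['apple', 'banana', None, 'apple'], dtype='category')

codes = s.cat.codes.to_numpy()        # one small integer per row
labels = s.cat.categories.to_numpy()  # one entry per distinct category

print(codes.tolist())  # [0, 1, -1, 0]

# Look up each code in the labels array. The code -1 marks a missing
# value, so it is masked out instead of being used as an index.
rebuilt = np.where(codes == -1, None, labels[codes])
print(rebuilt.tolist())  # ['apple', 'banana', None, 'apple']
```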

ordered

Returns whether the categorical has an order.

```python
s = pd.Series(['low', 'medium', 'high'], dtype='category')
print(s.cat.ordered)  # False

# Create an ordered categorical
s_ordered = pd.Categorical(
    ['low', 'medium', 'high'],
    categories=['low', 'medium', 'high'],
    ordered=True,
)
print(pd.Series(s_ordered).cat.ordered)  # True
```
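Being ordered matters in practice because sorting and min/max then follow the declared category order rather than alphabetical order. A minimal sketch, using made-up data:

```python
import pandas as pd

sizes = pd.Series(
    pd.Categorical(
        ['medium', 'low', 'high', 'low'],
        categories=['low', 'medium', 'high'],
        ordered=True,
    )
)

print(sizes.min(), sizes.max())      # low high
print(sizes.sort_values().tolist())  # ['low', 'low', 'medium', 'high']
```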

Category Management Methods

add_categories()

Add new categories.

```python
s = pd.Series(['a', 'b', 'a'], dtype='category')
print(s.cat.categories)  # ['a', 'b']

s = s.cat.add_categories(['c', 'd'])
print(s.cat.categories)  # ['a', 'b', 'c', 'd']
```

Note: Adding categories doesn't add data values—it just expands the allowed categories.
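One way to see this is to count values after adding a category: the new label shows up with a count of zero, and only then can it be assigned to a row. The example below is a small sketch with illustrative data.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a'], dtype='category')
s = s.cat.add_categories(['c'])

# The new category exists, but no row uses it yet.
counts = s.value_counts()
print(counts['c'])  # 0

# Because 'c' is now an allowed value, assigning it works.
s.iloc[0] = 'c'
print(s.tolist())  # ['c', 'b', 'a']
```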

remove_categories()

Remove categories (values become NaN).

```python
s = pd.Series(['a', 'b', 'c', 'a'], dtype='category')
s = s.cat.remove_categories(['c'])
print(s)
```

```
0      a
1      b
2    NaN
3      a
dtype: category
Categories (2, object): ['a', 'b']
```

⚠️ Warning: Removing a category doesn't remove rows—it converts those values to NaN.
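If the goal is to drop the affected rows as well, the NaN values can be removed in a second step. A short sketch with the same data:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'a'], dtype='category')
trimmed = s.cat.remove_categories(['c'])

print(len(trimmed))               # 4: same number of rows
print(int(trimmed.isna().sum()))  # 1: the former 'c' row is now NaN

# Follow up with dropna() to remove those rows too.
dropped = trimmed.dropna()
print(dropped.tolist())           # ['a', 'b', 'a']
```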

remove_unused_categories()

Remove categories that don't appear in the data.

```python
s = pd.Series(['a', 'b', 'a'], dtype='category')
s = s.cat.add_categories(['c', 'd'])  # Add unused categories
print(s.cat.categories)  # ['a', 'b', 'c', 'd']

s = s.cat.remove_unused_categories()
print(s.cat.categories)  # ['a', 'b']
```

set_categories()

Set the categories to a new list, replacing the existing ones. Values whose category is missing from the new list become NaN.

```python
s = pd.Series(['a', 'b', 'c'], dtype='category')
s = s.cat.set_categories(['a', 'b', 'c', 'd', 'e'])
print(s.cat.categories)
```

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
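Because set_categories() replaces the whole list, it can add and drop categories in a single call. The sketch below, using illustrative labels, shows what happens to a value whose category is left out:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'], dtype='category')

# 'c' is dropped from the allowed labels and 'z' is added in one call.
s2 = s.cat.set_categories(['a', 'b', 'z'])

print(s2.cat.categories.tolist())  # ['a', 'b', 'z']
print(s2.isna().tolist())          # [False, False, True]
```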

rename_categories()

Rename existing categories.

```python
s = pd.Series(['a', 'b', 'c', 'a'], dtype='category')

# Using a dictionary
s = s.cat.rename_categories({'a': 'alpha', 'b': 'beta', 'c': 'gamma'})
print(s)
```

```
0    alpha
1     beta
2    gamma
3    alpha
dtype: category
Categories (3, object): ['alpha', 'beta', 'gamma']
```

```python
# Using a function
s = pd.Series(['a', 'b', 'c'], dtype='category')
s = s.cat.rename_categories(lambda x: x.upper())
print(s.cat.categories)  # ['A', 'B', 'C']
```
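Renaming only relabels the categories; the underlying integer codes stay the same. The following sketch also uses a partial dictionary, where labels not mentioned in the mapping are left as they are:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'a'], dtype='category')
renamed = s.cat.rename_categories({'a': 'alpha'})  # partial mapping

print(renamed.cat.categories.tolist())        # ['alpha', 'b', 'c']
print(s.cat.codes.equals(renamed.cat.codes))  # True
```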

reorder_categories()

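Reorder the existing categories. The new list must contain exactly the same categories as the current one; pass ordered=True to mark the categorical as ordered at the same time.

```python
s = pd.Series(['low', 'medium', 'high'], dtype='category')
s = s.cat.reorder_categories(['low', 'medium', 'high'], ordered=True)
print(s.cat.categories)
```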

Index(['low', 'medium', 'high'], dtype='object')
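The effect of reordering is easiest to see through sorting, which follows the category order. The sketch below also shows that a list with a missing category is rejected; the data is illustrative.

```python
import pandas as pd

s = pd.Series(['low', 'medium', 'high'], dtype='category')

# The inferred (alphabetical) category order drives the sort.
print(s.sort_values().tolist())  # ['high', 'low', 'medium']

s = s.cat.reorder_categories(['low', 'medium', 'high'], ordered=True)
print(s.sort_values().tolist())  # ['low', 'medium', 'high']

# Leaving a category out (or adding a new one) raises an error.
try:
    s.cat.reorder_categories(['low', 'high'])
except ValueError as exc:
    print("ValueError:", exc)
```

To change the set of categories rather than only their order, set_categories() is the method to reach for.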

Ordering Methods

as_ordered()

Make the categorical ordered.

```python
s = pd.Series(['low', 'medium', 'high'], dtype='category')
print(s.cat.ordered)  # False

s = s.cat.as_ordered()
print(s.cat.ordered)  # True
```

as_unordered()

Make the categorical unordered.

```python
s = s.cat.as_unordered()
print(s.cat.ordered)  # False
```
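Two consequences of the ordered flag are worth seeing side by side: unordered categoricals reject < and > comparisons, and as_ordered() keeps whatever category order is already in place. In this sketch that order is the inferred alphabetical one, which is probably not the intended ranking.

```python
import pandas as pd

s = pd.Series(['low', 'medium', 'high'], dtype='category')

# Unordered categoricals support == and != but not < or >.
try:
    s < 'medium'
except TypeError as exc:
    print("TypeError:", exc)

# as_ordered() keeps the current category order, which here is
# alphabetical: high < low < medium.
ordered = s.cat.as_ordered()
print(ordered.cat.categories.tolist())  # ['high', 'low', 'medium']
print(ordered.min())                    # high
```

When the ranking matters, set the category order explicitly (for example with reorder_categories) before or while marking the data as ordered.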

Practical Examples

Stock Sector Analysis

```python
import pandas as pd
import numpy as np

# Create stock data
np.random.seed(42)
sectors = ['Technology', 'Finance', 'Healthcare', 'Energy', 'Consumer']
df = pd.DataFrame({
    'ticker': [f'STOCK_{i}' for i in range(1000)],
    'sector': np.random.choice(sectors, 1000),
    'returns': np.random.randn(1000) * 0.02,
})

# Convert to categorical
df['sector'] = df['sector'].astype('category')

# Check categories
print(df['sector'].cat.categories)

# Reorder for logical grouping
df['sector'] = df['sector'].cat.reorder_categories(
    ['Technology', 'Healthcare', 'Finance', 'Consumer', 'Energy']
)

# Group on the categorical column; results follow the category order.
# observed=True is passed explicitly because its default differs across pandas versions.
sector_returns = df.groupby('sector', observed=True)['returns'].mean()
print(sector_returns)
```
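When grouping on a categorical column, the observed parameter of groupby controls whether categories with no rows appear in the result. The sketch below uses a tiny made-up frame in which 'Energy' is a declared category with no data:

```python
import pandas as pd

df = pd.DataFrame({
    'sector': pd.Categorical(
        ['Tech', 'Finance', 'Tech'],
        categories=['Tech', 'Finance', 'Energy'],  # 'Energy' has no rows
    ),
    'returns': [0.01, 0.02, 0.03],
})

all_groups = df.groupby('sector', observed=False)['returns'].mean()
seen_groups = df.groupby('sector', observed=True)['returns'].mean()

print(all_groups.index.tolist())   # ['Tech', 'Finance', 'Energy']
print(seen_groups.index.tolist())  # ['Tech', 'Finance']
```

With observed=False the empty group is kept and its mean is NaN, which can be useful for reports that must list every category.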

Credit Rating Analysis

```python
# Credit ratings have a natural order
ratings = pd.Series(['BBB', 'AA', 'AAA', 'BB', 'A', 'BBB', 'AA'])

# Convert to ordered categorical
rating_order = ['BB', 'BBB', 'A', 'AA', 'AAA']
ratings = pd.Categorical(ratings, categories=rating_order, ordered=True)
ratings = pd.Series(ratings)

# Now comparisons work
print(ratings > 'BBB')
```

```
0    False
1     True
2     True
3    False
4     True
5    False
6     True
dtype: bool
```

```python
# Filter investment grade (BBB and above)
investment_grade = ratings[ratings >= 'BBB']
print(investment_grade)
```
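An alternative to building a pd.Categorical by hand is to define the order once as a CategoricalDtype and convert with astype. This is a sketch of that variant; `rating_dtype` is a name chosen for the example.

```python
import pandas as pd

rating_dtype = pd.CategoricalDtype(
    categories=['BB', 'BBB', 'A', 'AA', 'AAA'],
    ordered=True,
)

ratings = pd.Series(['BBB', 'AA', 'AAA', 'BB', 'A']).astype(rating_dtype)

print(ratings.cat.ordered)               # True
print(ratings.sort_values().tolist())    # ['BB', 'BBB', 'A', 'AA', 'AAA']
print(ratings[ratings >= 'A'].tolist())  # ['AA', 'AAA', 'A']
```

Keeping the dtype in one variable makes it easy to apply the same rating order to several columns.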

Survey Response Analysis

```python
# Survey responses with a natural order
responses = pd.Series([
    'Strongly Disagree', 'Disagree', 'Neutral', 'Agree',
    'Strongly Agree', 'Agree', 'Neutral'
])

# Define order
response_order = [
    'Strongly Disagree', 'Disagree', 'Neutral', 'Agree', 'Strongly Agree'
]

responses = pd.Categorical(responses, categories=response_order, ordered=True)
responses = pd.Series(responses)

# Find positive responses
positive = responses[responses > 'Neutral']
print(positive)
```
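Since the codes of an ordered categorical follow the declared order, they can double as a numeric score. The sketch below turns responses into a 1 to 5 scale; it assumes there are no missing responses, because a missing value has code -1 and would end up with a score of 0.

```python
import pandas as pd

response_order = [
    'Strongly Disagree', 'Disagree', 'Neutral', 'Agree', 'Strongly Agree'
]
responses = pd.Series(
    pd.Categorical(
        ['Agree', 'Neutral', 'Strongly Agree', 'Disagree'],
        categories=response_order,
        ordered=True,
    )
)

# Codes start at 0, so add 1 for a conventional 1-5 scale.
scores = responses.cat.codes + 1
print(scores.tolist())  # [4, 3, 5, 2]
print(scores.mean())    # 3.5
```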

Memory Comparison

```python
import pandas as pd
import numpy as np

# Create large dataset
n = 1_000_000
categories = ['Cat_A', 'Cat_B', 'Cat_C', 'Cat_D', 'Cat_E']
data = np.random.choice(categories, n)

# As string (object)
s_string = pd.Series(data)
print(f"String memory: {s_string.memory_usage(deep=True) / 1e6:.2f} MB")

# As categorical
s_cat = pd.Series(data, dtype='category')
print(f"Categorical memory: {s_cat.memory_usage(deep=True) / 1e6:.2f} MB")

# Ratio
ratio = s_string.memory_usage(deep=True) / s_cat.memory_usage(deep=True)
print(f"Memory reduction: {ratio:.1f}x")
```

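```
String memory: 57.00 MB
Categorical memory: 1.00 MB
Memory reduction: 57.0x
```

Exact figures depend on the Python and pandas versions and on how strings are stored; the order-of-magnitude gap is the point.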
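The saving comes from repetition, so it shrinks or disappears when most values are distinct: the categories then have to hold nearly every string, and each row still carries a code. The sketch below compares a low-cardinality and a high-cardinality column. It forces dtype=object for the string version as an assumption, so the comparison does not depend on the default string storage, and it prints sizes without claiming specific numbers.

```python
import pandas as pd

n = 10_000

# Few distinct values: strings are stored once, rows hold small codes.
low_card = pd.Series(['Cat_A', 'Cat_B'] * (n // 2), dtype=object)

# Every value distinct: the categories hold all strings AND rows hold codes.
high_card = pd.Series([f'id_{i}' for i in range(n)], dtype=object)

sizes = {}
for name, s in [('low', low_card), ('high', high_card)]:
    sizes[name] = {
        'object': s.memory_usage(deep=True),
        'category': s.astype('category').memory_usage(deep=True),
    }
    print(f"{name} cardinality: {sizes[name]}")
```

A quick check such as s.nunique() / len(s) before converting gives a rough idea of whether a column is a good candidate.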

Summary of cat Methods

| Method/Property | Description |
| --- | --- |
| cat.categories | Get the category labels |
| cat.codes | Integer codes for values (-1 for missing) |
| cat.ordered | Check if ordered |
| cat.add_categories() | Add new categories |
| cat.remove_categories() | Remove categories (affected values become NaN) |
| cat.remove_unused_categories() | Remove unused categories |
| cat.set_categories() | Replace all categories |
| cat.rename_categories() | Rename categories |
| cat.reorder_categories() | Reorder categories |
| cat.as_ordered() | Make ordered |
| cat.as_unordered() | Make unordered |

Exercises

Exercise 1. Create a Series with values ['small', 'medium', 'large', 'medium', 'small'] and convert it to a categorical type. Use .cat.codes to print the integer codes and .cat.categories to print the category labels.

Solution to Exercise 1

Convert to categorical and inspect codes and categories.

```python
import pandas as pd

s = pd.Series(['small', 'medium', 'large', 'medium', 'small'], dtype='category')
print("Codes:", s.cat.codes.tolist())
print("Categories:", s.cat.categories.tolist())
```

Exercise 2. Create an ordered categorical Series with the custom order ['bronze', 'silver', 'gold']. Add a new category 'platinum' using .cat.add_categories(). Then verify that 'platinum' appears in the categories even though no element has that value.

Solution to Exercise 2

Create ordered categorical and add a new category.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

cat_type = CategoricalDtype(
    categories=['bronze', 'silver', 'gold'],
    ordered=True
)
s = pd.Series(['bronze', 'silver', 'gold', 'silver'], dtype=cat_type)
s = s.cat.add_categories('platinum')
print(s.cat.categories)  # ['bronze', 'silver', 'gold', 'platinum']
print('platinum' in s.cat.categories)  # True
```

Exercise 3. Given a categorical Series with categories ['A', 'B', 'C', 'D'] where category 'D' is never used, use .cat.remove_unused_categories() to clean it up. Then rename the remaining categories to ['Alpha', 'Beta', 'Gamma'] using .cat.rename_categories().

Solution to Exercise 3

Remove unused categories and rename the rest.

```python
import pandas as pd

s = pd.Series(
    pd.Categorical(['A', 'B', 'C', 'A', 'B'],
                   categories=['A', 'B', 'C', 'D'])
)
s = s.cat.remove_unused_categories()
print("After removing unused:", s.cat.categories.tolist())  # ['A', 'B', 'C']
s = s.cat.rename_categories({'A': 'Alpha', 'B': 'Beta', 'C': 'Gamma'})
print(s)
```