String Methods Reference¶

Complete reference for all string methods available through the pandas str accessor.

Mental Model

This page is a lookup table: given a string task (change case, find a substring, split, pad), find the matching .str method. Most methods mirror a built-in Python string method; a few (such as .str.contains, .str.extract, and .str.cat) are pandas-specific. All of them operate on an entire Series at once and propagate missing values (NaN) instead of raising.
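As a rough illustration of this model, the sketch below compares a hand-written element-by-element loop with the equivalent .str call. The Series `names` and the variable names are made up for the example.

```python
import pandas as pd

# Hypothetical data with one missing value
names = pd.Series(['alice', None, 'bob'])

# Built-in str method applied element by element, skipping missing values by hand
manual = names.map(lambda x: x.upper() if isinstance(x, str) else x)

# The .str accessor does the same in one call and propagates the missing value
vectorized = names.str.upper()

print(vectorized.tolist())
```

Both results hold 'ALICE' and 'BOB' with a missing value in between; the accessor simply removes the need for the explicit type check.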

Case Methods¶

Method	Description	Example
`str.lower()`	Convert to lowercase	`'HELLO'` → `'hello'`
`str.upper()`	Convert to uppercase	`'hello'` → `'HELLO'`
`str.title()`	Titlecase (capitalize each word)	`'hello world'` → `'Hello World'`
`str.capitalize()`	Capitalize first character	`'hello'` → `'Hello'`
`str.swapcase()`	Swap case	`'Hello'` → `'hELLO'`
`str.casefold()`	Aggressive lowercase (for caseless matching)	`'Straße'` → `'strasse'`

```python
import pandas as pd

s = pd.Series(['hello WORLD', 'PYTHON pandas'])

print(s.str.lower())       # hello world, python pandas
print(s.str.upper())       # HELLO WORLD, PYTHON PANDAS
print(s.str.title())       # Hello World, Python Pandas
print(s.str.capitalize())  # Hello world, Python pandas
print(s.str.swapcase())    # HELLO world, python PANDAS
```
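The difference between lower() and casefold() only shows up for characters whose caseless form is not their lowercase form. A small sketch, using the German 'ß' as the example (the Series `words` is made up for illustration):

```python
import pandas as pd

words = pd.Series(['Straße', 'STRASSE'])

lowered = words.str.lower()     # 'straße', 'strasse'
folded = words.str.casefold()   # 'strasse', 'strasse'

# After casefold both spellings compare equal; after lower they do not
print(lowered.nunique(), folded.nunique())
```

For caseless comparisons or deduplication, casefold() is therefore the safer choice.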

Alignment Methods¶

Method	Description	Parameters
`str.center(width)`	Center align	`width`, `fillchar=' '`
`str.ljust(width)`	Left align	`width`, `fillchar=' '`
`str.rjust(width)`	Right align	`width`, `fillchar=' '`
`str.zfill(width)`	Pad with zeros on left	`width`
`str.pad(width)`	Pad string	`width`, `side='left'`, `fillchar=' '`

```python
s = pd.Series(['a', 'bb', 'ccc'])

# fillchar must be exactly one character
print(s.str.center(5, '_'))  # __a__, __bb_, _ccc_
print(s.str.ljust(5, '_'))   # a____, bb___, ccc__
print(s.str.rjust(5, '_'))   # ____a, ___bb, __ccc
print(s.str.zfill(5))        # 0000a, 000bb, 00ccc
```
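The table also lists str.pad, which is not exercised above. A minimal sketch, assuming a made-up Series `ids`; the `side` argument selects which end receives the padding:

```python
import pandas as pd

ids = pd.Series(['7', '42', '123'])

# side='left' pads on the left, which matches rjust
left = ids.str.pad(5, side='left', fillchar='0')    # 00007, 00042, 00123

# side='right' pads on the right, which matches ljust
right = ids.str.pad(5, side='right', fillchar='.')  # 7...., 42..., 123..

print(left.tolist())
print(right.tolist())
```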

Splitting Methods¶

Method	Description	Parameters
`str.split(pat)`	Split by delimiter	`pat`, `n=-1`, `expand=False`
`str.rsplit(pat)`	Split from right	`pat`, `n=-1`, `expand=False`
`str.partition(sep)`	Split at first occurrence	`sep`
`str.rpartition(sep)`	Split at last occurrence	`sep`

```python
s = pd.Series(['a-b-c-d', 'x-y-z'])

# Split all
print(s.str.split('-'))
# [['a', 'b', 'c', 'd'], ['x', 'y', 'z']]

# Split with limit
print(s.str.split('-', n=2))
# [['a', 'b', 'c-d'], ['x', 'y', 'z']]

# Expand into columns
print(s.str.split('-', expand=True))
#    0  1  2     3
# 0  a  b  c     d
# 1  x  y  z  None
```
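partition and rpartition are listed in the table but not shown. The sketch below reuses the same kind of data (the Series `paths` is made up) to show that each returns exactly three columns: the text before the separator, the separator itself, and the text after it.

```python
import pandas as pd

paths = pd.Series(['a-b-c-d', 'x-y-z'])

# Split at the first '-'
first = paths.str.partition('-')
print(first.iloc[0].tolist())   # ['a', '-', 'b-c-d']

# Split at the last '-'
last = paths.str.rpartition('-')
print(last.iloc[0].tolist())    # ['a-b-c', '-', 'd']
```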

Joining Methods¶

Method	Description	Parameters
`str.join(sep)`	Join list elements	`sep`
`str.cat()`	Concatenate strings	`others`, `sep`, `na_rep`

```python
# Join lists
s = pd.Series([['a', 'b', 'c'], ['x', 'y']])
print(s.str.join('-'))  # a-b-c, x-y

# Concatenate all strings
s = pd.Series(['A', 'B', 'C'])
print(s.str.cat(sep='-'))  # A-B-C

# Concatenate with another Series
s1 = pd.Series(['A', 'B', 'C'])
s2 = pd.Series(['1', '2', '3'])
print(s1.str.cat(s2, sep='-'))  # A-1, B-2, C-3
```
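The `na_rep` parameter from the table matters once either Series has a missing value. A short sketch with made-up Series `first` and `second`:

```python
import pandas as pd

first = pd.Series(['A', None, 'C'])
second = pd.Series(['1', '2', '3'])

# Without na_rep, a missing value on either side makes the whole result missing
default = first.str.cat(second, sep='-')             # A-1, NaN, C-3

# With na_rep, the missing value is replaced before concatenation
filled = first.str.cat(second, sep='-', na_rep='?')  # A-1, ?-2, C-3

print(default.tolist())
print(filled.tolist())
```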

Stripping Methods¶

Method	Description	Parameters
`str.strip()`	Strip both sides	`to_strip=None`
`str.lstrip()`	Strip left side	`to_strip=None`
`str.rstrip()`	Strip right side	`to_strip=None`

```python
s = pd.Series([' hello ', '***world***'])

print(s.str.strip())        # 'hello', '***world***'  (whitespace only)
print(s.str.strip(' *'))    # 'hello', 'world'        (spaces and asterisks)
print(s.str.lstrip(' *'))   # 'hello ', 'world***'    (left side only)
```

Search Methods¶

Method	Description	Returns
`str.contains(pat)`	Contains pattern	bool Series
`str.startswith(pat)`	Starts with literal prefix (not a regex)	bool Series
`str.endswith(pat)`	Ends with literal suffix (not a regex)	bool Series
`str.match(pat)`	Match regex at start	bool Series
`str.fullmatch(pat)`	Full string matches regex	bool Series
`str.find(sub)`	Find substring position	int Series (-1 if not found)
`str.rfind(sub)`	Find from right	int Series
`str.index(sub)`	Find (raises if not found)	int Series
`str.rindex(sub)`	Find from right (raises)	int Series
`str.count(pat)`	Count occurrences	int Series

```python
s = pd.Series(['apple', 'banana', 'cherry'])

print(s.str.contains('an'))    # False, True, False
print(s.str.startswith('a'))   # True, False, False
print(s.str.endswith('a'))     # False, True, False
print(s.str.find('a'))         # 0, 1, -1
print(s.str.count('a'))        # 1, 3, 0
```
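contains, match, and fullmatch differ only in where the regex is allowed to match. A small sketch, using a made-up Series `codes`, that runs the same pattern through all three:

```python
import pandas as pd

codes = pd.Series(['abc123', '123abc', '123'])
pattern = r'\d+'

anywhere = codes.str.contains(pattern)   # True, True, True   (match anywhere)
at_start = codes.str.match(pattern)      # False, True, True  (match must begin at position 0)
whole = codes.str.fullmatch(pattern)     # False, False, True (entire string must match)

print(anywhere.tolist(), at_start.tolist(), whole.tolist())
```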

contains() Parameters¶

```python
s = pd.Series(['Apple', 'BANANA', None, 'grape'])

# Case sensitivity
print(s.str.contains('a', case=True))   # False, False, NaN, True
print(s.str.contains('a', case=False))  # True, True, NaN, True

# Handle NA
print(s.str.contains('a', na=False))  # False, False, False, True
print(s.str.contains('a', na=True))   # False, False, True, True

# Regex
print(s.str.contains(r'^[A-Z]', regex=True))  # True, True, NaN, False
```

Replacement Methods¶

Method	Description	Parameters
`str.replace(pat, repl)`	Replace pattern	`pat`, `repl`, `n=-1`, `case=None`, `flags=0`, `regex=False` (the default was `True` before pandas 2.0)
`str.translate(table)`	Translate via mapping	`table`
`str.slice_replace()`	Replace positional slice	`start`, `stop`, `repl`

```python
s = pd.Series(['apple-pie', 'banana-split'])

# Simple replace
print(s.str.replace('-', '_'))
# apple_pie, banana_split

# Regex replace
print(s.str.replace(r'-\w+', '', regex=True))
# apple, banana

# Replace with callable
print(s.str.replace(r'(\w+)-(\w+)', lambda m: m.group(2), regex=True))
# pie, split
```
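translate and slice_replace appear in the table without an example. The sketch below is one way to use them; the mapping built with `str.maketrans` and the variable names are chosen only for illustration.

```python
import pandas as pd

s = pd.Series(['apple-pie', 'banana-split'])

# translate: character-by-character mapping built with the standard str.maketrans
table = str.maketrans({'-': ' ', 'a': 'A'})
translated = s.str.translate(table)        # 'Apple pie', 'bAnAnA split'

# slice_replace: overwrite positions [0, 1) with a fixed string
replaced = s.str.slice_replace(0, 1, 'X')  # 'Xpple-pie', 'Xanana-split'

print(translated.tolist())
print(replaced.tolist())
```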

Extraction Methods¶

Method	Description	Returns
`str.extract(pat)`	Extract first match	DataFrame
`str.extractall(pat)`	Extract all matches	DataFrame (MultiIndex)
`str.findall(pat)`	Find all matches	Series of lists

```python
s = pd.Series(['A-123', 'B-456', 'C-789'])

# Extract with groups
print(s.str.extract(r'([A-Z])-(\d+)'))
#    0    1
# 0  A  123
# 1  B  456
# 2  C  789

# Find all digits
s = pd.Series(['a1b2c3', 'x9'])
print(s.str.findall(r'\d'))
# [['1', '2', '3'], ['9']]
```
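extractall is the row-per-match counterpart of findall, and named groups turn into column names. A short sketch under those assumptions (the group names `digit`, `letter`, and `number` are made up for the example):

```python
import pandas as pd

s = pd.Series(['a1b2c3', 'x9'])

# One row per match; the second index level ('match') numbers the matches per row
matches = s.str.extractall(r'(?P<digit>\d)')
print(matches['digit'].tolist())                 # ['1', '2', '3', '9']
print(matches.groupby(level=0).size().tolist())  # [3, 1]

# Named groups become column labels in extract as well
labels = pd.Series(['A-123', 'B-456'])
parts = labels.str.extract(r'(?P<letter>[A-Z])-(?P<number>\d+)')
print(parts.columns.tolist())                    # ['letter', 'number']
```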

Slicing Methods¶

Method	Description	Parameters
`str[start:stop]`	Slice by position	start, stop, step
`str.slice(start, stop)`	Slice by position	`start`, `stop`, `step`
`str.get(i)`	Get character at position	`i`

```python
s = pd.Series(['hello', 'world'])

print(s.str[0])      # h, w
print(s.str[:3])     # hel, wor
print(s.str[-2:])    # lo, ld
print(s.str.get(0))  # h, w (NaN-safe)
```
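The `step` parameter and the "NaN-safe" behaviour of get are easier to see with a position that does not exist. A minimal sketch on the same data:

```python
import pandas as pd

s = pd.Series(['hello', 'world'])

# step works like ordinary Python slicing
every_other = s.str.slice(step=2)   # 'hlo', 'wrd'

# Negative positions count from the end
last_char = s.str.get(-1)           # 'o', 'd'

# A position past the end yields a missing value instead of raising IndexError
out_of_range = s.str.get(10)

print(every_other.tolist(), last_char.tolist(), out_of_range.isna().tolist())
```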

Length and Size¶

Method	Description	Returns
`str.len()`	Length of string	int Series

```python
s = pd.Series(['hello', 'world', 'python'])
print(s.str.len())  # 5, 5, 6
```

Encoding Methods¶

Method	Description	Parameters
`str.encode(encoding)`	Encode to bytes	`encoding`, `errors`
`str.decode(encoding)`	Decode from bytes	`encoding`, `errors`

```python
s = pd.Series(['hello', 'world'])
encoded = s.str.encode('utf-8')
print(encoded)  # b'hello', b'world'
```
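decode is the inverse operation and is called on a Series of bytes. A round-trip sketch, using a made-up Series with one non-ASCII character so the byte representation visibly differs from the text:

```python
import pandas as pd

text = pd.Series(['héllo', 'world'])

# str -> bytes
encoded = text.str.encode('utf-8')
print(encoded.iloc[0])    # b'h\xc3\xa9llo'

# bytes -> str
decoded = encoded.str.decode('utf-8')
print(decoded.tolist())   # ['héllo', 'world']
```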

Checking Methods¶

Method	Description	Returns
`str.isalpha()`	All alphabetic	bool Series
`str.isalnum()`	All alphanumeric	bool Series
`str.isdigit()`	All digits	bool Series
`str.isnumeric()`	All numeric	bool Series
`str.isdecimal()`	All decimal	bool Series
`str.isspace()`	All whitespace	bool Series
`str.islower()`	All lowercase	bool Series
`str.isupper()`	All uppercase	bool Series
`str.istitle()`	Titlecase	bool Series

```python
s = pd.Series(['hello', 'HELLO', 'Hello', '12345', 'hello123'])

print(s.str.isalpha())  # True, True, True, False, False
print(s.str.isalnum())  # True, True, True, True, True
print(s.str.isdigit())  # False, False, False, True, False
print(s.str.islower())  # True, False, False, False, True
print(s.str.isupper())  # False, True, False, False, False
print(s.str.istitle())  # False, False, True, False, False
```
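isdecimal, isdigit, and isnumeric agree on plain ASCII digits and differ only on other Unicode number characters. The sketch below uses a superscript two and a vulgar fraction to show the nesting; `dtype=object` is set explicitly so that the checks go through Python's own str methods, which is an assumption made for this example.

```python
import pandas as pd

s = pd.Series(['12', '²', '½'], dtype=object)

decimal = s.str.isdecimal()   # True, False, False (characters usable in base-10 numbers)
digit = s.str.isdigit()       # True, True, False  (also superscript and subscript digits)
numeric = s.str.isnumeric()   # True, True, True   (also fractions and other numeric characters)

print(decimal.tolist(), digit.tolist(), numeric.tolist())
```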

Wrapping and Normalization¶

Method	Description	Parameters
`str.wrap(width)`	Wrap text	`width`
`str.normalize(form)`	Unicode normalization	`form` (NFC, NFD, NFKC, NFKD)

```python
s = pd.Series(['This is a very long string that needs to be wrapped'])
print(s.str.wrap(20))
```
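normalize is useful when visually identical strings are stored with different code points. A small sketch, assuming two made-up spellings of the same word, one with a precomposed 'é' and one with 'e' plus a combining accent:

```python
import pandas as pd

composed = 'caf\u00e9'      # 'é' as a single code point
decomposed = 'cafe\u0301'   # 'e' followed by a combining acute accent

s = pd.Series([composed, decomposed])
lengths_before = s.str.len()     # 4, 5

# NFC composes characters, so both entries end up identical
normalized = s.str.normalize('NFC')
lengths_after = normalized.str.len()   # 4, 4

print(lengths_before.tolist(), lengths_after.tolist())
print(normalized.iloc[0] == normalized.iloc[1])
```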

Regular Expression Flags¶

For methods that support regex, you can use flags:

```python
import re

s = pd.Series(['Hello World', 'HELLO world'])

# Case insensitive
print(s.str.contains('hello', flags=re.IGNORECASE))  # True, True

# Multiline, dotall, etc.
s = pd.Series(['line1\nline2'])
print(s.str.contains('^line2', flags=re.MULTILINE))  # True
```

Handling Missing Data¶

All str methods propagate missing values (NaN or None) instead of raising:

```python
s = pd.Series(['hello', None, 'world'])

print(s.str.upper())
# HELLO, NaN, WORLD

print(s.str.len())
# 5.0, NaN, 5.0 (float dtype, because NaN cannot be stored as an int)
```
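Propagated missing values matter most when a boolean result is used as a filter, because a mask is expected to hold only True or False. A minimal sketch of the usual workaround, filling the gap with the `na` parameter of contains:

```python
import pandas as pd

s = pd.Series(['hello', None, 'world'])

# na=False treats the missing entry as "no match", giving a clean boolean mask
mask = s.str.contains('o', na=False)
filtered = s[mask]

print(mask.tolist())       # [True, False, True]
print(filtered.tolist())   # ['hello', 'world']
```

Depending on the dtype, the unfilled result may hold a missing value in the second position, which cannot be used for indexing.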

Method Chaining Example¶

```python
# Complex text processing pipeline
s = pd.Series([' JOHN DOE ', ' jane SMITH ', ' BOB wilson '])

result = (
    s
    .str.strip()            # Remove whitespace
    .str.title()            # Titlecase
    .str.replace(' ', '_')  # Replace spaces
)
print(result)
# John_Doe, Jane_Smith, Bob_Wilson
```

Performance Notes¶

Vectorized .str methods are typically faster, and easier to read, than apply() with a lambda
Each call in a chain creates an intermediate Series, so keep long chains short on large data
Pass regex=False when the pattern is a literal string, for example str.contains(..., regex=False); this skips regex compilation and matching
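Besides speed, regex=False also avoids surprises when the search text contains regex metacharacters. A short sketch with a made-up Series `versions`, searching for a literal dot:

```python
import pandas as pd

versions = pd.Series(['3.14', '42', 'v1.0'])

# As a regex, '.' matches any character, so every non-empty string matches
as_regex = versions.str.contains('.')                  # True, True, True

# As a literal, only strings that really contain a dot match
as_literal = versions.str.contains('.', regex=False)   # True, False, True

print(as_regex.tolist(), as_literal.tolist())
```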

Exercises¶

Exercise 1. Given a Series of product descriptions, use .str.extract() with a regex pattern to pull out the price (the first decimal number) embedded in strings like 'Widget costs 29.99 dollars' and 'Gadget is 14.50 on sale', and convert the result to float.

Solution to Exercise 1

Use .str.extract() with a pattern for decimal numbers.

```python
import pandas as pd

descriptions = pd.Series([
    'Widget costs 29.99 dollars',
    'Gadget is 14.50 on sale',
    'Tool priced at 5.00'
])
# extract returns a DataFrame; column 0 holds the first capture group
prices = descriptions.str.extract(r'(\d+\.\d+)')[0].astype(float)
print(prices)
```

Exercise 2. Given a Series of full addresses like '123 Main St, New York, NY 10001', use .str.split(',') to break each address into parts. Then use .str.get() or bracket indexing to extract only the city (second element) and strip whitespace from it.

Solution to Exercise 2

Split on comma and extract the second element.

```python
import pandas as pd

addresses = pd.Series([
    '123 Main St, New York, NY 10001',
    '456 Oak Ave, Los Angeles, CA 90001',
    '789 Pine Rd, Chicago, IL 60601'
])
# .str[1] picks the second list element; strip removes the leading space
cities = addresses.str.split(',').str[1].str.strip()
print(cities)
```

Exercise 3. Given a Series of filenames like ['report_2024.pdf', 'data_2023.csv', 'notes_2024.txt'], use .str.endswith() to filter only .csv files, and use .str.extract() with a regex to pull the four-digit year out of each filename.

Solution to Exercise 3

Combine .str.endswith() for filtering and .str.extract() for the year.

```python
import pandas as pd

filenames = pd.Series(['report_2024.pdf', 'data_2023.csv', 'notes_2024.txt'])

# Boolean mask keeps only names ending in '.csv'
csv_files = filenames[filenames.str.endswith('.csv')]
print("CSV files:", csv_files.tolist())

# First run of four digits in each name
years = filenames.str.extract(r'(\d{4})')[0]
print("Years:", years.tolist())
```