
apply Method

The apply() method applies a function along an axis of a DataFrame or to elements of a Series. It is one of the most versatile pandas methods.

Mental Model

apply() is the escape hatch: when no built-in vectorized method exists, you hand pandas a custom function and it runs it on every element (Series) or every row/column (DataFrame). It is more flexible than map but slower than vectorized operations -- reach for it only when no built-in method fits.
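To make the distinction concrete, here is a minimal sketch on a small made-up DataFrame (the name `toy` and its columns `a` and `b` are illustrative assumptions, not part of the original example data). It shows what the function receives in each case: a scalar for `Series.apply`, and a whole Series per column or per row for `DataFrame.apply`.

```python
import pandas as pd

# Hypothetical toy data; names are illustrative only
toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Series.apply: the function receives one scalar at a time
squared = toy["a"].apply(lambda x: x ** 2)

# DataFrame.apply with the default axis=0: the function receives each column as a Series
col_sums = toy.apply(lambda col: col.sum())

# DataFrame.apply with axis=1: the function receives each row as a Series
row_sums = toy.apply(lambda row: row["a"] + row["b"], axis=1)

print(squared.tolist())    # [1, 4, 9]
print(col_sums.to_dict())  # {'a': 6, 'b': 60}
print(row_sums.tolist())   # [11, 22, 33]
```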

Series apply

Apply a function to each element of a Series.

1. Lambda Function

```python
import pandas as pd

url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(url, index_col='PassengerId')

bool_mask = df.Sex.apply(lambda x: x == "female")
print(bool_mask.head())
```

```
PassengerId
1    False
2     True
3     True
4     True
5    False
Name: Sex, dtype: bool
```

2. Named Function

```python
def classify_age(age):
    if pd.isna(age):
        return 'Unknown'
    elif age < 18:
        return 'Child'
    elif age < 65:
        return 'Adult'
    else:
        return 'Senior'

df['AgeGroup'] = df['Age'].apply(classify_age)
```
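For comparison, the same binning can often be done without `apply`. The following is a sketch of one possible alternative using `pd.cut`, run on a small made-up Series rather than the Titanic data; the sample ages are assumptions chosen to hit each boundary.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including boundary values and one missing entry
ages = pd.Series([5, 17, 18, 64, 65, np.nan])

# right=False makes the bins [0, 18), [18, 65), [65, inf),
# matching the "< 18" and "< 65" tests in classify_age
groups = pd.cut(
    ages,
    bins=[0, 18, 65, np.inf],
    labels=["Child", "Adult", "Senior"],
    right=False,
)

# Missing ages stay NaN after pd.cut; add an explicit category to label them
groups = groups.cat.add_categories("Unknown").fillna("Unknown")

print(groups.tolist())  # ['Child', 'Child', 'Adult', 'Adult', 'Senior', 'Unknown']
```

Note that the result here is a categorical Series, whereas `apply(classify_age)` returns plain strings.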

3. String Methods Alternative

```python
# Instead of apply for simple string operations:
df['Name'].apply(lambda x: x.upper())

# Use vectorized string methods:
df['Name'].str.upper()
```

DataFrame apply

Apply a function along rows or columns.

1. Column-wise (axis=0)

```python
df[['Age', 'Fare']].apply(lambda x: x.mean())
```

```
Age     29.699118
Fare    32.204208
dtype: float64
```

2. Row-wise (axis=1)

```python
bool_mask = df.apply(lambda x: x.Sex == "female", axis=1)
print(bool_mask.head())
```

3. Multiple Columns

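```python
# Assumes df has numeric 'Quantity' and 'Price' columns
# (the Titanic DataFrame above does not)
df['Total'] = df.apply(
    lambda row: row['Quantity'] * row['Price'],
    axis=1
)
```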

LeetCode Example: Class Attendance

Count students per class using apply.

1. Sample Data

```python
courses = pd.DataFrame({
    'class': ['Math', 'Science', 'Math', 'History',
              'Math', 'Science', 'Math', 'History'],
    'student': ['Alice', 'Bob', 'Carol', 'Dave',
                'Eve', 'Frank', 'Grace', 'Helen']
})
```

2. GroupBy with apply

```python
result = courses.groupby('class')['student'].apply(len)
print(result)
```

```
class
History    2
Math       4
Science    2
Name: student, dtype: int64
```

3. Alternative with size

```python
courses.groupby('class').size()
```
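The two approaches agree on row counts, but a third common option, `count()`, behaves differently when values are missing. The sketch below uses a shortened version of the sample data with one missing student name added purely for illustration.

```python
import pandas as pd

# Shortened sample data; the None entry is an assumption added for illustration
courses = pd.DataFrame({
    "class": ["Math", "Science", "Math", "History"],
    "student": ["Alice", "Bob", None, "Dave"],
})

by_len = courses.groupby("class")["student"].apply(len)   # rows per group, missing values included
by_size = courses.groupby("class").size()                 # rows per group, missing values included
by_count = courses.groupby("class")["student"].count()    # non-null values only

print(by_len.to_dict())    # {'History': 1, 'Math': 2, 'Science': 1}
print(by_size.to_dict())   # {'History': 1, 'Math': 2, 'Science': 1}
print(by_count.to_dict())  # {'History': 1, 'Math': 1, 'Science': 1}
```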

LeetCode Example: Triangle Judgement

Convert boolean to Yes/No string.

1. Apply with Lambda

```python
triangle = pd.DataFrame({
    'x': [3, 1, 5],
    'y': [4, 2, 10],
    'z': [5, 3, 7],
    'is_valid': [True, False, True]
})

triangle["result"] = triangle["is_valid"].apply(
    lambda x: "Yes" if x else "No"
)
print(triangle)
```

2. Result

```
   x   y  z  is_valid result
0  3   4  5      True    Yes
1  1   2  3     False     No
2  5  10  7      True    Yes
```

3. Alternative with map

```python
triangle["result"] = triangle["is_valid"].map({True: "Yes", False: "No"})
```
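A fully vectorized version is also possible. The sketch below assumes that `is_valid` comes from the triangle inequality (the original data only provides the finished column), recomputes it from the side lengths, and converts it with `np.where`.

```python
import numpy as np
import pandas as pd

triangle = pd.DataFrame({"x": [3, 1, 5], "y": [4, 2, 10], "z": [5, 3, 7]})

# Assumed rule: every pair of sides must sum to more than the third side
is_valid = (
    (triangle["x"] + triangle["y"] > triangle["z"])
    & (triangle["x"] + triangle["z"] > triangle["y"])
    & (triangle["y"] + triangle["z"] > triangle["x"])
)

# np.where picks "Yes" where the condition holds and "No" elsewhere
triangle["result"] = np.where(is_valid, "Yes", "No")
print(triangle["result"].tolist())  # ['Yes', 'No', 'Yes']
```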

LeetCode Example: Special Bonus

Apply with multiple conditions.

1. Bonus Criteria Function

```python
def bonus_criteria(employee_id, name):
    return employee_id % 2 != 0 and not name.startswith('M')
```

2. Apply Row-wise

```python
employees = pd.DataFrame({
    'employee_id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Mike', 'Molly', 'Eve'],
    'salary': [50000, 60000, 70000, 80000, 90000]
})

employees['bonus'] = employees.apply(
    lambda row: row['salary'] if bonus_criteria(row['employee_id'], row['name']) else 0,
    axis=1
)
print(employees)
```

3. Result

```
   employee_id   name  salary  bonus
0            1  Alice   50000  50000
1            2    Bob   60000      0
2            3   Mike   70000      0
3            4  Molly   80000      0
4            5    Eve   90000  90000
```
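The same rule can also be written without a row-wise `apply`. This is one possible vectorized sketch: it builds a boolean mask from whole columns and uses `Series.where` to keep the salary only where the mask is true. The variable name `eligible` is introduced here for illustration.

```python
import pandas as pd

employees = pd.DataFrame({
    "employee_id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Mike", "Molly", "Eve"],
    "salary": [50000, 60000, 70000, 80000, 90000],
})

# Same rule as bonus_criteria, expressed on whole columns:
# odd employee_id AND name not starting with 'M'
eligible = (employees["employee_id"] % 2 != 0) & ~employees["name"].str.startswith("M")

# Keep the salary where eligible, otherwise use 0
employees["bonus"] = employees["salary"].where(eligible, 0)
print(employees["bonus"].tolist())  # [50000, 0, 0, 0, 90000]
```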

Performance Considerations

When to use and avoid apply.

1. Prefer Vectorized Operations

```python
# Slow
df['double'] = df['value'].apply(lambda x: x * 2)

# Fast
df['double'] = df['value'] * 2
```

2. Avoid Row-wise When Possible

```python
# Slow (iterates rows)
df.apply(lambda row: row['a'] + row['b'], axis=1)

# Fast (vectorized)
df['a'] + df['b']
```
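To see the gap on your own machine, a rough timing sketch like the following can help. The data is synthetic and the row count (20,000) is an arbitrary assumption; actual timings depend on hardware and pandas version, so no specific numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; the size is an arbitrary choice for illustration
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.random(20_000), "b": rng.random(20_000)})

def row_wise():
    # Calls the Python lambda once per row
    return df.apply(lambda row: row["a"] + row["b"], axis=1)

def vectorized():
    # Adds the two columns in a single array operation
    return df["a"] + df["b"]

t_apply = timeit.timeit(row_wise, number=1)
t_vec = timeit.timeit(vectorized, number=1)
print(f"apply(axis=1): {t_apply:.4f}s  vectorized: {t_vec:.6f}s")
```

Both functions return the same values; only the time taken differs.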

3. Use apply When Necessary

  • Complex logic that cannot be vectorized
  • Custom aggregation functions
  • Operations requiring multiple columns with conditions
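As an illustration of the custom-aggregation case, here is a minimal sketch of a per-group weighted average, which needs two columns at once. The `sales` data and the `weighted_price` helper are hypothetical names made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data; names and values are illustrative only
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "price": [10.0, 20.0, 5.0, 15.0],
    "units": [1, 3, 2, 2],
})

def weighted_price(group):
    """Unit-weighted average price for one group (receives a DataFrame)."""
    return np.average(group["price"], weights=group["units"])

# Select the needed columns explicitly so each group contains only price and units
result = sales.groupby("region")[["price", "units"]].apply(weighted_price)
print(result.to_dict())  # {'East': 17.5, 'West': 10.0}
```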

Exercises

Exercise 1. Create a Series of ages. Use .apply() with a function that classifies each age as 'Child' (under 18), 'Adult' (18-64), or 'Senior' (65+). Count the occurrences of each category.

Solution to Exercise 1

Classify ages using apply with a named function.

import pandas as pd

ages = pd.Series([5, 17, 25, 45, 70, 12, 68])

def classify(age):
    if age < 18:
        return 'Child'
    elif age < 65:
        return 'Adult'
    else:
        return 'Senior'

categories = ages.apply(classify)
print(categories.value_counts())

Exercise 2. Create a DataFrame with 'first_name' and 'last_name' columns. Use .apply() on each row (axis=1) to create a 'full_name' column that combines both names with a space.

Solution to Exercise 2

Combine columns row-wise using apply with axis=1.

import pandas as pd

df = pd.DataFrame({
    'first_name': ['Alice', 'Bob', 'Carol'],
    'last_name': ['Smith', 'Jones', 'Lee']
})
df['full_name'] = df.apply(lambda row: row['first_name'] + ' ' + row['last_name'], axis=1)
print(df)

Exercise 3. Create a numeric DataFrame. Apply a lambda function column-wise (axis=0) that returns the range (max - min) of each column. Compare the result with computing it manually using .max() - .min().

Solution to Exercise 3

Apply a function column-wise and verify the result.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'])
ranges_apply = df.apply(lambda col: col.max() - col.min(), axis=0)
ranges_manual = df.max() - df.min()
print(ranges_apply)
assert (ranges_apply == ranges_manual).all()
print("Results match.")