
apply with axis

The axis parameter in apply() determines whether the function is applied along rows or columns. Understanding axis behavior is crucial for correct DataFrame operations.

Mental Model

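axis=0 feeds each column as a Series to your function (the function "moves down" the rows). axis=1 feeds each row as a Series (the function "moves across" the columns). The axis number names the dimension the function consumes: with axis=0 a reducing function collapses the rows and leaves one result per column, and with axis=1 it collapses the columns and leaves one result per row.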
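To make the model concrete, the sketch below passes a small helper to apply() and records the labels of each Series the function receives. The helper name `describe_input` is hypothetical and exists only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

def describe_input(s):
    # Join the index labels of the Series that apply() handed to us
    return ','.join(str(label) for label in s.index)

# axis=0: each call receives one column, indexed by the row labels 0, 1, 2
by_column = df.apply(describe_input, axis=0)

# axis=1: each call receives one row, indexed by the column labels A, B, C
by_row = df.apply(describe_input, axis=1)

print(by_column)
print(by_row)
```

With axis=0 the result has one entry per column, and with axis=1 it has one entry per row.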

axis=0 Column-wise

Apply function to each column (default behavior).

1. Default Behavior

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Sum each column (axis=0 is the default)
result = df.apply(np.sum, axis=0)
print(result)
```

```
A     6
B    15
C    24
dtype: int64
```

2. Custom Function

```python
def column_stats(col):
    # Return several summary statistics for one column as a labeled Series
    return pd.Series({
        'mean': col.mean(),
        'std': col.std(),
        'min': col.min(),
        'max': col.max()
    })

df.apply(column_stats, axis=0)
```

3. Column Normalization

```python
df.apply(lambda col: (col - col.mean()) / col.std(), axis=0)
```
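As a quick check of the normalization above, the following sketch confirms that each normalized column ends up with a mean of about 0 and a sample standard deviation of about 1, and that a plain vectorized expression gives the same values. The variable names `normalized` and `vectorized` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# z-score each column with apply (col.std() uses ddof=1, the sample std)
normalized = df.apply(lambda col: (col - col.mean()) / col.std(), axis=0)

# The same computation written without apply
vectorized = (df - df.mean()) / df.std()

print(normalized.mean())
print(normalized.std())
```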

axis=1 Row-wise

Apply function to each row.

1. Row Sum

```python
result = df.apply(np.sum, axis=1)
print(result)
```

```
0    12
1    15
2    18
dtype: int64
```

2. Row Maximum

```python
df.apply(lambda row: row.max(), axis=1)
```

3. Conditional Row Logic

```python
df.apply(
    lambda row: 'High' if row['A'] > 2 else 'Low',
    axis=1
)
```

Multiple Column Operations

Common patterns using axis=1.

1. Weighted Calculation

```python
df = pd.DataFrame({
    'quantity': [10, 20, 30],
    'price': [100, 200, 150],
    'discount': [0.1, 0.2, 0.15]
})

# total = quantity * price * (1 - discount), computed one row at a time
df['total'] = df.apply(
    lambda row: row['quantity'] * row['price'] * (1 - row['discount']),
    axis=1
)
```

2. String Concatenation

```python
df = pd.DataFrame({
    'first_name': ['John', 'Jane'],
    'last_name': ['Doe', 'Smith']
})

df['full_name'] = df.apply(
    lambda row: f"{row['first_name']} {row['last_name']}",
    axis=1
)
```

3. Conditional Assignment

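```python
# Illustrative scores; the snippet needs a DataFrame with a 'score' column
df = pd.DataFrame({'score': [85, 42, 60]})

df['status'] = df.apply(
    lambda row: 'Pass' if row['score'] >= 60 else 'Fail',
    axis=1
)
```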

result_type Parameter

Control the output format when applying row-wise.

1. result_type='expand'

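```python
# Uses the A/B/C DataFrame from the first example
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

def extract_parts(row):
    # Return a list; 'expand' turns each list element into its own column
    return [row['A'], row['B'], row['A'] + row['B']]

df.apply(extract_parts, axis=1, result_type='expand')
```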

Returns a DataFrame with columns 0, 1, 2.
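If the integer column labels 0, 1, 2 are not wanted, one option is to return a labeled Series from the function instead of a list. The sketch below assumes the A/B/C frame from the first example; `extract_parts_named` and its labels are hypothetical names introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

def extract_parts_named(row):
    # Hypothetical variant of extract_parts that labels each output value
    return pd.Series({
        'a': row['A'],
        'b': row['B'],
        'a_plus_b': row['A'] + row['B']
    })

# A Series return value becomes one row of the result; its index becomes the columns
named = df.apply(extract_parts_named, axis=1)

# For comparison: a list return value with result_type='expand' gets integer labels
expanded = df.apply(
    lambda row: [row['A'], row['B'], row['A'] + row['B']],
    axis=1,
    result_type='expand'
)

print(named)
print(expanded)
```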

2. result_type='reduce'

```python
# Return a Series if possible instead of expanding list-like results
# (the opposite of 'expand')
df.apply(lambda row: row.sum(), axis=1, result_type='reduce')
```

3. result_type='broadcast'

```python
# Result keeps the shape, index, and columns of the original DataFrame
df.apply(lambda row: row - row.mean(), axis=1, result_type='broadcast')
```
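The 'broadcast' option is most noticeable when the function returns a scalar rather than a Series. A minimal sketch, assuming the A/B/C frame from the first example; the variable name `filled` is illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Each row's maximum is a scalar; 'broadcast' repeats it across that row
filled = df.apply(lambda row: row.max(), axis=1, result_type='broadcast')

print(filled)
print(filled.shape == df.shape)
```

Without result_type='broadcast', the same function would reduce each row to a single value and return a Series.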

LeetCode Example: Quality Metrics

Calculate metrics with groupby and apply.

1. Sample Data

```python
queries = pd.DataFrame({
    'query_name': ['Query1', 'Query1', 'Query2', 'Query2'],
    'rating': [5, 4, 3, 2],
    'position': [2, 1, 3, 2]
})

# Quality is rating divided by position
queries['quality'] = queries['rating'] / queries['position']
# Flag ratings below 3 as poor; scaling by 100 makes the group mean a percentage
queries['poor_query'] = (queries['rating'] < 3).astype(int) * 100
```

2. Round Function

```python
round2 = lambda x: round(x, 2)

result = (
    queries
    .groupby('query_name')[['quality', 'poor_query']]
    .mean()
    .apply(round2)  # each column is passed to round2 as a Series
    .reset_index()
)
print(result)
```

3. Result

```
  query_name  quality  poor_query
0     Query1     3.25        0.00
1     Query2     1.00       50.00
```

Performance Comparison

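Row-wise apply vs vectorized operations. Row-wise apply calls a Python function once per row, so it is typically much slower than the equivalent vectorized operation.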

1. Slow Row-wise

```python
%%timeit
df.apply(lambda row: row['A'] + row['B'], axis=1)
```

2. Fast Vectorized

```python
%%timeit
df['A'] + df['B']
```
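The `%%timeit` cell magic only works in IPython or Jupyter. Outside a notebook, a rough comparison can be sketched with the standard-library `timeit` module. The 10,000-row size, the seed, and the helper names `row_wise` and `vectorized` are arbitrary choices for this illustration, and the measured times will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; size and seed are arbitrary choices for this sketch
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'A': rng.integers(0, 100, 10_000),
    'B': rng.integers(0, 100, 10_000)
})

def row_wise():
    # One Python function call per row
    return df.apply(lambda row: row['A'] + row['B'], axis=1)

def vectorized():
    # A single operation over whole columns
    return df['A'] + df['B']

# Both approaches must produce the same values
assert (row_wise() == vectorized()).all()

# Average over three runs of each approach
t_apply = timeit.timeit(row_wise, number=3) / 3
t_vec = timeit.timeit(vectorized, number=3) / 3
print(f"row-wise apply: {t_apply:.4f} s per call")
print(f"vectorized:     {t_vec:.6f} s per call")
```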

3. When to Use axis=1

  • Complex conditional logic
  • Operations that need several columns of the same row at once and cannot be written as column arithmetic
  • Non-vectorizable custom functions
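Simple conditionals often do not need axis=1 at all. The sketch below rewrites the earlier 'High'/'Low' rule using NumPy's `np.where`, which is a swapped-in vectorized technique and not something the examples above use. It assumes the A/B/C frame from the first example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Row-wise version: one Python call per row
row_wise = df.apply(lambda row: 'High' if row['A'] > 2 else 'Low', axis=1)

# Vectorized version: evaluate the condition on the whole column at once
vectorized = pd.Series(np.where(df['A'] > 2, 'High', 'Low'), index=df.index)

print(vectorized.tolist())  # ['Low', 'Low', 'High']
```

Reserving axis=1 for logic that has no such column-level form keeps the slow path to the cases that need it.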

Exercises

Exercise 1. Create a numeric DataFrame with 4 columns. Use apply(np.mean, axis=0) to compute the mean of each column, then apply(np.mean, axis=1) to compute the mean of each row. Verify the column means match df.mean().

Solution to Exercise 1

Apply mean along both axes.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 4), columns=['A', 'B', 'C', 'D'])
col_means = df.apply(np.mean, axis=0)
row_means = df.apply(np.mean, axis=1)
print("Column means:\n", col_means)
print("Row means:\n", row_means)
# Compare with a tolerance; exact equality on floats is fragile
assert np.allclose(col_means, df.mean())
```

Exercise 2. Create a DataFrame with columns 'math', 'science', and 'english'. Use apply() with axis=1 to add a new column 'highest_subject' that contains the name of the column with the highest score for each row (use idxmax()).

Solution to Exercise 2

Find the column name of the max value per row.

```python
import pandas as pd

df = pd.DataFrame({
    'math': [85, 92, 78],
    'science': [90, 88, 95],
    'english': [88, 85, 80]
})
df['highest_subject'] = df.apply(lambda row: row.idxmax(), axis=1)
print(df)
```

Exercise 3. Create a DataFrame and use apply() with axis=0 to return a Series with the count of values above the column mean for each column. Compare using a custom function vs a vectorized approach.

Solution to Exercise 3

Count values above the column mean using apply.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(100, 3), columns=['A', 'B', 'C'])
above_mean = df.apply(lambda col: (col > col.mean()).sum(), axis=0)
print(above_mean)
```
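The exercise also asks for a comparison with a vectorized approach. One possible version, sketched here with an arbitrary seed so the run is reproducible, compares the whole frame against its column means and counts the True values. The names `above_mean_apply` and `above_mean_vec` are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seed chosen arbitrarily for reproducibility
df = pd.DataFrame(rng.standard_normal((100, 3)), columns=['A', 'B', 'C'])

# Custom function applied to each column
above_mean_apply = df.apply(lambda col: (col > col.mean()).sum(), axis=0)

# Vectorized: df.mean() aligns on the column labels, then True values are counted
above_mean_vec = (df > df.mean()).sum()

# Both approaches should give identical counts
assert (above_mean_apply == above_mean_vec).all()
print(above_mean_vec)
```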