rank Method¶

The rank() method assigns ranks to values within each group, useful for identifying top performers or creating rankings.

Mental Model

rank() inside a groupby is SQL's RANK() OVER (PARTITION BY ...). Each group gets its own independent ranking -- the top value in group A is rank 1 regardless of values in group B. The result has the same shape as the original DataFrame, with rank numbers replacing the original values.

Basic Ranking¶

Rank values within groups.

1. Simple Rank¶

```python import pandas as pd

df = pd.DataFrame({ 'dept': ['A', 'A', 'A', 'B', 'B', 'B'], 'employee': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Frank'], 'salary': [5000, 6000, 4000, 4500, 5500, 4500] })

df['rank'] = df.groupby('dept')['salary'].rank(ascending=False) print(df) ```

dept employee salary rank 0 A Alice 5000 2.0 1 A Bob 6000 1.0 2 A Carol 4000 3.0 3 B Dave 4500 2.5 4 B Eve 5500 1.0 5 B Frank 4500 2.5

2. ascending Parameter¶

```python

ascending=False: highest value gets rank 1¶

ascending=True: lowest value gets rank 1¶

```

3. Tie Handling¶

Default: tied values get average rank (2.5 for Dave and Frank).

LeetCode Example: Department Top Salaries¶

Find top 3 salaries per department.

1. Sample Data¶

python employee = pd.DataFrame({ 'departmentId': [1, 1, 1, 2, 2, 2], 'name': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Frank'], 'salary': [5000, 6000, 4000, 4500, 5500, 4500] })

2. Dense Ranking¶

python employee['rank'] = employee.groupby('departmentId')['salary'].rank( ascending=False, method='dense' ) print(employee)

3. Filter Top 3¶

python top_3 = employee[employee['rank'] <= 3]

method Parameter¶

Control how ties are handled.

1. method='average' (Default)¶

```python

Ties get average of ranks they would occupy¶

[100, 100, 80] → [1.5, 1.5, 3.0]¶

```

2. method='min'¶

```python

Ties get lowest rank¶

[100, 100, 80] → [1, 1, 3]¶

```

3. method='dense'¶

```python

Ties get same rank, next rank is consecutive¶

[100, 100, 80] → [1, 1, 2]¶

```

Ranking Examples¶

Common ranking patterns.

1. Percentile Rank¶

python df['percentile'] = df.groupby('dept')['salary'].rank(pct=True)

2. Row Number¶

python df['row_num'] = df.groupby('dept').cumcount() + 1

3. First/Last in Group¶

python df['is_top'] = df.groupby('dept')['salary'].rank(ascending=False) == 1

Comparison with SQL¶

Equivalent SQL window functions.

1. RANK()¶

python df.groupby('dept')['salary'].rank(method='min', ascending=False)

2. DENSE_RANK()¶

python df.groupby('dept')['salary'].rank(method='dense', ascending=False)

3. ROW_NUMBER()¶

python df.groupby('dept').cumcount() + 1

Exercises¶

Exercise 1. Create a DataFrame with 'department' and 'salary' columns. Use groupby('department')['salary'].rank(method='dense', ascending=False) to assign dense rankings within each department. Show the top earner per department.

Solution to Exercise 1

Dense ranking within groups to find top earners.

import pandas as pd

df = pd.DataFrame({
    'department': ['IT', 'IT', 'IT', 'HR', 'HR'],
    'name': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve'],
    'salary': [70000, 65000, 72000, 50000, 55000]
})
df['rank'] = df.groupby('department')['salary'].rank(
    method='dense', ascending=False
)
top = df[df['rank'] == 1]
print(top)

Exercise 2. Compare three ranking methods ('average', 'min', 'dense') on the same data that contains ties. Create a DataFrame showing the original values alongside all three ranking columns.

Solution to Exercise 2

Compare average, min, and dense ranking on tied data.

import pandas as pd

df = pd.DataFrame({
    'value': [100, 200, 200, 300, 300, 300]
})
df['rank_avg'] = df['value'].rank(method='average')
df['rank_min'] = df['value'].rank(method='min')
df['rank_dense'] = df['value'].rank(method='dense')
print(df)

Exercise 3. Use .rank(pct=True) to compute percentile ranks within groups. Given student scores grouped by class, compute the percentile rank and identify students in the top 25% of their class.

Solution to Exercise 3

Use percentile ranking to find top performers.

import pandas as pd

df = pd.DataFrame({
    'class': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'student': ['s1', 's2', 's3', 's4', 's5', 's6', 's7', 's8'],
    'score': [85, 92, 78, 95, 88, 76, 91, 84]
})
df['pct_rank'] = df.groupby('class')['score'].rank(pct=True)
top_25 = df[df['pct_rank'] >= 0.75]
print("Top 25% per class:")
print(top_25)