rank Method¶
The rank() method assigns ranks to values within each group, useful for identifying top performers or creating rankings.
Mental Model
rank() inside a groupby is SQL's RANK() OVER (PARTITION BY ...). Each group gets its own independent ranking -- the top value in group A is rank 1 regardless of values in group B. The result has the same shape as the original DataFrame, with rank numbers replacing the original values.
Basic Ranking¶
Rank values within groups.
1. Simple Rank¶
```python import pandas as pd
df = pd.DataFrame({ 'dept': ['A', 'A', 'A', 'B', 'B', 'B'], 'employee': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Frank'], 'salary': [5000, 6000, 4000, 4500, 5500, 4500] })
df['rank'] = df.groupby('dept')['salary'].rank(ascending=False) print(df) ```
dept employee salary rank
0 A Alice 5000 2.0
1 A Bob 6000 1.0
2 A Carol 4000 3.0
3 B Dave 4500 2.5
4 B Eve 5500 1.0
5 B Frank 4500 2.5
2. ascending Parameter¶
```python
ascending=False: highest value gets rank 1¶
ascending=True: lowest value gets rank 1¶
```
3. Tie Handling¶
Default: tied values get average rank (2.5 for Dave and Frank).
LeetCode Example: Department Top Salaries¶
Find top 3 salaries per department.
1. Sample Data¶
python
employee = pd.DataFrame({
'departmentId': [1, 1, 1, 2, 2, 2],
'name': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Frank'],
'salary': [5000, 6000, 4000, 4500, 5500, 4500]
})
2. Dense Ranking¶
python
employee['rank'] = employee.groupby('departmentId')['salary'].rank(
ascending=False,
method='dense'
)
print(employee)
3. Filter Top 3¶
python
top_3 = employee[employee['rank'] <= 3]
method Parameter¶
Control how ties are handled.
1. method='average' (Default)¶
```python
Ties get average of ranks they would occupy¶
[100, 100, 80] → [1.5, 1.5, 3.0]¶
```
2. method='min'¶
```python
Ties get lowest rank¶
[100, 100, 80] → [1, 1, 3]¶
```
3. method='dense'¶
```python
Ties get same rank, next rank is consecutive¶
[100, 100, 80] → [1, 1, 2]¶
```
Ranking Examples¶
Common ranking patterns.
1. Percentile Rank¶
python
df['percentile'] = df.groupby('dept')['salary'].rank(pct=True)
2. Row Number¶
python
df['row_num'] = df.groupby('dept').cumcount() + 1
3. First/Last in Group¶
python
df['is_top'] = df.groupby('dept')['salary'].rank(ascending=False) == 1
Comparison with SQL¶
Equivalent SQL window functions.
1. RANK()¶
python
df.groupby('dept')['salary'].rank(method='min', ascending=False)
2. DENSE_RANK()¶
python
df.groupby('dept')['salary'].rank(method='dense', ascending=False)
3. ROW_NUMBER()¶
python
df.groupby('dept').cumcount() + 1
Exercises¶
Exercise 1.
Create a DataFrame with 'department' and 'salary' columns. Use groupby('department')['salary'].rank(method='dense', ascending=False) to assign dense rankings within each department. Show the top earner per department.
Solution to Exercise 1
Dense ranking within groups to find top earners.
import pandas as pd
df = pd.DataFrame({
'department': ['IT', 'IT', 'IT', 'HR', 'HR'],
'name': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve'],
'salary': [70000, 65000, 72000, 50000, 55000]
})
df['rank'] = df.groupby('department')['salary'].rank(
method='dense', ascending=False
)
top = df[df['rank'] == 1]
print(top)
Exercise 2.
Compare three ranking methods ('average', 'min', 'dense') on the same data that contains ties. Create a DataFrame showing the original values alongside all three ranking columns.
Solution to Exercise 2
Compare average, min, and dense ranking on tied data.
import pandas as pd
df = pd.DataFrame({
'value': [100, 200, 200, 300, 300, 300]
})
df['rank_avg'] = df['value'].rank(method='average')
df['rank_min'] = df['value'].rank(method='min')
df['rank_dense'] = df['value'].rank(method='dense')
print(df)
Exercise 3.
Use .rank(pct=True) to compute percentile ranks within groups. Given student scores grouped by class, compute the percentile rank and identify students in the top 25% of their class.
Solution to Exercise 3
Use percentile ranking to find top performers.
import pandas as pd
df = pd.DataFrame({
'class': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
'student': ['s1', 's2', 's3', 's4', 's5', 's6', 's7', 's8'],
'score': [85, 92, 78, 95, 88, 76, 91, 84]
})
df['pct_rank'] = df.groupby('class')['score'].rank(pct=True)
top_25 = df[df['pct_rank'] >= 0.75]
print("Top 25% per class:")
print(top_25)