Skip to content

LeetCode DataFrame Problems

This document covers common LeetCode patterns involving pandas DataFrame operations including groupby, merge, pivot, and multi-table manipulations.

Mental Model

LeetCode DataFrame problems are SQL queries in disguise. Each problem maps to a chain of pandas verbs: groupby for GROUP BY, merge for JOIN, query for WHERE. Recognizing the SQL pattern first, then translating to pandas, is the fastest path to a solution.

Pattern 1: GroupBy and Aggregation

LeetCode 586: Customer Placing the Largest Number of Orders

Problem: Find the customer who placed the most orders.

```python def largest_orders(orders: pd.DataFrame) -> pd.DataFrame: # Group by customer and count orders order_counts = orders.groupby('customer_number')['order_number'].count()

# Find customer with max orders
max_customer = order_counts.idxmax()

return pd.DataFrame({'customer_number': [max_customer]})

```

With reset_index:

```python def largest_orders(orders: pd.DataFrame) -> pd.DataFrame: counts = orders.groupby('customer_number')['order_number'].count().reset_index() counts.columns = ['customer_number', 'order_count']

max_count = counts['order_count'].max()
return counts[counts['order_count'] == max_count][['customer_number']]

```

LeetCode 182: Duplicate Emails

Problem: Find emails that appear more than once.

```python def duplicate_emails(person: pd.DataFrame) -> pd.DataFrame: # Group by email and count email_counts = person.groupby('email')['id'].count().reset_index(name='count')

# Filter duplicates
duplicates = email_counts[email_counts['count'] > 1]

return duplicates[['email']].rename(columns={'email': 'Email'})

```

Pattern 2: Merge Operations

LeetCode 175: Combine Two Tables

Problem: Left join person and address tables.

python def combine_two_tables(person: pd.DataFrame, address: pd.DataFrame) -> pd.DataFrame: result = person.merge( address, on='personId', how='left' ) return result[['firstName', 'lastName', 'city', 'state']]

LeetCode 577: Employee Bonus

Problem: Find employees with bonus < 1000 or no bonus.

```python def employee_bonus(employee: pd.DataFrame, bonus: pd.DataFrame) -> pd.DataFrame: # Left merge to include employees without bonus merged = employee.merge(bonus, on='empId', how='left')

# Filter: bonus < 1000 OR bonus is null
result = merged[(merged['bonus'] < 1000) | (merged['bonus'].isnull())]

return result[['name', 'bonus']]

```

Key Concepts:

  • Left join preserves all employees
  • isnull() catches employees without bonus records

LeetCode 181: Employees Earning More Than Their Managers

Problem: Self-join to compare employee and manager salaries.

```python def employees_earning_more(employee: pd.DataFrame) -> pd.DataFrame: # Self merge: join employee with their manager merged = employee.merge( employee, left_on='managerId', right_on='id', suffixes=('_emp', '_mgr') )

# Filter where employee earns more
result = merged[merged['salary_emp'] > merged['salary_mgr']]

return result[['name_emp']].rename(columns={'name_emp': 'Employee'})

```

Pattern 3: Pivot and Reshape

LeetCode 1179: Reformat Department Table

Problem: Pivot monthly revenue by department.

```python def reformat_table(department: pd.DataFrame) -> pd.DataFrame: # Pivot: rows=id, columns=month, values=revenue pivoted = department.pivot( index='id', columns='month', values='revenue' ).reset_index()

# Ensure all months present
all_months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Add revenue suffix to column names
pivoted.columns = ['id'] + [f'{m}_Revenue' for m in all_months 
                            if m in pivoted.columns]

# Reindex to include missing months
for month in all_months:
    col = f'{month}_Revenue'
    if col not in pivoted.columns:
        pivoted[col] = None

return pivoted

```

LeetCode 626: Exchange Seats

Problem: Swap adjacent seat IDs.

```python def exchange_seats(seat: pd.DataFrame) -> pd.DataFrame: n = len(seat)

# Create new id column
seat['new_id'] = seat['id'].apply(
    lambda x: x + 1 if x % 2 == 1 and x < n 
              else x - 1 if x % 2 == 0 
              else x
)

# Sort by new_id and reset
result = seat.sort_values('new_id').reset_index(drop=True)
result['id'] = result['new_id']

return result[['id', 'student']]

```

Pattern 4: Ranking and Window Functions

LeetCode 176: Second Highest Salary

Problem: Find the second highest distinct salary.

```python def second_highest_salary(employee: pd.DataFrame) -> pd.DataFrame: # Get unique salaries sorted descending unique_salaries = employee['salary'].drop_duplicates().sort_values(ascending=False)

# Get second highest or None
if len(unique_salaries) >= 2:
    second = unique_salaries.iloc[1]
else:
    second = None

return pd.DataFrame({'SecondHighestSalary': [second]})

```

LeetCode 178: Rank Scores

Problem: Rank scores with dense ranking.

```python def rank_scores(scores: pd.DataFrame) -> pd.DataFrame: # Dense rank: no gaps in ranking scores['rank'] = scores['score'].rank(method='dense', ascending=False)

return scores[['score', 'rank']].sort_values('score', ascending=False)

```

LeetCode 184: Department Highest Salary

Problem: Find highest paid employee(s) per department.

```python def department_highest_salary( employee: pd.DataFrame, department: pd.DataFrame ) -> pd.DataFrame: # Merge to get department names merged = employee.merge(department, left_on='departmentId', right_on='id')

# Find max salary per department
max_salaries = merged.groupby('departmentId')['salary'].transform('max')

# Filter employees with max salary
result = merged[merged['salary'] == max_salaries]

return result[['name_y', 'name_x', 'salary']].rename(
    columns={'name_y': 'Department', 'name_x': 'Employee', 'salary': 'Salary'}
)

```

Pattern 5: Date Operations

LeetCode 197: Rising Temperature

Problem: Find dates where temperature was higher than previous day.

```python def rising_temperature(weather: pd.DataFrame) -> pd.DataFrame: # Sort by date weather = weather.sort_values('recordDate')

# Self merge on consecutive dates
weather['prev_date'] = weather['recordDate'] - pd.Timedelta(days=1)

merged = weather.merge(
    weather,
    left_on='prev_date',
    right_on='recordDate',
    suffixes=('', '_prev')
)

# Filter where temperature increased
rising = merged[merged['temperature'] > merged['temperature_prev']]

return rising[['id']]

```

Alternative with shift:

```python def rising_temperature(weather: pd.DataFrame) -> pd.DataFrame: weather = weather.sort_values('recordDate')

# Check for consecutive dates
weather['prev_temp'] = weather['temperature'].shift(1)
weather['prev_date'] = weather['recordDate'].shift(1)
weather['is_consecutive'] = (
    weather['recordDate'] - weather['prev_date']
) == pd.Timedelta(days=1)

# Filter rising and consecutive
result = weather[
    (weather['temperature'] > weather['prev_temp']) & 
    weather['is_consecutive']
]

return result[['id']]

```

Pattern 6: Multiple Aggregations

LeetCode 1193: Monthly Transactions I

Problem: Aggregate transaction counts and amounts by month and country.

```python def monthly_transactions(transactions: pd.DataFrame) -> pd.DataFrame: # Extract month transactions['month'] = transactions['trans_date'].dt.strftime('%Y-%m')

# Aggregate
result = transactions.groupby(['month', 'country']).agg(
    trans_count=('id', 'count'),
    approved_count=('state', lambda x: (x == 'approved').sum()),
    trans_total_amount=('amount', 'sum'),
    approved_total_amount=('amount', lambda x: x[transactions.loc[x.index, 'state'] == 'approved'].sum())
).reset_index()

return result

```

Alternative approach:

```python def monthly_transactions(transactions: pd.DataFrame) -> pd.DataFrame: transactions['month'] = transactions['trans_date'].dt.strftime('%Y-%m') transactions['is_approved'] = transactions['state'] == 'approved' transactions['approved_amount'] = transactions['amount'] * transactions['is_approved']

result = transactions.groupby(['month', 'country']).agg({
    'id': 'count',
    'is_approved': 'sum',
    'amount': 'sum',
    'approved_amount': 'sum'
}).reset_index()

result.columns = ['month', 'country', 'trans_count', 'approved_count',
                  'trans_total_amount', 'approved_total_amount']

return result

```

Pattern 7: Conditional Updates with loc

LeetCode 1873: Calculate Special Bonus

Problem: Assign bonus based on conditions.

```python def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame: # Initialize bonus to 0 employees['bonus'] = 0

# Condition: odd employee_id AND name doesn't start with 'M'
mask = (
    (employees['employee_id'] % 2 == 1) & 
    (~employees['name'].str.startswith('M'))
)

# Assign salary as bonus where condition is met
employees.loc[mask, 'bonus'] = employees.loc[mask, 'salary']

return employees[['employee_id', 'bonus']].sort_values('employee_id')

```

Pattern 8: Outer Joins for Finding Missing Data

LeetCode 1148: Article Views I

Problem: Find authors who viewed their own articles.

```python def article_views(views: pd.DataFrame) -> pd.DataFrame: # Filter where author == viewer self_views = views[views['author_id'] == views['viewer_id']]

# Get unique authors
result = self_views[['author_id']].drop_duplicates()
result.columns = ['id']

return result.sort_values('id')

```

Pattern Summary

Pattern Methods Used Common Use Case
GroupBy Aggregation groupby(), count(), sum() Counting, totaling
Merge/Join merge(), how='left/inner/outer' Combining tables
Self Merge merge() with same DataFrame Comparing rows
Pivot pivot(), pivot_table() Reshaping data
Ranking rank(), nlargest() Finding top N
Date Operations dt accessor, shift(), Timedelta Time comparisons
Conditional Update loc[] with mask Row-wise assignment
Multiple Aggregations agg() with dict Complex summaries

Common DataFrame Operations Quick Reference

```python

GroupBy with multiple aggregations

df.groupby('col').agg({ 'value': ['sum', 'mean', 'count'], 'other': 'max' })

Merge with different keys

df1.merge(df2, left_on='id1', right_on='id2', how='left')

Self merge

df.merge(df, left_on='parent_id', right_on='id', suffixes=('', '_parent'))

Conditional column

df['new_col'] = np.where(condition, value_if_true, value_if_false)

Transform (broadcast aggregation back)

df['group_mean'] = df.groupby('group')['value'].transform('mean')

Rank within groups

df['rank'] = df.groupby('group')['value'].rank(method='dense', ascending=False) ```


Exercises

Exercise 1. Create a DataFrame with columns 'name', 'salary', and 'department'. Write code to find the highest salary in each department.

Solution to Exercise 1

```python import pandas as pd

df = pd.DataFrame({ 'name': ['Alice', 'Bob', 'Charlie', 'David'], 'salary': [70000, 80000, 60000, 90000], 'department': ['IT', 'IT', 'HR', 'HR'] }) result = df.groupby('department')['salary'].max() print(result) ```


Exercise 2. Write code to find all duplicate rows in a DataFrame based on a specific column using duplicated().

Solution to Exercise 2

See the main content for the relevant patterns and API calls. The solution involves understanding how to combine Pandas operations to solve data manipulation problems.


Exercise 3. Create a DataFrame and write code to rank employees by salary within each department using groupby() and rank().

Solution to Exercise 3

```python import pandas as pd import numpy as np

np.random.seed(42) df = pd.DataFrame({ 'value': np.random.randint(0, 100, 20), 'group': np.random.choice(['A', 'B'], 20) }) result = df.groupby('group')['value'].transform('sum') print(result) ```


Exercise 4. Write code that deletes rows where a specific column has null values, then resets the index.

Solution to Exercise 4

```python import pandas as pd import numpy as np

np.random.seed(42) s = pd.Series(np.random.randn(100)) s_clean = s.clip(lower=0) print(s_clean.describe()) ```