Keyword - on¶

The on parameter specifies which column(s) to use as the join key when both DataFrames share the same column name.

Single Column¶

Join on a single shared column.

1. Basic Usage¶

import pandas as pd

df1 = pd.DataFrame({
    'city': ['NY', 'SF', 'LA'],
    'temperature': [21, 14, 35]
})

df2 = pd.DataFrame({
    'city': ['SF', 'NY', 'ICN'],
    'humidity': [65, 68, 75]
})

df = pd.merge(df1, df2, on='city')
print(df)

  city  temperature  humidity
0   NY           21        68
1   SF           14        65

2. Column Must Exist¶

# Column must exist in both DataFrames
# pd.merge(df1, df2, on='nonexistent')  # KeyError

3. Case Sensitive¶

# Column names are case-sensitive
# 'City' != 'city'

Multiple Columns¶

Join on multiple columns.

1. List of Columns¶

df1 = pd.DataFrame({
    'year': [2023, 2023, 2024],
    'month': [1, 2, 1],
    'sales': [100, 150, 200]
})

df2 = pd.DataFrame({
    'year': [2023, 2023, 2024],
    'month': [1, 2, 1],
    'expenses': [80, 90, 120]
})

df = pd.merge(df1, df2, on=['year', 'month'])
print(df)

2. All Keys Must Match¶

# A row matches only if ALL specified columns match

3. Order Independence¶

# on=['year', 'month'] same as on=['month', 'year']

LeetCode Example: Student Examinations¶

Join on multiple columns for student-subject pairs.

1. Sample Data¶

student_subject = pd.DataFrame({
    'student_id': [1, 1, 2, 2],
    'student_name': ['Alice', 'Alice', 'Bob', 'Bob'],
    'subject_name': ['Math', 'Science', 'Math', 'Science']
})

examination_count = pd.DataFrame({
    'student_id': [1, 1, 2],
    'subject_name': ['Math', 'Science', 'Math'],
    'attended_exams': [2, 1, 3]
})

2. Multi-column Join¶

result = pd.merge(
    student_subject,
    examination_count,
    on=['student_id', 'subject_name'],
    how='left'
)
print(result)

3. Result¶

   student_id student_name subject_name  attended_exams
0           1        Alice         Math             2.0
1           1        Alice      Science             1.0
2           2          Bob         Math             3.0
3           2          Bob      Science             NaN

Default Behavior¶

When on=None, merge auto-detects common columns.

1. Auto Detection¶

# Finds all columns with same name in both DataFrames
pd.merge(df1, df2)  # Uses all common columns

2. Explicit is Better¶

# Prefer explicit on parameter for clarity
pd.merge(df1, df2, on='common_column')

3. Avoid Surprises¶

# Auto-detection may join on unintended columns
# Always specify on for production code

Error Handling¶

Common issues with the on parameter.

1. Column Not Found¶

# KeyError if column doesn't exist in both
try:
    pd.merge(df1, df2, on='missing_col')
except KeyError as e:
    print(f"Error: {e}")

2. No Common Columns¶

# MergeError if on=None and no common columns
# Solution: use left_on and right_on

3. Type Mismatch¶

# Columns should have compatible types
# int and float usually work
# str and int will not match