Keyword - on and lsuffix/rsuffix¶
By default, DataFrame.join matches rows using the index of both DataFrames. The on parameter overrides this by specifying a column from the calling DataFrame to use as the join key instead. When both DataFrames share non-key column names, the lsuffix and rsuffix parameters add distinguishing suffixes to avoid name collisions.
import pandas as pd
on Parameter¶
The on parameter specifies one or more columns from the calling DataFrame to join on. The other DataFrame is still matched by its index.
Default Behavior (No on)¶
Without on, both DataFrames are matched by their indices.
df_left = pd.DataFrame(
{"value": [10, 20, 30]},
index=["a", "b", "c"]
)
df_right = pd.DataFrame(
{"label": ["X", "Y"]},
index=["a", "b"]
)
result = df_left.join(df_right)
print(result)
value label
a 10 X
b 20 Y
c 30 NaN
Using on to Join on a Column¶
When the caller has a column that matches the other DataFrame's index, use on to specify it.
employees = pd.DataFrame({
"name": ["Alice", "Bob", "Carol"],
"dept_id": ["D1", "D2", "D1"]
})
departments = pd.DataFrame(
{"dept_name": ["Engineering", "Sales"]},
index=["D1", "D2"]
)
# Join on the "dept_id" column of employees, matched to departments' index
result = employees.join(departments, on="dept_id")
print(result)
name dept_id dept_name
0 Alice D1 Engineering
1 Bob D2 Sales
2 Carol D1 Engineering
The dept_id column values in employees are matched against the index of departments.
on with Multiple Columns¶
When the other DataFrame has a MultiIndex, pass a list of column names.
data = pd.DataFrame({
"year": [2023, 2023, 2024],
"quarter": ["Q1", "Q2", "Q1"],
"revenue": [100, 200, 150]
})
targets = pd.DataFrame(
{"target": [120, 180, 160]},
index=pd.MultiIndex.from_tuples(
[(2023, "Q1"), (2023, "Q2"), (2024, "Q1")],
names=["year", "quarter"]
)
)
result = data.join(targets, on=["year", "quarter"])
print(result)
year quarter revenue target
0 2023 Q1 100 120
1 2023 Q2 200 180
2 2024 Q1 150 160
lsuffix and rsuffix Parameters¶
When both DataFrames have columns with the same name (excluding the join key), join raises a ValueError unless you specify suffixes to distinguish them.
The Problem¶
df1 = pd.DataFrame({"val": [1, 2]}, index=["a", "b"])
df2 = pd.DataFrame({"val": [3, 4]}, index=["a", "b"])
try:
result = df1.join(df2)
except ValueError as e:
print(e)
# columns overlap but no suffix specified: Index(['val'], dtype='object')
The Fix¶
result = df1.join(df2, lsuffix="_left", rsuffix="_right")
print(result)
val_left val_right
a 1 3
b 2 4
lsuffixis appended to overlapping column names from the left (calling) DataFramersuffixis appended to overlapping column names from the right (other) DataFrame
Non-Overlapping Columns Are Unchanged¶
Suffixes are applied only to columns that conflict. Columns unique to one DataFrame keep their original names.
df1 = pd.DataFrame({"val": [1, 2], "extra": [5, 6]}, index=["a", "b"])
df2 = pd.DataFrame({"val": [3, 4], "info": [7, 8]}, index=["a", "b"])
result = df1.join(df2, lsuffix="_L", rsuffix="_R")
print(result)
val_L extra val_R info
a 1 5 3 7
b 2 6 4 8
Only the val column (present in both) gets suffixes. extra and info remain unchanged.
Combining on with lsuffix/rsuffix¶
Both parameters work together when joining on a column and facing name collisions.
orders = pd.DataFrame({
"product_id": ["P1", "P2", "P1"],
"quantity": [10, 5, 8]
})
products = pd.DataFrame(
{"quantity": [100, 50]}, # "quantity" here means stock quantity
index=["P1", "P2"]
)
result = orders.join(
products,
on="product_id",
lsuffix="_ordered",
rsuffix="_in_stock"
)
print(result)
product_id quantity_ordered quantity_in_stock
0 P1 10 100
1 P2 5 50
2 P1 8 100
Suffixes Are Required for Overlapping Names
Unlike pd.merge which has default suffixes (_x and _y), DataFrame.join raises an error if overlapping column names exist and no suffixes are provided. Always specify lsuffix and rsuffix when name collisions are possible.
Summary¶
| Parameter | Purpose | Default |
|---|---|---|
on |
Column(s) from the caller to use as join key | None (use index) |
lsuffix |
Suffix for overlapping columns from the left DataFrame | "" (empty — raises error on overlap) |
rsuffix |
Suffix for overlapping columns from the right DataFrame | "" (empty — raises error on overlap) |
Key Takeaways:
onredirects the join key from the caller's index to one of its columns; the other DataFrame is still matched by its indexlsuffixandrsuffixresolve column name collisions by appending suffixes to the conflicting names- Non-overlapping columns are never modified by suffix parameters
- Unlike
pd.merge,DataFrame.joindoes not provide default suffixes — they must be specified explicitly when overlaps exist - Use
onwith a list of column names when the other DataFrame has a MultiIndex