Keyword - suffixes¶
The suffixes parameter specifies strings to append to overlapping column names when both DataFrames have columns with the same name.
Default Suffixes¶
Default suffixes are _x and _y.
1. Overlapping Columns¶
import pandas as pd
df1 = pd.DataFrame({
'city': ['NY', 'SF', 'LA'],
'temperature': [21, 14, 35]
})
df2 = pd.DataFrame({
'city': ['SF', 'NY', 'ICN'],
'temperature': [22, 15, 30],
'humidity': [65, 68, 75]
})
df = pd.merge(df1, df2, on='city')
print(df)
city temperature_x temperature_y humidity
0 NY 21 15 68
1 SF 14 22 65
2. Default Values¶
# Default: suffixes=('_x', '_y')
3. Join Column Not Suffixed¶
The on column appears only once without suffix.
Custom Suffixes¶
Specify meaningful suffix names.
1. Descriptive Names¶
df = pd.merge(
df1, df2,
on='city',
suffixes=['_left', '_right']
)
print(df)
city temperature_left temperature_right humidity
0 NY 21 15 68
1 SF 14 22 65
2. Source Identification¶
df = pd.merge(
df1, df2,
on='city',
suffixes=['_2023', '_2024']
)
3. Table Names¶
df = pd.merge(
orders, products,
on='product_id',
suffixes=['_order', '_product']
)
Empty String Suffix¶
Use empty string to keep one column unchanged.
1. Keep Left Original¶
df = pd.merge(
df1, df2,
on='city',
suffixes=['', '_new']
)
# temperature (from df1)
# temperature_new (from df2)
2. Keep Right Original¶
df = pd.merge(
df1, df2,
on='city',
suffixes=['_old', '']
)
3. Avoid Both Empty¶
# Both empty strings cause column name collision
# suffixes=['', ''] # Will raise error
LeetCode Example: Employee Manager Comparison¶
Self-join with suffixes for clarity.
1. Sample Data¶
employee = pd.DataFrame({
'id': [1, 2, 3, 4],
'name': ['John', 'Doe', 'Jane', 'Smith'],
'salary': [50000, 40000, 60000, 30000],
'managerId': [None, 1, 1, 2]
})
2. Self Merge with Suffixes¶
merged = pd.merge(
left=employee,
right=employee,
left_on='managerId',
right_on='id',
how='inner',
suffixes=('_employee', '_manager')
)
3. Clear Column Names¶
# Columns: id_employee, name_employee, salary_employee, ...
# id_manager, name_manager, salary_manager, ...
Handling Multiple Overlaps¶
Suffixes apply to all overlapping columns.
1. Multiple Columns¶
df1 = pd.DataFrame({
'key': ['A', 'B'],
'value1': [1, 2],
'value2': [3, 4]
})
df2 = pd.DataFrame({
'key': ['A', 'B'],
'value1': [10, 20],
'value2': [30, 40]
})
df = pd.merge(df1, df2, on='key', suffixes=['_v1', '_v2'])
# value1_v1, value1_v2, value2_v1, value2_v2
2. Consistent Naming¶
All overlapping columns get the same suffixes.
3. Post-merge Rename¶
# Rename specific columns after merge
df = df.rename(columns={
'value1_v1': 'value1_original',
'value1_v2': 'value1_updated'
})