Skip to content

merge Method

The merge() function combines DataFrames based on column values, similar to SQL joins. It is the most flexible method for combining datasets.

Basic Syntax

Merge two DataFrames on a common column.

1. Simple Merge

import pandas as pd

df1 = pd.DataFrame({
    'city': ['NY', 'SF', 'LA'],
    'temperature': [21, 14, 35]
})

df2 = pd.DataFrame({
    'city': ['SF', 'NY', 'ICN'],
    'humidity': [65, 68, 75]
})

df = pd.merge(df1, df2, on='city')
print(df)
  city  temperature  humidity
0   NY           21        68
1   SF           14        65

2. Default Inner Join

By default, merge performs an inner join, keeping only matching rows.

3. Method Syntax

# Function syntax
pd.merge(df1, df2, on='key')

# Method syntax (equivalent)
df1.merge(df2, on='key')

LeetCode Example: Person and Address

Combine person information with address.

1. Sample Data

person = pd.DataFrame({
    'personId': [1, 2, 3],
    'firstName': ['John', 'Jane', 'Jake'],
    'lastName': ['Doe', 'Smith', 'Brown']
})

address = pd.DataFrame({
    'personId': [1, 3],
    'city': ['New York', 'Los Angeles'],
    'state': ['NY', 'CA']
})

2. Left Merge

merged_df = person.merge(address, on='personId', how='left')
print(merged_df)

3. Result

   personId firstName lastName         city state
0         1      John      Doe     New York    NY
1         2      Jane    Smith          NaN   NaN
2         3      Jake    Brown  Los Angeles    CA

LeetCode Example: Project Employees

Join projects with employee experience.

1. Sample Data

project = pd.DataFrame({
    'project_id': [101, 102, 103],
    'project_name': ['Project A', 'Project B', 'Project C']
})

employee = pd.DataFrame({
    'employee_id': [1, 2, 3, 4, 5],
    'project_id': [101, 101, 102, 102, 103],
    'experience_years': [5, 7, 3, 4, 6]
})

2. Merge with Default

merged_df = pd.merge(project, employee, how='left')
print(merged_df)

3. Auto-detected Key

When on=None, merge uses common column names automatically.

Common Column Detection

Merge automatically finds common columns when on is not specified.

1. Auto Detection

# Both DataFrames have 'project_id' column
pd.merge(project, employee)  # on='project_id' inferred

2. Multiple Common Columns

# If both have ['key1', 'key2'], merges on both
pd.merge(df1, df2)

3. No Common Columns

# Raises ValueError if no common columns exist
# Must specify left_on and right_on

Merge Result Structure

Understanding the output DataFrame.

1. Column Order

# Left DataFrame columns first, then right
# Common column appears once (unless using suffixes)

2. Row Order

# Order depends on join type and matching

3. Index Reset

# Merge resets index to RangeIndex
# Original indices are discarded

Runnable Example: merge_join_tutorial.py

"""
Pandas Tutorial: Merging, Joining, and Concatenating DataFrames.

Covers different ways to combine DataFrames.
"""

import pandas as pd
import numpy as np

# =============================================================================
# Main
# =============================================================================

if __name__ == "__main__":

    print("="*70)
    print("MERGING, JOINING, AND CONCATENATING")
    print("="*70)

    # Create sample DataFrames
    df1 = pd.DataFrame({
        'ID': [1, 2, 3, 4],
        'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 28]
    })

    df2 = pd.DataFrame({
        'ID': [1, 2, 3, 5],
        'Department': ['HR', 'IT', 'Finance', 'Marketing'],
        'Salary': [50000, 60000, 75000, 55000]
    })

    print("\nDataFrame 1 (Employees):")
    print(df1)
    print("\nDataFrame 2 (Departments):")
    print(df2)

    # Inner join (default)
    print("\n1. Inner Join (intersection):")
    inner_merged = pd.merge(df1, df2, on='ID', how='inner')
    print(inner_merged)

    # Left join
    print("\n2. Left Join (keep all from left):")
    left_merged = pd.merge(df1, df2, on='ID', how='left')
    print(left_merged)

    # Right join
    print("\n3. Right Join (keep all from right):")
    right_merged = pd.merge(df1, df2, on='ID', how='right')
    print(right_merged)

    # Outer join
    print("\n4. Outer Join (keep all from both):")
    outer_merged = pd.merge(df1, df2, on='ID', how='outer')
    print(outer_merged)

    # Merge on different column names
    df3 = pd.DataFrame({
        'EmpID': [1, 2, 3],
        'Project': ['A', 'B', 'C']
    })

    print("\n5. Merge on different column names:")
    merged_diff = pd.merge(df1, df3, left_on='ID', right_on='EmpID')
    print(merged_diff)

    # Concatenate DataFrames vertically
    df_top = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
    df_bottom = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

    print("\n6. Concatenate vertically (stack rows):")
    print("Top:")
    print(df_top)
    print("Bottom:")
    print(df_bottom)
    print("Result:")
    vertical_concat = pd.concat([df_top, df_bottom], ignore_index=True)
    print(vertical_concat)

    # Concatenate horizontally
    df_left = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
    df_right = pd.DataFrame({'C': [7, 8, 9], 'D': [10, 11, 12]})

    print("\n7. Concatenate horizontally (add columns):")
    horizontal_concat = pd.concat([df_left, df_right], axis=1)
    print(horizontal_concat)

    # Join (using index)
    df_indexed1 = df1.set_index('ID')
    df_indexed2 = df2.set_index('ID')

    print("\n8. Join using index:")
    joined = df_indexed1.join(df_indexed2, how='inner')
    print(joined)

    print("\nKEY TAKEAWAYS:")
    print("- merge(): SQL-style joins on columns")
    print("- concat(): Stack DataFrames vertically or horizontally")
    print("- join(): Merge on index")
    print("- Join types: inner, left, right, outer")
    print("- Use on= for same column names, left_on=/right_on= for different names")