Skip to content

merge_asof Method

The merge_asof() function performs an approximate merge, matching on the nearest key rather than exact equality. It is designed for ordered data like time series.

Basic Concept

Match to the nearest preceding value.

1. Time-based Matching

import pandas as pd

# Sales data
units_sold = pd.DataFrame({
    'product_id': [1, 2, 3],
    'purchase_date': pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03'])
})

# Price data
prices = pd.DataFrame({
    'product_id': [1, 2],
    'start_date': pd.to_datetime(['2022-01-01', '2022-01-02']),
    'price': [100, 200]
})

2. merge_asof vs merge

# Regular merge requires exact match
pd.merge(units_sold, prices, 
         left_on='purchase_date', right_on='start_date')

# merge_asof finds nearest preceding match
pd.merge_asof(
    units_sold.sort_values('purchase_date'),
    prices.sort_values('start_date'),
    left_on='purchase_date',
    right_on='start_date'
)

3. Requires Sorted Data

Both DataFrames must be sorted by the merge key.

Practical Example

Match sales to applicable prices.

1. Sample Data

units_sold = pd.DataFrame({
    'product_id': [1, 1, 2, 2],
    'purchase_date': pd.to_datetime([
        '2022-01-15', '2022-05-10', '2022-03-01', '2022-07-01'
    ]),
    'units': [10, 5, 8, 12]
})

prices = pd.DataFrame({
    'product_id': [1, 1, 2, 2],
    'start_date': pd.to_datetime([
        '2022-01-01', '2022-04-01', '2022-01-01', '2022-06-01'
    ]),
    'price': [100, 120, 200, 180]
})

2. Sort Before Merge

units_sold = units_sold.sort_values('purchase_date')
prices = prices.sort_values('start_date')

3. merge_asof with by

result = pd.merge_asof(
    units_sold,
    prices,
    left_on='purchase_date',
    right_on='start_date',
    by='product_id'
)
print(result)

Key Parameters

Configure merge_asof behavior.

1. by Parameter

# Match exactly on 'by' columns, approximately on merge key
pd.merge_asof(
    left, right,
    on='timestamp',
    by='product_id'  # Exact match required
)

2. direction Parameter

# 'backward' (default): match to previous value
# 'forward': match to next value
# 'nearest': match to closest value

pd.merge_asof(left, right, on='time', direction='backward')
pd.merge_asof(left, right, on='time', direction='forward')
pd.merge_asof(left, right, on='time', direction='nearest')

3. tolerance Parameter

# Maximum distance for match
pd.merge_asof(
    left, right,
    on='timestamp',
    tolerance=pd.Timedelta('1 day')  # Max 1 day gap
)

Financial Application

Match trades to quotes.

1. Trade Data

trades = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2024-01-01 10:00:05',
        '2024-01-01 10:00:15',
        '2024-01-01 10:00:25'
    ]),
    'quantity': [100, 50, 75]
})

2. Quote Data

quotes = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2024-01-01 10:00:00',
        '2024-01-01 10:00:10',
        '2024-01-01 10:00:20'
    ]),
    'bid': [99.0, 99.5, 99.8],
    'ask': [100.0, 100.5, 100.8]
})

3. Match Trades to Quotes

result = pd.merge_asof(
    trades.sort_values('timestamp'),
    quotes.sort_values('timestamp'),
    on='timestamp',
    direction='backward'
)
# Each trade gets the most recent quote

Comparison with Regular Merge

When to use each method.

1. Exact Match (merge)

# Use when keys must match exactly
# Foreign key relationships
# Categorical data

2. Approximate Match (merge_asof)

# Use for time-based data
# Price lookups with effective dates
# Event matching within tolerance

3. Performance

# merge_asof is optimized for sorted data
# Faster than alternatives for time series