merge_asof Method¶
The merge_asof() function performs an approximate merge, matching on the nearest key rather than exact equality. It is designed for ordered data like time series.
Basic Concept¶
Match to the nearest preceding value.
1. Time-based Matching¶
import pandas as pd
# Sales data
units_sold = pd.DataFrame({
'product_id': [1, 2, 3],
'purchase_date': pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03'])
})
# Price data
prices = pd.DataFrame({
'product_id': [1, 2],
'start_date': pd.to_datetime(['2022-01-01', '2022-01-02']),
'price': [100, 200]
})
2. merge_asof vs merge¶
# Regular merge requires exact match
pd.merge(units_sold, prices,
left_on='purchase_date', right_on='start_date')
# merge_asof finds nearest preceding match
pd.merge_asof(
units_sold.sort_values('purchase_date'),
prices.sort_values('start_date'),
left_on='purchase_date',
right_on='start_date'
)
3. Requires Sorted Data¶
Both DataFrames must be sorted by the merge key.
Practical Example¶
Match sales to applicable prices.
1. Sample Data¶
units_sold = pd.DataFrame({
'product_id': [1, 1, 2, 2],
'purchase_date': pd.to_datetime([
'2022-01-15', '2022-05-10', '2022-03-01', '2022-07-01'
]),
'units': [10, 5, 8, 12]
})
prices = pd.DataFrame({
'product_id': [1, 1, 2, 2],
'start_date': pd.to_datetime([
'2022-01-01', '2022-04-01', '2022-01-01', '2022-06-01'
]),
'price': [100, 120, 200, 180]
})
2. Sort Before Merge¶
units_sold = units_sold.sort_values('purchase_date')
prices = prices.sort_values('start_date')
3. merge_asof with by¶
result = pd.merge_asof(
units_sold,
prices,
left_on='purchase_date',
right_on='start_date',
by='product_id'
)
print(result)
Key Parameters¶
Configure merge_asof behavior.
1. by Parameter¶
# Match exactly on 'by' columns, approximately on merge key
pd.merge_asof(
left, right,
on='timestamp',
by='product_id' # Exact match required
)
2. direction Parameter¶
# 'backward' (default): match to previous value
# 'forward': match to next value
# 'nearest': match to closest value
pd.merge_asof(left, right, on='time', direction='backward')
pd.merge_asof(left, right, on='time', direction='forward')
pd.merge_asof(left, right, on='time', direction='nearest')
3. tolerance Parameter¶
# Maximum distance for match
pd.merge_asof(
left, right,
on='timestamp',
tolerance=pd.Timedelta('1 day') # Max 1 day gap
)
Financial Application¶
Match trades to quotes.
1. Trade Data¶
trades = pd.DataFrame({
'timestamp': pd.to_datetime([
'2024-01-01 10:00:05',
'2024-01-01 10:00:15',
'2024-01-01 10:00:25'
]),
'quantity': [100, 50, 75]
})
2. Quote Data¶
quotes = pd.DataFrame({
'timestamp': pd.to_datetime([
'2024-01-01 10:00:00',
'2024-01-01 10:00:10',
'2024-01-01 10:00:20'
]),
'bid': [99.0, 99.5, 99.8],
'ask': [100.0, 100.5, 100.8]
})
3. Match Trades to Quotes¶
result = pd.merge_asof(
trades.sort_values('timestamp'),
quotes.sort_values('timestamp'),
on='timestamp',
direction='backward'
)
# Each trade gets the most recent quote
Comparison with Regular Merge¶
When to use each method.
1. Exact Match (merge)¶
# Use when keys must match exactly
# Foreign key relationships
# Categorical data
2. Approximate Match (merge_asof)¶
# Use for time-based data
# Price lookups with effective dates
# Event matching within tolerance
3. Performance¶
# merge_asof is optimized for sorted data
# Faster than alternatives for time series