Skip to content

merge_asof Method

The merge_asof() function performs an approximate merge, matching on the nearest key rather than exact equality. It is designed for ordered data like time series.

Mental Model

merge_asof is a "closest match" join. For each row in the left table, it finds the nearest preceding (or nearest overall) key in the right table. This is essential for time series: if you have trades at irregular timestamps and quotes at different timestamps, merge_asof pairs each trade with the most recent quote.

Basic Concept

Match to the nearest preceding value.

1. Time-based Matching

```python import pandas as pd

Sales data

units_sold = pd.DataFrame({ 'product_id': [1, 2, 3], 'purchase_date': pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03']) })

Price data

prices = pd.DataFrame({ 'product_id': [1, 2], 'start_date': pd.to_datetime(['2022-01-01', '2022-01-02']), 'price': [100, 200] }) ```

2. merge_asof vs merge

```python

Regular merge requires exact match

pd.merge(units_sold, prices, left_on='purchase_date', right_on='start_date')

merge_asof finds nearest preceding match

pd.merge_asof( units_sold.sort_values('purchase_date'), prices.sort_values('start_date'), left_on='purchase_date', right_on='start_date' ) ```

3. Requires Sorted Data

Both DataFrames must be sorted by the merge key.

Practical Example

Match sales to applicable prices.

1. Sample Data

```python units_sold = pd.DataFrame({ 'product_id': [1, 1, 2, 2], 'purchase_date': pd.to_datetime([ '2022-01-15', '2022-05-10', '2022-03-01', '2022-07-01' ]), 'units': [10, 5, 8, 12] })

prices = pd.DataFrame({ 'product_id': [1, 1, 2, 2], 'start_date': pd.to_datetime([ '2022-01-01', '2022-04-01', '2022-01-01', '2022-06-01' ]), 'price': [100, 120, 200, 180] }) ```

2. Sort Before Merge

python units_sold = units_sold.sort_values('purchase_date') prices = prices.sort_values('start_date')

3. merge_asof with by

python result = pd.merge_asof( units_sold, prices, left_on='purchase_date', right_on='start_date', by='product_id' ) print(result)

Key Parameters

Configure merge_asof behavior.

1. by Parameter

```python

Match exactly on 'by' columns, approximately on merge key

pd.merge_asof( left, right, on='timestamp', by='product_id' # Exact match required ) ```

2. direction Parameter

```python

'backward' (default): match to previous value

'forward': match to next value

'nearest': match to closest value

pd.merge_asof(left, right, on='time', direction='backward') pd.merge_asof(left, right, on='time', direction='forward') pd.merge_asof(left, right, on='time', direction='nearest') ```

3. tolerance Parameter

```python

Maximum distance for match

pd.merge_asof( left, right, on='timestamp', tolerance=pd.Timedelta('1 day') # Max 1 day gap ) ```

Financial Application

Match trades to quotes.

1. Trade Data

python trades = pd.DataFrame({ 'timestamp': pd.to_datetime([ '2024-01-01 10:00:05', '2024-01-01 10:00:15', '2024-01-01 10:00:25' ]), 'quantity': [100, 50, 75] })

2. Quote Data

python quotes = pd.DataFrame({ 'timestamp': pd.to_datetime([ '2024-01-01 10:00:00', '2024-01-01 10:00:10', '2024-01-01 10:00:20' ]), 'bid': [99.0, 99.5, 99.8], 'ask': [100.0, 100.5, 100.8] })

3. Match Trades to Quotes

```python result = pd.merge_asof( trades.sort_values('timestamp'), quotes.sort_values('timestamp'), on='timestamp', direction='backward' )

Each trade gets the most recent quote

```

Comparison with Regular Merge

When to use each method.

1. Exact Match (merge)

```python

Use when keys must match exactly

Foreign key relationships

Categorical data

```

2. Approximate Match (merge_asof)

```python

Use for time-based data

Price lookups with effective dates

Event matching within tolerance

```

3. Performance

```python

merge_asof is optimized for sorted data

Faster than alternatives for time series

```


Exercises

Exercise 1. Create a trades DataFrame with timestamps and a quotes DataFrame with slightly earlier timestamps. Use pd.merge_asof() to match each trade with the most recent quote that occurred at or before the trade time.

Solution to Exercise 1

Match trades to the most recent preceding quote.

import pandas as pd

trades = pd.DataFrame({
    'time': pd.to_datetime(['10:00:01', '10:00:03', '10:00:05']),
    'price': [100, 102, 101]
})
quotes = pd.DataFrame({
    'time': pd.to_datetime(['10:00:00', '10:00:02', '10:00:04']),
    'bid': [99, 101, 100]
})
result = pd.merge_asof(trades, quotes, on='time')
print(result)

Exercise 2. Use pd.merge_asof() with the tolerance parameter to match only if the time difference is within a specified window (e.g., 2 seconds). Verify that trades without a quote within the tolerance get NaN.

Solution to Exercise 2

Use tolerance to limit the match window.

import pandas as pd

trades = pd.DataFrame({
    'time': pd.to_datetime(['10:00:01', '10:00:10']),
    'price': [100, 102]
})
quotes = pd.DataFrame({
    'time': pd.to_datetime(['10:00:00', '10:00:02']),
    'bid': [99, 101]
})
result = pd.merge_asof(trades, quotes, on='time', tolerance=pd.Timedelta('2s'))
print(result)
# Second trade has no quote within 2 seconds -> NaN

Exercise 3. Use pd.merge_asof() with a by parameter to perform the approximate match within groups (e.g., match each trade to the latest quote for the same stock ticker).

Solution to Exercise 3

Use the by parameter for grouped asof merge.

import pandas as pd

trades = pd.DataFrame({
    'time': pd.to_datetime(['10:00:01', '10:00:01', '10:00:03']),
    'ticker': ['AAPL', 'MSFT', 'AAPL'],
    'price': [150, 250, 152]
})
quotes = pd.DataFrame({
    'time': pd.to_datetime(['10:00:00', '10:00:00', '10:00:02']),
    'ticker': ['AAPL', 'MSFT', 'AAPL'],
    'bid': [149, 249, 151]
})
result = pd.merge_asof(trades, quotes, on='time', by='ticker')
print(result)