shift and diff Methods¶

The shift() and diff() methods are essential for time series analysis, computing lagged values and differences.

Mental Model

shift(n) slides the data up or down by n positions without changing the index -- it creates lagged or lead columns. diff(n) computes value - value_shifted_by_n, giving the absolute change. Together they are the building blocks for returns, growth rates, and any "compare this period to the previous" analysis.
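
The mental model above can be checked on a tiny Series without downloading anything. The values and the letter index below are made up for illustration; only `shift` and `diff` themselves come from the text.

```python
import pandas as pd

# Hypothetical prices, chosen so the arithmetic is easy to follow
s = pd.Series([10.0, 12.0, 11.0, 15.0], index=list('abcd'))

lag = s.shift(1)     # each row receives the previous row's value
lead = s.shift(-1)   # each row receives the next row's value
change = s.diff(1)   # same as s - s.shift(1)

# The index is identical in every column; only the values have moved
print(pd.DataFrame({'value': s, 'lag': lag, 'lead': lead, 'change': change}))
```

Row `a` has no previous row, so `lag` and `change` are NaN there; row `d` has no next row, so `lead` is NaN there.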

shift Method¶

Moves the data by a specified number of periods while leaving the index in place.

1. Basic shift¶

```python
import pandas as pd
import yfinance as yf

df = yf.Ticker('WMT').history(start='2020-01-01', end='2020-01-10')
df = df[['Close']]

df['shift_1'] = df['Close'].shift(1)
df['shift_2'] = df['Close'].shift(2)
print(df)
```

| Date | Close | shift_1 | shift_2 |
|---|---|---|---|
| 2020-01-02 | 116.459999 | NaN | NaN |
| 2020-01-03 | 116.279999 | 116.459999 | NaN |
| 2020-01-06 | 116.230003 | 116.279999 | 116.459999 |
| 2020-01-07 | 116.849998 | 116.230003 | 116.279999 |
| 2020-01-08 | 116.220001 | 116.849998 | 116.230003 |

2. Positive Shift¶

```python
# shift(1): current row gets previous value
df['previous_close'] = df['Close'].shift(1)
```

3. Negative Shift¶

```python
# shift(-1): current row gets next value
df['next_close'] = df['Close'].shift(-1)
```
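
Both directions leave NaN wherever no source row exists: at the top for a positive shift and at the bottom for a negative one. The sketch below uses made-up closing prices to show where the gaps land, and how the `fill_value` argument of `shift` can replace them. Filling with `0.0` is only an illustration, not a recommendation for price data.

```python
import pandas as pd

# Illustrative values, not real market data
close = pd.Series([116.46, 116.28, 116.23], name='Close')

previous_close = close.shift(1)            # NaN in the first row
next_close = close.shift(-1)               # NaN in the last row
filled = close.shift(1, fill_value=0.0)    # gap replaced by 0.0 instead of NaN

print(pd.DataFrame({
    'Close': close,
    'previous_close': previous_close,
    'next_close': next_close,
    'filled': filled,
}))
```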

shift for Comparisons¶

Compare current values to previous.

1. Day-over-Day Change¶

```python
df['change'] = df['Close'] - df['Close'].shift(1)
```

2. Conditional Logic¶

```python
# Did price increase?
df['increased'] = df['Close'] > df['Close'].shift(1)
```

3. LeetCode: Rising Temperature¶

```python
# Temperature rose AND consecutive day
# (assumes rows are sorted and id increases by 1 per calendar day)
weather[
    (weather['temperature'] > weather['temperature'].shift(1))
    & (weather['id'] == weather['id'].shift(1) + 1)
]
```

diff Method¶

Calculate difference between consecutive elements.

1. Basic diff¶

```python
df = yf.Ticker('WMT').history(start='2020-01-01', end='2020-01-10')
df = df[['Close']]

df['diff_1'] = df['Close'].diff(1)
df['diff_2'] = df['Close'].diff(2)
print(df)
```

| Date | Close | diff_1 | diff_2 |
|---|---|---|---|
| 2020-01-02 | 116.459999 | NaN | NaN |
| 2020-01-03 | 116.279999 | -0.180000 | NaN |
| 2020-01-06 | 116.230003 | -0.049996 | -0.229996 |
| 2020-01-07 | 116.849998 | 0.619995 | 0.569999 |
| 2020-01-08 | 116.220001 | -0.629997 | -0.010002 |

2. diff Equivalent¶

```python
# diff(1) is equivalent to:
df['Close'] - df['Close'].shift(1)
```

3. Higher Order Differences¶

```python
# Second difference (change in change)
df['second_diff'] = df['Close'].diff().diff()
```
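
It is easy to confuse the second difference with `diff(2)`. A small sketch on the squares 1, 4, 9, 16, 25 (an illustrative series, not from the price data) shows that `diff().diff()` expands to `s - 2 * s.shift(1) + s.shift(2)`, while `diff(2)` is just `s - s.shift(2)`.

```python
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0, 16.0, 25.0])  # squares of 1..5

second_diff = s.diff().diff()                  # change in the change
expanded = s - 2 * s.shift(1) + s.shift(2)     # the same quantity written out
two_period = s.diff(2)                         # change over two periods, a different quantity

print(pd.DataFrame({
    'second_diff': second_diff,
    'expanded': expanded,
    'diff_2': two_period,
}))
```

For squares the second difference is constant, which makes the contrast with the growing two-period change easy to see.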

LeetCode Example: Rising Temperature¶

Find days with temperature increase.

1. Sample Data¶

```python
weather = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'recordDate': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04']),
    'temperature': [10, 25, 20, 30]
})
weather = weather.sort_values('recordDate')
```

2. Using diff¶

```python
# Temperature increased AND consecutive dates
result = weather[
    (weather['temperature'].diff() > 0)
    & (weather['recordDate'].diff().dt.days == 1)
]
```

3. Using shift¶

```python
result = weather[
    (weather['temperature'] > weather['temperature'].shift(1))
    & (weather['recordDate'] == weather['recordDate'].shift(1) + pd.Timedelta(days=1))
]
```
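
The date condition only matters when a day is missing. The sketch below reuses the same column names but with hypothetical data that skips 2024-01-03, so the rise from 25 to 30 spans two days and should be excluded by both the `diff` and the `shift` formulation.

```python
import pandas as pd

# Hypothetical data with a missing day (2024-01-03)
weather = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'recordDate': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-04', '2024-01-05']),
    'temperature': [10, 25, 30, 20],
}).sort_values('recordDate')

# diff formulation
rose = weather['temperature'].diff() > 0
next_day = weather['recordDate'].diff() == pd.Timedelta(days=1)
with_diff = weather[rose & next_day]

# shift formulation
with_shift = weather[
    (weather['temperature'] > weather['temperature'].shift(1))
    & (weather['recordDate'] == weather['recordDate'].shift(1) + pd.Timedelta(days=1))
]

print(with_diff)
print(with_shift)
```

Both filters keep only the row with `id` 2. The row with `id` 3 is warmer than the previous row, but that row is not the previous day.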

LeetCode Example: Consecutive Numbers¶

Find numbers appearing 3+ times consecutively.

1. Sample Data¶

```python
logs = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'num': [1, 1, 1, 2, 1, 2]
})
logs = logs.sort_values('id')
```

2. Check Consecutive¶

```python
consecutive = logs[
    (logs['num'] == logs['num'].shift(1))
    & (logs['num'] == logs['num'].shift(2))
    & (logs['id'] == logs['id'].shift(1) + 1)
    & (logs['id'] == logs['id'].shift(2) + 2)
]
```

3. Get Unique Numbers¶

```python
result = consecutive.drop_duplicates('num')[['num']]
```
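
The three steps can be run end to end as one script. This sketch uses the sample data from step 1 and writes the id checks with `diff` instead of `shift`, which is an equivalent reformulation; the intermediate names `same_num` and `adjacent_ids` are introduced here for readability only.

```python
import pandas as pd

logs = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'num': [1, 1, 1, 2, 1, 2],
}).sort_values('id')

# Current value equals the previous two values
same_num = (logs['num'] == logs['num'].shift(1)) & (logs['num'] == logs['num'].shift(2))

# Current id is exactly 1 and 2 above the previous two ids
adjacent_ids = (logs['id'].diff() == 1) & (logs['id'].diff(2) == 2)

# Rows that close a run of three; then one row per distinct number
consecutive = logs[same_num & adjacent_ids]
result = consecutive.drop_duplicates('num')[['num']]
print(result)
```

Only the row with `id` 3 closes a run of three equal values, so `result` contains the single number 1.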

Date Differences¶

shift and diff with dates.

1. Date shift¶

```python
df['prev_date'] = df['date'].shift(1)
```

2. Date diff¶

```python
df['days_between'] = df['date'].diff().dt.days
```

3. Check Consecutive Dates¶

```python
df['consecutive'] = df['date'].diff().dt.days == 1
```
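
The three snippets above assume a DataFrame with a datetime column named `date`. A minimal sketch with hypothetical dates, including a three-day gap, shows what each column looks like.

```python
import pandas as pd

# Hypothetical dates with a gap between 2024-03-02 and 2024-03-05
df = pd.DataFrame({
    'date': pd.to_datetime(['2024-03-01', '2024-03-02', '2024-03-05', '2024-03-06'])
})

df['prev_date'] = df['date'].shift(1)                # NaT in the first row
df['days_between'] = df['date'].diff().dt.days       # NaN in the first row
df['consecutive'] = df['date'].diff().dt.days == 1   # False in the first row and across the gap

print(df)
```

Comparing NaN with 1 yields False, so the first row is never flagged as consecutive.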

LeetCode: Human Traffic¶

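Find runs of three or more rows with consecutive ids and high traffic. The snippets below check only that the ids are consecutive; they assume any traffic threshold has already been applied to `stadium`.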

1. Sample Data¶

```python
stadium = pd.DataFrame({
    'id': [1, 2, 3, 5, 6, 7, 8],
    'people': [100, 200, 150, 300, 250, 400, 350]
})
```

2. Check Consecutive IDs¶

```python
# Current and previous two rows are consecutive
consecutive = (
    (stadium['id'].diff() == 1)
    & (stadium['id'].diff().shift(1) == 1)
)
```

3. Alternative¶

```python
# Marks the third row of each consecutive sequence
consecutive = (
    (stadium['id'] == stadium['id'].shift(1) + 1)
    & (stadium['id'] == stadium['id'].shift(2) + 2)
)
```
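
Both masks flag only the row that closes a run of three, not the two rows before it. One way to recover the whole run, sketched below, is to filter on traffic first and then spread the flag backwards with negative shifts. The threshold of 100 people and the sample values are assumptions made for this illustration, as are the names `busy`, `third`, and `in_run`.

```python
import pandas as pd

# Hypothetical data; ids 1 and 4 fall below the assumed threshold
stadium = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8],
    'people': [10, 109, 150, 99, 145, 1455, 199, 188],
})

# Keep only high-traffic rows (assumed threshold: 100 people)
busy = stadium[stadium['people'] >= 100].sort_values('id')

# True on the third (or later) row of a run of consecutive ids
third = (busy['id'].diff() == 1) & (busy['id'].diff().shift(1) == 1)

# Spread the flag to the two rows before each flagged row
in_run = third | third.shift(-1, fill_value=False) | third.shift(-2, fill_value=False)

result = busy[in_run]
print(result)
```

With this data the ids 2 and 3 form a run of only two, so the result contains the ids 5 through 8.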

Financial Applications¶

Daily returns and changes.

1. Price Change¶

```python
df['price_change'] = df['Close'].diff()
```

2. Daily Return¶

```python
df['daily_return'] = df['Close'].diff() / df['Close'].shift(1)

# or
df['daily_return'] = df['Close'].pct_change()
```

3. Cumulative Return¶

```python
df['cum_return'] = (1 + df['daily_return']).cumprod() - 1
```
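
The three formulas can be sanity-checked offline. This sketch uses four made-up closing prices and confirms that the manual return matches `pct_change()` and that the final cumulative return equals last price over first price minus one.

```python
import numpy as np
import pandas as pd

# Made-up closing prices: +10%, -10%, +10%
close = pd.Series([100.0, 110.0, 99.0, 108.9], name='Close')

price_change = close.diff()                      # absolute change
daily_return = close.diff() / close.shift(1)     # relative change
cum_return = (1 + daily_return).cumprod() - 1    # compounded return since the first row

print(pd.DataFrame({
    'Close': close,
    'price_change': price_change,
    'daily_return': daily_return,
    'cum_return': cum_return,
}))

# Compounding the daily returns reproduces the total price move
print(np.isclose(cum_return.iloc[-1], close.iloc[-1] / close.iloc[0] - 1))
```

The first row of `daily_return` is NaN; `cumprod` skips it, so the cumulative series starts from the second row.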

Exercises¶

Exercise 1. Write code that uses s.shift(1) and s.diff(1) on a price Series. Explain the relationship between them.

Solution to Exercise 1

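```python
import pandas as pd

# Small illustrative price Series
s = pd.Series([100.0, 102.0, 101.0, 105.0], name='price')

lagged = s.shift(1)   # previous period's price
change = s.diff(1)    # price minus previous period's price
print(pd.DataFrame({'price': s, 'shift_1': lagged, 'diff_1': change}))

# Relationship: diff(1) is the Series minus its one-period shift
print(change.equals(s - lagged))
```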

Exercise 2. Explain the shift() method. What does a positive vs negative shift value do?

Solution to Exercise 2

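shift(n) moves the values by n positions and leaves the index unchanged. With a positive n, each row receives the value from n rows earlier (a lag), and the first n rows become NaN. With a negative n, each row receives the value from |n| rows later (a lead), and the last |n| rows become NaN.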

Exercise 3. Write code that computes log returns using np.log(s / s.shift(1)). Compare with pct_change().

Solution to Exercise 3

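```python
import pandas as pd
import numpy as np

# Small illustrative price Series
s = pd.Series([100.0, 102.0, 101.0, 105.0])

log_ret = np.log(s / s.shift(1))   # log return
simple_ret = s.pct_change()        # simple return
# The two are close for small moves; log_ret equals log(1 + simple_ret)
print(pd.DataFrame({'log_return': log_ret, 'pct_change': simple_ret}))
```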

Exercise 4. Create a DataFrame and use shift() to create a lagged feature column for time series forecasting.

Solution to Exercise 4

```python
import pandas as pd
import numpy as np

np.random.seed(42)
df = pd.DataFrame({'y': np.random.randn(50)},
                  index=pd.date_range('2024-01-01', periods=50, freq='D'))

df['y_lag1'] = df['y'].shift(1)   # previous day's value as a feature
df = df.dropna()                  # the first row has no lag
print(df.head())
```