expanding Method
The expanding() method computes statistics over all data from the start up to each point, with a window that grows over time.
Mental Model
An expanding window starts at the first row and grows by one row at each step -- it computes a cumulative statistic. At row 10, it uses rows 1-10; at row 100, it uses rows 1-100. This is useful for running averages, cumulative returns, and any "so far" metric that incorporates all prior history.
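To make the growing window concrete, here is a minimal sketch on a made-up four-value Series (the values are illustrative only). Each output row averages every value from the first row up to and including that row.

```python
import pandas as pd

# Hypothetical toy data, chosen so the running means are easy to verify by hand
s = pd.Series([2.0, 4.0, 6.0, 8.0])

# Row 0 uses [2]; row 1 uses [2, 4]; row 2 uses [2, 4, 6]; row 3 uses all four
running_mean = s.expanding().mean()
print(running_mean.tolist())  # [2.0, 3.0, 4.0, 5.0]
```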
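Basic Expanding
Call expanding() on a Series or DataFrame, then chain an aggregation such as mean(), sum(), or std().
1. Cumulative Mean
```python
import pandas as pd
import yfinance as yf

# Download one year of daily AAPL prices
aapl = yf.download('AAPL', start='2023-01-01', end='2024-01-01')

# Running mean of the closing price from the first row up to each date
aapl['Cumulative_Mean'] = aapl['Close'].expanding().mean()
print(aapl[['Close', 'Cumulative_Mean']].head(10))
```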
2. Cumulative Sum
```python
aapl['Cumulative_Sum'] = aapl['Volume'].expanding().sum()
```
3. Cumulative Standard Deviation
```python
aapl['Cumulative_STD'] = aapl['Close'].expanding().std()
```
How Expanding Works
At time t, the window includes observations 1 through t.
1. Window Growth
```python
# Day 1: window = [1]
# Day 2: window = [1, 2]
# Day 3: window = [1, 2, 3]
# Day N: window = [1, 2, ..., N]
```
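2. Formula for Mean
At time \(t\), the expanding mean is the average of the first \(t\) observations:
\[ \bar{x}_t = \frac{1}{t} \sum_{i=1}^{t} x_i \]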
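As a quick check of the expanding-mean definition (the sum of the first t observations divided by t), the sketch below compares a manual calculation with the built-in method. The Series values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
s = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0])

# Manual version: running sum divided by the number of observations so far
t = np.arange(1, len(s) + 1)
manual = s.cumsum() / t

# Built-in version
builtin = s.expanding().mean()

matches = np.allclose(manual, builtin)
print(matches)  # True
```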
3. vs Rolling
```python
# Rolling: fixed window size
# Expanding: window grows from 1 to N
```
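The difference is easiest to see side by side. This sketch, on made-up data with an arbitrarily chosen rolling window of 3, puts both results in one DataFrame: the rolling mean only looks at the last three rows, while the expanding mean uses every row so far.

```python
import pandas as pd

# Hypothetical toy data
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

comparison = pd.DataFrame({
    "value": s,
    "rolling_3": s.rolling(window=3).mean(),  # mean of the last 3 rows only
    "expanding": s.expanding().mean(),        # mean of all rows so far
})
print(comparison)
```

With the default settings, the rolling column is NaN until three observations are available, whereas the expanding column has a value from the first row.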
min_periods Parameter
Sets the minimum number of observations required before a value is computed; positions with fewer observations return NaN.
1. Default (min_periods=1)
```python
s.expanding().mean()  # Starts from first value
```
2. With Minimum
```python
s.expanding(min_periods=5).mean()  # First 4 values are NaN
```
3. Use Case
```python
# Wait for enough data before computing volatility
s.expanding(min_periods=20).std()
```
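A small sketch of how min_periods changes the number of leading NaN values, using a made-up ten-value Series. Note that with the default min_periods=1 the sample standard deviation is still NaN on the first row, because it needs at least two observations.

```python
import pandas as pd

# Hypothetical toy data: the values 1.0 through 10.0
s = pd.Series(range(1, 11), dtype="float64")

default_std = s.expanding().std()               # min_periods=1
delayed_std = s.expanding(min_periods=5).std()  # requires 5 observations

print(default_std.isna().sum())  # 1
print(delayed_std.isna().sum())  # 4
```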
Common Applications
Typical expanding calculations.
1. Cumulative Returns
```python
returns = aapl['Close'].pct_change()
# expanding() has no prod() method; compound the returns with cumprod() instead
cumulative = (1 + returns).cumprod() - 1
```
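The following sketch works through compounding on four made-up prices so the result can be checked by hand. It uses cumprod() for the running product and confirms that the compounded return equals the simple ratio of each price to the first price.

```python
import pandas as pd

# Hypothetical closing prices, not real market data
prices = pd.Series([100.0, 110.0, 99.0, 108.9])

returns = prices.pct_change()             # first value is NaN
cumulative = (1 + returns).cumprod() - 1  # compounded return so far

# Cross-check: price relative to the first price
check = prices / prices.iloc[0] - 1
print(cumulative.iloc[-1], check.iloc[-1])
```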
2. Running Statistics
```python
aapl['Running_Max'] = aapl['Close'].expanding().max()
aapl['Running_Min'] = aapl['Close'].expanding().min()
```
3. Progressive Estimates
```python
# Track how estimates converge
aapl['Mean_Estimate'] = aapl['Close'].expanding().mean()
```
Financial Example
Track cumulative performance metrics.
1. Cumulative Sharpe
```python
import numpy as np

returns = aapl['Close'].pct_change()
aapl['Cum_Mean'] = returns.expanding().mean()
aapl['Cum_Std'] = returns.expanding().std()
# Annualize with 252 trading days; the risk-free rate is taken as zero here
aapl['Cum_Sharpe'] = aapl['Cum_Mean'] / aapl['Cum_Std'] * np.sqrt(252)
```
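Early values of a cumulative Sharpe ratio rest on very few observations and tend to be noisy. One possible refinement, sketched here on simulated returns (not real market data), is to set min_periods so the ratio is only reported once enough history exists. The threshold of 20, the simulation parameters, and the zero risk-free rate are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns, simulated with a fixed seed for reproducibility
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(loc=0.0005, scale=0.01, size=252))

# Require at least 20 observations before reporting a value
window = returns.expanding(min_periods=20)

# Annualized Sharpe ratio, assuming a risk-free rate of zero
sharpe = window.mean() / window.std() * np.sqrt(252)

print(sharpe.isna().sum())  # 19
print(sharpe.iloc[-1])
```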
2. All-Time High
```python
aapl['ATH'] = aapl['Close'].expanding().max()
aapl['Drawdown'] = aapl['Close'] / aapl['ATH'] - 1
```
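To see how the all-time high and drawdown behave, here is a minimal sketch on five made-up prices. The drawdown is zero whenever the price sets a new high and negative otherwise; its minimum is the maximum drawdown over the period.

```python
import pandas as pd

# Hypothetical prices, not real market data
prices = pd.Series([100.0, 120.0, 90.0, 110.0, 130.0])

ath = prices.expanding().max()  # highest price seen so far
drawdown = prices / ath - 1     # 0 at a new high, negative below it

print(ath.tolist())     # [100.0, 120.0, 120.0, 120.0, 130.0]
print(drawdown.min())   # -0.25
```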
3. Since Inception Returns
```python
first_price = aapl['Close'].iloc[0]
aapl['Since_Inception'] = aapl['Close'] / first_price - 1
```
Expanding vs cumsum/cumprod
Comparison with built-in cumulative functions.
1. cumsum
```python
# Equivalent for sum (on data without NaNs)
s.expanding().sum()
s.cumsum()  # Same result, faster
```
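2. cumprod
```python
# Product: expanding() has no prod() method, so use cumprod() directly
s.cumprod()
# Slower equivalent through a custom function (requires import numpy as np)
s.expanding().apply(np.prod, raw=True)
```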
3. When to Use Each
```python
# Use cumsum/cumprod for simple cumulative operations
# Use expanding() for mean, std, custom functions
```
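As an illustration of the custom-function case, the sketch below computes a running range (maximum minus minimum of all values so far) with expanding().apply(), and also checks that cumsum() agrees with expanding().sum() on data without missing values. The data and the choice of statistic are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
s = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0])

# Custom statistic: range of all values seen so far.
# raw=True passes each window to the function as a NumPy array.
running_range = s.expanding().apply(lambda w: w.max() - w.min(), raw=True)
print(running_range.tolist())  # [0.0, 2.0, 3.0, 3.0, 4.0]

# On data without NaNs, cumsum() matches expanding().sum()
sums_match = np.allclose(s.cumsum(), s.expanding().sum())
print(sums_match)  # True
```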
Exercises
Exercise 1. Write code that computes the expanding (cumulative) mean using s.expanding().mean().
Solution to Exercise 1
```python
import pandas as pd
import numpy as np

# Expanding (cumulative) mean of a random series
np.random.seed(42)
s = pd.Series(np.random.randn(10))
print(s.expanding().mean())
```
Exercise 2. Explain the difference between expanding() and rolling(). When would you use each?
Solution to Exercise 2
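rolling() uses a fixed-size window that slides forward, so older observations drop out as new ones arrive. expanding() anchors the window at the first observation and lets it grow, so every prior observation stays in the calculation. Use rolling() for metrics that should reflect recent behavior, such as a 20-day moving average or recent volatility. Use expanding() for "so far" metrics that incorporate all history, such as a running mean, an all-time high, or since-inception statistics.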
Exercise 3. Write code that computes the expanding maximum and minimum of a time series. What do these represent?
Solution to Exercise 3
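```python
import pandas as pd
import numpy as np

np.random.seed(42)
s = pd.Series(np.random.randn(20).cumsum())  # simulated time series
result = pd.DataFrame({
    'value': s,
    'running_max': s.expanding().max(),
    'running_min': s.expanding().min(),
})
print(result)
```
The expanding maximum is the highest value observed so far (for prices, the all-time high); the expanding minimum is the lowest value observed so far.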
Exercise 4. Create a DataFrame and use expanding(min_periods=10).std() to compute the expanding standard deviation, requiring at least 10 observations.
Solution to Exercise 4
```python
import pandas as pd
import numpy as np

np.random.seed(42)
df = pd.DataFrame({'A': np.random.randn(50)})
# Require at least 10 observations; the first 9 rows are NaN
df['A_expanding_std'] = df['A'].expanding(min_periods=10).std()
print(df.head(12))
```