Financial Data Workflow¶

Mental Model

The yfinance workflow is a three-stage pipeline: download raw OHLCV data, transform it with pandas (compute returns, resample, merge tickers), and visualize the results. Each stage maps directly to pandas operations you already know -- the financial domain just provides the context.

Data Download¶

1. Single Ticker¶

```python import yfinance as yf

df = yf.download("AAPL", start="2023-01-01", end="2023-12-31") print(type(df)) # DataFrame print(df.columns) # ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'] ```

2. Index Types¶

python print(type(df.index)) # DatetimeIndex print(df.index[0]) # Timestamp('2023-01-03')

3. Access Patterns¶

```python

By label¶

df.loc['2023-01-03', 'Close']

By position¶

df.iloc[0, 3]

Column¶

df['Close'] ```

Multi-Asset Analysis¶

1. Multiple Tickers¶

python tickers = ["AAPL", "MSFT", "GOOGL"] df = yf.download(tickers, start="2023-01-01")

2. MultiIndex Columns¶

```python print(df.columns)

MultiIndex([('Open', 'AAPL'), ('Open', 'MSFT'), ...])¶

Access¶

df['Close']['AAPL'] # Apple closing prices df[('Close', 'AAPL')] # Alternative ```

3. Comparison¶

```python

Normalize to first day¶

normalized = df['Close'] / df['Close'].iloc[0] normalized.plot(figsize=(12, 6)) ```

Visualization¶

1. Price Series¶

```python import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 6)) df['Close'].plot(ax=ax) ax.set_ylabel('Price ($)') ax.set_title('AAPL Closing Price') plt.show() ```

2. Multiple Assets¶

python fig, ax = plt.subplots(figsize=(12, 6)) for ticker in ['AAPL', 'MSFT', 'GOOGL']: (df['Close'][ticker] / df['Close'][ticker].iloc[0]).plot(ax=ax, label=ticker) ax.legend() ax.set_ylabel('Normalized Price') plt.show()

3. Volume¶

python fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8), sharex=True) df['Close'].plot(ax=ax1) df['Volume'].plot(ax=ax2, color='gray', alpha=0.5) ax1.set_ylabel('Price') ax2.set_ylabel('Volume')

Analysis¶

1. Returns¶

```python

Daily returns¶

returns = df['Close'].pct_change() returns.plot(kind='hist', bins=50, alpha=0.6) ```

2. Moving Averages¶

```python df['MA20'] = df['Close'].rolling(20).mean() df['MA50'] = df['Close'].rolling(50).mean()

fig, ax = plt.subplots(figsize=(12, 6)) df[['Close', 'MA20', 'MA50']].plot(ax=ax) ax.set_title('Price with Moving Averages') ```

3. Volatility¶

```python

20-day rolling volatility¶

volatility = returns.rolling(20).std() * np.sqrt(252) volatility.plot(figsize=(12, 4)) ```

Exercises¶

Exercise 1. Write code that creates a synthetic stock price DataFrame with columns ['Open', 'High', 'Low', 'Close', 'Volume'] and a DatetimeIndex.

Solution to Exercise 1

```python import pandas as pd

df = pd.DataFrame({ 'name': ['Alice', 'Bob', 'Charlie', 'David'], 'salary': [70000, 80000, 60000, 90000], 'department': ['IT', 'IT', 'HR', 'HR'] }) result = df.groupby('department')['salary'].max() print(result) ```

Exercise 2. Explain the typical workflow for downloading, cleaning, and analyzing financial data with Pandas.

Solution to Exercise 2

See the main content for the relevant patterns and API calls. The solution involves understanding how to combine Pandas operations to solve data manipulation problems.

Exercise 3. Write code that computes the cumulative return of a price series. Start from the formula: cumulative return = (price / first_price) - 1.

Solution to Exercise 3

```python import pandas as pd import numpy as np

np.random.seed(42) df = pd.DataFrame({ 'value': np.random.randint(0, 100, 20), 'group': np.random.choice(['A', 'B'], 20) }) result = df.groupby('group')['value'].transform('sum') print(result) ```

Exercise 4. Create a function that takes a DataFrame of daily prices and returns a summary with: total return, annualized return, and volatility.

Solution to Exercise 4

```python import pandas as pd import numpy as np

np.random.seed(42) s = pd.Series(np.random.randn(100)) s_clean = s.clip(lower=0) print(s_clean.describe()) ```