SP500 Analysis

A comprehensive example demonstrating pandas operations for analyzing S&P 500 stock data.

Mental Model

This example ties together the full pandas workflow: download data, inspect it, filter and group, compute statistics, and visualize results. Think of it as a capstone that shows how individual pandas verbs chain together into a real analysis pipeline, with an OOP wrapper for reusability.
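To make the pipeline concrete before any download is involved, here is a minimal sketch on a tiny hand-made table. The tickers, sectors, and numbers are invented for illustration and are not market data; only the chaining pattern matters.

```python
import pandas as pd

# Hypothetical toy data; not real market figures.
toy = pd.DataFrame({
    "Ticker": ["AAA", "BBB", "CCC", "DDD"],
    "Sector": ["Tech", "Tech", "Energy", "Energy"],
    "Market Cap": [300.0, 100.0, 50.0, 150.0],
})

# Filter -> group -> summarize -> sort, chained as one expression.
summary = (
    toy[toy["Market Cap"] > 60]            # filter: keep the larger companies
    .groupby("Sector")["Market Cap"]       # group by sector
    .agg(["sum", "mean"])                  # compute statistics per group
    .sort_values("sum", ascending=False)   # order for reporting
)
print(summary)
```

Each step returns a new pandas object, which is what lets the verbs chain without intermediate variables.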

SP500 Class Design

Build a class to download and analyze S&P 500 data.

1. Class Structure

```python
import pandas as pd
import yfinance as yf


class SP500:
    """
    Class to download SP500 companies' fundamental and stock price data.
    """

    def __init__(self):
        self.tickers = []
        self.data = pd.DataFrame()
        self.price_data = pd.DataFrame()
        self.fundamental_data = pd.DataFrame()
```

2. Fetch Tickers

```python
# Method of the SP500 class
def fetch_sp500_tickers(self):
    # The first table on the page lists the current constituents
    table = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    self.tickers = table[0]['Symbol'].tolist()
    # Adjust ticker symbols for Yahoo Finance (e.g. 'BRK.B' -> 'BRK-B')
    self.tickers = [ticker.replace('.', '-') for ticker in self.tickers]
```
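The symbol adjustment can be tried in isolation. The sketch below assumes a hypothetical helper named `normalize_ticker`; the extra `strip()` and `upper()` calls are an addition for robustness and are not part of the original method.

```python
def normalize_ticker(symbol: str) -> str:
    """Convert a dotted share-class symbol (e.g. 'BRK.B') to the hyphenated form."""
    # strip/upper are defensive extras; the original method only replaces '.' with '-'
    return symbol.strip().upper().replace(".", "-")


symbols = ["AAPL", "BRK.B", "BF.B"]
normalized = [normalize_ticker(s) for s in symbols]
print(normalized)
```

Symbols without a dot pass through unchanged, so the helper is safe to apply to the whole list.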

3. Fetch Data

```python
# Method of the SP500 class
def fetch_data(self):
    infos = []
    closes = []

    for ticker in self.tickers:
        try:
            stock = yf.Ticker(ticker)
            info = stock.info
            df = stock.history(period='1y')
            # Keep only the closing price, with the ticker as the column name
            closes.append(df[['Close']].rename(columns={'Close': ticker}))
            infos.append({
                'Ticker': ticker,
                'PER (Trailing)': info.get('trailingPE'),
                'PBR': info.get('priceToBook'),
                'Market Cap': info.get('marketCap')
            })
        except Exception as e:
            # Skip this ticker and move on to the next one
            print(f"Failed to fetch {ticker}: {e}")

    self.price_data = pd.concat(closes, axis=1)
    self.fundamental_data = pd.DataFrame(infos)
```
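The column-wise `pd.concat` step aligns the per-ticker frames on their date index. A minimal offline sketch, with made-up tickers and prices, shows what happens when two tickers cover different date ranges:

```python
import pandas as pd

# Hypothetical closes for two tickers with partly overlapping dates.
dates_a = pd.date_range("2024-01-01", periods=3, freq="D")
dates_b = pd.date_range("2024-01-02", periods=3, freq="D")

closes = [
    pd.DataFrame({"Close": [10.0, 11.0, 12.0]}, index=dates_a).rename(columns={"Close": "AAA"}),
    pd.DataFrame({"Close": [20.0, 21.0, 22.0]}, index=dates_b).rename(columns={"Close": "BBB"}),
]

# axis=1 places each ticker in its own column; dates missing for a ticker become NaN.
price_data = pd.concat(closes, axis=1)
print(price_data)
```

Note that `pd.concat` raises a `ValueError` when given an empty list, so if every download fails it may be worth checking `closes` before concatenating.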

Data Merging

Combine price and fundamental data.

1. Merge Method

```python
# Method of the SP500 class
def merge_data(self):
    # Transpose prices so tickers become the index, then join on it
    self.data = self.fundamental_data.set_index('Ticker').join(
        self.price_data.transpose(),
        how='left'
    )
```
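The shape of the merged table is easier to see on a small example. This sketch uses invented tickers and values, and assumes one ticker ("CCC") has fundamentals but no price history:

```python
import pandas as pd

# Hypothetical inputs mirroring the two frames built by the class.
fundamental_data = pd.DataFrame({
    "Ticker": ["AAA", "BBB", "CCC"],
    "PBR": [1.5, 2.0, 0.8],
})
price_data = pd.DataFrame(
    {"AAA": [10.0, 11.0], "BBB": [20.0, 21.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02"]),
)

# One row per ticker: fundamentals first, then one column per date.
merged = fundamental_data.set_index("Ticker").join(price_data.transpose(), how="left")
print(merged)
```

With `how='left'`, every ticker in the fundamentals table is kept, and tickers without prices get NaN in the date columns. The result is a wide table whose column labels mix strings and timestamps, which is convenient for a quick look but less so for time-series work on the price part.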

2. Get Data

```python
# Method of the SP500 class
def get_data(self):
    return self.data
```

3. Usage Example

```python
# Example usage
sp500 = SP500()
sp500.fetch_sp500_tickers()
sp500.fetch_data()
sp500.merge_data()
data = sp500.get_data()
print(data.head())
```

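Analysis Operations

Common analysis patterns with the data.

1. Filter by Sector

```python
# Assumes `data` has a 'Sector' column (the SP500 class above does not collect one).
# The exact sector label, here 'Finance', depends on the data source.
finance_df = data[data['Sector'] == 'Finance']
```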

2. Group Statistics

```python
# Assumes `data` has a 'Sector' column
sector_stats = data.groupby('Sector').agg({
    'Market Cap': 'sum',
    'PER (Trailing)': 'mean',
    'PBR': ['mean', 'std']
})
```
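Because one entry in the aggregation dict is a list, the result has two-level column labels such as `('PBR', 'mean')`. The sketch below, on invented data, shows one way to flatten them into plain names; the underscore naming scheme is an arbitrary choice.

```python
import pandas as pd

# Hypothetical toy data; not real market figures.
data = pd.DataFrame({
    "Sector": ["Tech", "Tech", "Energy", "Energy"],
    "Market Cap": [300.0, 100.0, 50.0, 150.0],
    "PBR": [4.0, 6.0, 1.0, 3.0],
})

sector_stats = data.groupby("Sector").agg({
    "Market Cap": "sum",
    "PBR": ["mean", "std"],
})

# Columns are tuples like ('PBR', 'mean'); join each tuple into a flat name.
sector_stats.columns = ["_".join(col) for col in sector_stats.columns]
print(sector_stats)
```

Flat names make later selection simpler, for example `sector_stats['PBR_mean']`.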

3. Top Performers

```python
# Top 10 by market cap
top_10 = data.nlargest(10, 'Market Cap')
```

Visualization Integration

Combine pandas with matplotlib.

1. Sector Distribution

```python
import matplotlib.pyplot as plt

# Assumes `data` has a 'Sector' column
sector_caps = data.groupby('Sector')['Market Cap'].sum()
sector_caps.plot(kind='bar', figsize=(12, 6))
plt.title('Market Cap by Sector')
plt.ylabel('Market Cap ($)')
plt.show()
```

2. Price Correlation

```python
# Correlation matrix of prices
price_corr = sp500.price_data.corr()
```
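Correlating price levels, as `price_data.corr()` does, can overstate co-movement when series simply trend in the same direction, so a common alternative is to correlate returns instead. The following is a sketch on synthetic data (two seeded, independent random walks with hypothetical tickers), not a statement about real stocks:

```python
import numpy as np
import pandas as pd

# Synthetic prices: two independent random walks with a small upward drift.
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=250, freq="B")
log_steps = rng.normal(0.001, 0.01, size=(250, 2))
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(log_steps, axis=0)),
    index=dates,
    columns=["AAA", "BBB"],
)

level_corr = prices.corr()                         # correlation of price levels
return_corr = prices.pct_change().dropna().corr()  # correlation of daily returns

print(level_corr)
print(return_corr)
```

Comparing the two matrices on real data is a quick check of how much of an apparent relationship comes from shared trends rather than day-to-day co-movement.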

3. Returns Analysis

```python
# Daily percentage change per ticker, then the 10 highest average returns
returns = sp500.price_data.pct_change()
returns.mean().nlargest(10).plot(kind='bar')
plt.title('Top 10 Average Daily Returns')
plt.show()
```

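Best Practices

Guidelines for large-scale data analysis.

1. Error Handling

```python
# Always handle API errors gracefully
ticker = 'AAPL'  # example symbol
try:
    data = yf.download(ticker)
except Exception as e:
    print(f"Error: {e}")
else:
    # A failed download may also come back as an empty DataFrame
    if data.empty:
        print(f"No data returned for {ticker}")
```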
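For transient failures, a retry wrapper is one option. The sketch below is network-free: `fetch_with_retry` and `flaky_fetch` are hypothetical names, and `flaky_fetch` stands in for a real download call by failing twice before succeeding.

```python
import time


def fetch_with_retry(fetch, ticker, retries=3, delay=0.0):
    """Call fetch(ticker); retry on failure and return None if every attempt fails."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(ticker)
        except Exception as exc:  # in real code, catch the narrowest exception you can
            print(f"Attempt {attempt} for {ticker} failed: {exc}")
            time.sleep(delay)  # use a positive delay against a real API
    return None


# Hypothetical stand-in for a network call: fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch(ticker):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return {"ticker": ticker, "close": 123.0}


result = fetch_with_retry(flaky_fetch, "AAA")
print(result)
```

Returning `None` after the final attempt keeps a long loop over tickers running; the caller decides whether to skip or log the missing entry.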

2. Incremental Loading

```python
# For large datasets, process in batches
tickers = sp500.tickers
batch_size = 50
for i in range(0, len(tickers), batch_size):
    batch = tickers[i:i + batch_size]
    # Process batch
```
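The same slicing logic can be wrapped in a small generator so the batching is reusable. `batched` is a hypothetical helper name, and the ticker list here is made of placeholder symbols:

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements each."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


tickers = [f"T{i:03d}" for i in range(120)]  # hypothetical placeholder symbols
batches = list(batched(tickers, 50))
print([len(b) for b in batches])  # [50, 50, 20]
```

The last batch is simply shorter when the list length is not a multiple of the batch size.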

3. Caching Results

```python
# Save intermediate results
data.to_pickle('sp500_data.pkl')

# Load later
data = pd.read_pickle('sp500_data.pkl')
```
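Saving and loading can be combined into a "load if cached, otherwise build" pattern. In this sketch, `load_or_build` and `build_frame` are hypothetical names, and a temporary directory is used so that nothing is left on disk:

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_or_build(cache_path, build):
    """Return the cached DataFrame if the file exists; otherwise build and cache it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        return pd.read_pickle(cache_path)
    df = build()
    df.to_pickle(cache_path)
    return df


build_calls = []


def build_frame():
    # Stand-in for an expensive download; records how often it runs.
    build_calls.append(1)
    return pd.DataFrame({"Market Cap": [1.0, 2.0]}, index=["AAA", "BBB"])


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sp500_data.pkl"
    first = load_or_build(path, build_frame)   # builds and writes the cache
    second = load_or_build(path, build_frame)  # reads the cache

print(len(build_calls))  # 1
```

Pickle files can execute code when loaded, so only read pickles that you created yourself or otherwise trust.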

Exercises

Exercise 1. Write code that creates a DataFrame of stock prices (date, close) and computes the 20-day and 50-day rolling means.

Solution to Exercise 1

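```python
import numpy as np
import pandas as pd

# Synthetic prices for illustration (a seeded random walk, not real data)
rng = np.random.default_rng(42)
dates = pd.date_range('2024-01-01', periods=120, freq='B')
df = pd.DataFrame({
    'date': dates,
    'close': 100 + rng.normal(0, 1, len(dates)).cumsum(),
})

# Rolling means; rows before a full window is available are NaN
df['ma_20'] = df['close'].rolling(window=20).mean()
df['ma_50'] = df['close'].rolling(window=50).mean()
print(df.tail())
```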

Exercise 2. Explain how to compute daily returns from a price series using pct_change(). Write code demonstrating this.

Solution to Exercise 2

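`pct_change()` returns the fractional change between each value and the one before it, `p_t / p_{t-1} - 1`, so applying it to a series of daily closing prices gives daily simple returns. The first element is NaN because it has no previous price. The main content uses the same call on a whole DataFrame (`sp500.price_data.pct_change()`), which computes the returns column by column.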
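A minimal sketch for Exercise 2, using a small hand-made price series (the values are illustrative, not market data). It compares `pct_change()` with the explicit formula `p_t / p_{t-1} - 1` built from `shift(1)`:

```python
import pandas as pd

# Hypothetical closing prices chosen so the returns are +2%, -2%, +5%.
prices = pd.Series(
    [100.0, 102.0, 99.96, 104.958],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
    name="close",
)

returns = prices.pct_change()          # built-in daily simple returns
manual = prices / prices.shift(1) - 1  # same quantity written out by hand

print(returns)
```

Both versions leave NaN in the first position, which `dropna()` removes if later calculations need complete data.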

Exercise 3. Write code that resamples daily stock data to monthly frequency, taking the last close price of each month.

Solution to Exercise 3

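```python
import numpy as np
import pandas as pd

# Synthetic daily closes for illustration (not real data)
np.random.seed(42)
dates = pd.date_range('2024-01-01', periods=90, freq='D')
df = pd.DataFrame({'close': 100 + np.random.randn(90).cumsum()}, index=dates)

# Keep the last close of each calendar month.
# 'ME' means month end (pandas >= 2.2); older versions use 'M'.
monthly = df['close'].resample('ME').last()
print(monthly)
```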

Exercise 4. Create a function that takes a price DataFrame and returns the maximum drawdown (largest peak-to-trough decline).

Solution to Exercise 4

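```python
import numpy as np
import pandas as pd

def max_drawdown(prices):
    """Return the largest peak-to-trough decline per column as a negative fraction."""
    running_peak = prices.cummax()          # highest price seen so far
    drawdown = prices / running_peak - 1    # decline relative to that peak
    return drawdown.min()

# Synthetic prices for illustration (not real data)
np.random.seed(42)
prices = pd.DataFrame({'AAA': 100 * np.exp(np.random.randn(100).cumsum() * 0.01)})
print(max_drawdown(prices))
```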