Basic DataFrame Methods

Essential methods for inspecting and exploring DataFrame contents.

head and tail

View first or last rows.

1. head Method

import pandas as pd
import yfinance as yf

# end is exclusive, so use 2021-01-01 to include 2020-12-31
df = yf.Ticker('WMT').history(start='2020-01-01', end='2021-01-01')
print(df.head())      # First 5 rows (default)
print(df.head(3))     # First 3 rows

2. tail Method

print(df.tail())      # Last 5 rows (default)
print(df.tail(3))     # Last 3 rows
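Both methods also accept a negative n. The sketch below uses a small synthetic frame named `demo` (a hypothetical stand-in for the price data) to show that `head(-n)` returns every row except the last n, and `tail(-n)` every row except the first n.

```python
import pandas as pd

# Hypothetical 6-row frame standing in for the price data
demo = pd.DataFrame({"x": range(6)})

all_but_last_two = demo.head(-2)   # drops the last 2 rows
all_but_first_two = demo.tail(-2)  # drops the first 2 rows

print(all_but_last_two["x"].tolist())   # [0, 1, 2, 3]
print(all_but_first_two["x"].tolist())  # [2, 3, 4, 5]
```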

3. Quick Preview

# Combine for overview
print(df.head(2))
print('...')
print(df.tail(2))

info

Display DataFrame summary.

1. Basic Info

df.info()  # Prints the summary itself and returns None, so no print() is needed

Output (truncated):

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 253 entries, 2020-01-02 to 2020-12-31
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          253 non-null    float64
 1   High          253 non-null    float64
...

2. Information Provided

Index type and range
Column count
Column names and dtypes
Non-null counts
Memory usage

3. Memory Details

df.info(memory_usage='deep')  # Exact figure, including the contents of object columns
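The deep option matters mainly for object columns: the shallow figure counts only the pointers, while the deep figure also measures the Python objects they reference. For all-numeric data such as the price table the two should match. A minimal sketch with a hypothetical frame `demo` and an explicit object column `ticker`, using `DataFrame.memory_usage`, which reports bytes per column:

```python
import pandas as pd

# Hypothetical frame: one float column and one explicit object (string) column
demo = pd.DataFrame({
    "close": [101.5, 102.25, 99.75],
    "ticker": pd.Series(["WMT", "WMT", "WMT"], dtype=object),
})

shallow = demo.memory_usage()         # object column: pointer sizes only
deep = demo.memory_usage(deep=True)   # object column: also sizes of the strings

print(shallow)
print(deep)
```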

describe

Generate descriptive statistics.

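1. Numeric Columns

print(df.describe())  # Numeric columns only, by default

Sample output (values are illustrative; the Volume, Dividends and Stock Splits columns are omitted):
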
              Open         High          Low        Close
count   253.000000   253.000000   253.000000   253.000000
mean    120.456789   121.234567   119.876543   120.567890
std       8.123456     8.234567     8.012345     8.156789
min     102.345678   103.456789   101.234567   102.456789
25%     114.567890   115.678901   113.456789   114.678901
50%     120.123456   121.234567   119.012345   120.234567
75%     126.789012   127.890123   125.678901   126.890123
max     138.901234   140.012345   137.890123   139.012345

2. Include All Columns

print(df.describe(include='all'))  # Also summarizes non-numeric columns, if any
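With all-numeric price data, include='all' gives the same table as the default. Its effect shows once a non-numeric column is present. In this sketch the frame `demo` and its `sector` column are hypothetical; the result gains `unique`, `top` and `freq` rows, with NaN wherever a statistic does not apply to a column.

```python
import pandas as pd

# Hypothetical mixed-dtype frame
demo = pd.DataFrame({
    "close": [10.0, 12.0, 11.0, 13.0],
    "sector": ["retail", "retail", "tech", "retail"],
})

numeric_only = demo.describe()             # default: numeric columns only
everything = demo.describe(include="all")  # all columns

print(numeric_only.columns.tolist())  # ['close']
print(everything.loc[["unique", "top", "freq"], "sector"])
```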

3. Returns DataFrame

stats = df.describe()
print(type(stats))  # DataFrame
print(stats.loc['mean', 'Close'])  # Access specific stat
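Because the result is an ordinary DataFrame, its row labels follow the `percentiles` argument. A small sketch on hypothetical closing prices in a frame `demo`: passing `[0.1, 0.5, 0.9]` replaces the default quartiles, and the new rows can be looked up with `.loc` like any other label.

```python
import pandas as pd

# Hypothetical closing prices 1.0 .. 10.0
demo = pd.DataFrame({"Close": [float(v) for v in range(1, 11)]})

stats = demo.describe(percentiles=[0.1, 0.5, 0.9])
print(stats.index.tolist())
# ['count', 'mean', 'std', 'min', '10%', '50%', '90%', 'max']
print(stats.loc["50%", "Close"])  # 5.5, the median
```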

copy

Create a copy of a DataFrame (deep by default).

1. Deep Copy

df_copy = df.copy()

2. Why Copy Matters

# Plain assignment copies nothing: both names refer to the same object
df_alias = df
df_alias.iloc[0, 0] = 9999
print(df.iloc[0, 0])  # 9999.0 - original changed!

# With copy, the original is protected
df_copy = df.copy()
df_copy.iloc[0, 0] = -1
print(df.iloc[0, 0])  # Still 9999.0: the edit to df_copy did not reach df

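3. Deep vs Shallow

df_deep = df.copy(deep=True)      # Default: copies the data and the index
df_shallow = df.copy(deep=False)  # New object that shares data with df
# Without Copy-on-Write, edits through df_shallow can show up in df (and vice versa)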
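Even deep=True has a limit noted in the pandas documentation: the data arrays are copied, but Python objects stored in object columns (such as lists) are not copied recursively, only their references. A minimal sketch with a hypothetical frame `demo` holding a `tags` column of lists:

```python
import pandas as pd

# Hypothetical object column holding mutable lists
demo = pd.DataFrame({"tags": [["a"], ["b"]]})
demo_copy = demo.copy(deep=True)

# Mutate the list stored in the copy's first cell, in place
demo_copy.at[0, "tags"].append("z")

# The original sees the change: both frames reference the same list object
print(demo.at[0, "tags"])                            # ['a', 'z']
print(demo.at[0, "tags"] is demo_copy.at[0, "tags"])  # True
```

If nested objects must be independent as well, they have to be copied explicitly (for example with Python's `copy.deepcopy` on each element).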

isna and isnull

Check for missing values.

1. Check Missing

import numpy as np

# Insert two missing values for the demo (this modifies df in place)
df.iloc[1, 1] = np.nan
df.iloc[2, 2] = np.nan

print(df.isna().head(3))

2. isna vs isnull

# isnull is an alias of isna; both return the same Boolean mask
print(df.isna().equals(df.isnull()))  # True

3. Count Missing

print(df.isnull().sum())  # Missing per column
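Summing the Boolean mask per column is one of several useful reductions on it. The sketch below, on a hypothetical frame `demo` with columns `a` and `b`, also computes the total number of missing cells, flags rows containing any missing value, and selects those rows. Both `np.nan` and `None` count as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one NaN and one None
demo = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

mask = demo.isna()
total_missing = int(mask.sum().sum())  # missing cells in the whole frame
rows_with_missing = mask.any(axis=1)   # True where a row has any missing cell

print(mask.sum())               # per column: a 1, b 1
print(total_missing)            # 2
print(demo[rows_with_missing])  # rows 1 and 2
```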

iterrows

Iterate over rows as (index, Series) pairs.

1. Basic Iteration

for date, row in df.iterrows():
    print(date)
    print(row)
    print('-' * 40)
    break  # Just show first

2. Access Values

for idx, row in df.head(3).iterrows():  # First 3 rows only, to keep output short
    print(f"Date: {idx}, Close: {row['Close']}")
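Because each row arrives as a Series, values from columns of different dtypes are coerced to a common dtype; the pandas documentation warns that iterrows does not preserve dtypes across rows. A minimal sketch with a hypothetical frame `demo` holding an integer `shares` column and a float `price` column:

```python
import pandas as pd

# Hypothetical frame: one integer column, one float column
demo = pd.DataFrame({"shares": [10, 20], "price": [1.5, 2.5]})
print(demo.dtypes)

_, row = next(demo.iterrows())
print(row.dtype)      # float64: the row Series upcasts to a common dtype
print(row["shares"])  # 10.0, no longer an integer
```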

3. Performance Warning

# iterrows builds a new Series for every row, which makes it slow
# on large DataFrames.
# Prefer vectorized column operations when possible.
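As an illustration of the vectorized alternative, the sketch below computes a daily high-low range both ways. The synthetic prices, the frame name `demo` and the `Range` column are assumptions for the example; the vectorized version is a single subtraction over whole columns instead of one Series per row.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for daily price data
rng = np.random.default_rng(0)
low = rng.uniform(100, 110, size=1_000)
demo = pd.DataFrame({"Low": low, "High": low + rng.uniform(0, 5, size=1_000)})

# Row by row with iterrows (slow: one Series per row)
loop_range = [row["High"] - row["Low"] for _, row in demo.iterrows()]

# Vectorized: one operation over entire columns
demo["Range"] = demo["High"] - demo["Low"]

print(np.allclose(loop_range, demo["Range"]))  # True
```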

itertuples

Faster iteration with named tuples.

1. Basic Usage

for row in df.itertuples():
    print(row.Index, row.Close)

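2. Faster than iterrows

# itertuples is typically much faster than iterrows because it yields
# lightweight namedtuples instead of building a Series for each row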
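One way to check the speed difference on your own machine is to time both loops with `timeit`. Timings depend on hardware and pandas version, so none are claimed here. The sketch uses a hypothetical synthetic frame `demo` and verifies that both loops compute the same total.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic closing prices
demo = pd.DataFrame({"Close": np.arange(5_000, dtype=float)})

def total_iterrows():
    return sum(row["Close"] for _, row in demo.iterrows())

def total_itertuples():
    return sum(row.Close for row in demo.itertuples())

print("iterrows:  ", timeit.timeit(total_iterrows, number=3))
print("itertuples:", timeit.timeit(total_itertuples, number=3))
print(total_iterrows() == total_itertuples())  # True: same result either way
```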

3. Access by Name

for row in df.itertuples(index=False):  # index=False drops the Index field
    print(row.Open, row.Close)
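Attribute access only works for column names that are valid Python identifiers. Per the pandas documentation, names that are invalid identifiers, repeated, or start with an underscore are renamed to positional names, so a column such as yfinance's `Stock Splits` becomes a field like `_2`. A sketch with a hypothetical two-column frame `demo`:

```python
import pandas as pd

# Hypothetical frame; "Stock Splits" contains a space, so it is not a valid identifier
demo = pd.DataFrame({"Close": [120.5], "Stock Splits": [0.0]})

row = next(demo.itertuples())
print(row._fields)  # ('Index', 'Close', '_2')
print(row._2)       # 0.0, reached through its positional name
print(row[2])       # plain tuple indexing works too
```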

sample

Random sample of rows.

1. Random Rows

print(df.sample(5))  # 5 random rows

2. Reproducible Sample

print(df.sample(5, random_state=42))

3. Fraction Sample

print(df.sample(frac=0.1))  # 10% of rows
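Two related behaviors, sketched on a hypothetical 10-row frame `demo`: reusing the same `random_state` returns the same rows, and a `frac` greater than 1 is only allowed with `replace=True` (sampling with replacement); otherwise pandas raises a `ValueError`.

```python
import pandas as pd

# Hypothetical 10-row frame
demo = pd.DataFrame({"x": range(10)})

# Same seed, same rows
first = demo.sample(3, random_state=42)
second = demo.sample(3, random_state=42)
print(first.equals(second))  # True

# Upsampling: twice as many rows as the frame, drawn with replacement
upsampled = demo.sample(frac=2, replace=True, random_state=0)
print(len(upsampled))  # 20

try:
    demo.sample(frac=2)  # replace defaults to False
except ValueError as err:
    print("ValueError:", err)
```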

min and max

Find minimum and maximum values.

1. Column Min/Max

print(df['Close'].min())
print(df['Close'].max())

2. All Columns

print(df.min())  # Min of each column
print(df.max())  # Max of each column

3. With Index

print(df['Close'].idxmin())  # Index of min
print(df['Close'].idxmax())  # Index of max
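The label returned by `idxmax` can be passed straight to `.loc` to fetch the whole row, for example the date with the highest close. For the top few rows instead of one, `nlargest` (and its counterpart `nsmallest`) orders by a column. A sketch on hypothetical prices in a frame `demo` indexed by date:

```python
import pandas as pd

# Hypothetical closing prices indexed by date
demo = pd.DataFrame(
    {"Close": [101.0, 104.5, 99.0, 103.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]),
)

best_day = demo["Close"].idxmax()  # label of the highest close
print(best_day)                    # 2020-01-03 00:00:00
print(demo.loc[best_day])          # full row for that date

print(demo.nlargest(2, "Close"))   # two highest closes, largest first
```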

count

Count non-null values.

1. Per Column

print(df.count())

2. Single Column

print(df['Close'].count())

3. vs len()

print(len(df))       # Total rows
print(df.count())    # Non-null per column
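The two disagree as soon as values are missing: `len` counts rows regardless of content, while `count` skips NaN. A sketch with one hypothetical missing price in a frame `demo`; `shape` and `size` also count every cell, missing or not, and `count(axis=1)` counts non-null values per row.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing Close
demo = pd.DataFrame({"Open": [1.0, 2.0, 3.0], "Close": [1.5, np.nan, 3.5]})

print(len(demo))                    # 3 rows
print(demo.count().tolist())        # [3, 2]: non-null per column
print(demo.shape, demo.size)        # (3, 2) 6: all cells, including NaN
print(demo.count(axis=1).tolist())  # [2, 1, 2]: non-null per row
```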