Basic DataFrame Methods

Essential methods for inspecting and exploring DataFrame contents.

head and tail

View the first or last rows of a DataFrame.

1. head Method

import pandas as pd
import yfinance as yf

df = yf.Ticker('WMT').history(start='2020-01-01', end='2020-12-31')
print(df.head())      # First 5 rows (default)
print(df.head(3))     # First 3 rows

2. tail Method

print(df.tail())      # Last 5 rows (default)
print(df.tail(3))     # Last 3 rows

3. Quick Preview

# Combine for overview
print(df.head(2))
print('...')
print(df.tail(2))
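Both methods also accept a negative n, which reads as "everything except". The sketch below uses a small made-up frame (the `prices` data is hypothetical, not the WMT history above) so it runs without a network connection:

```python
import pandas as pd

# Hypothetical toy data standing in for a price history
prices = pd.DataFrame({'Close': [10.0, 11.0, 12.0, 13.0, 14.0]})

all_but_last_two = prices.head(-2)   # every row except the last 2
all_but_first_two = prices.tail(-2)  # every row except the first 2

print(all_but_last_two)
print(all_but_first_two)
```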

info

Display a concise summary of the DataFrame.

1. Basic Info

df.info()  # info() prints the summary itself and returns None
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 253 entries, 2020-01-02 to 2020-12-31
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          253 non-null    float64
 1   High          253 non-null    float64
...

2. Information Provided

  • Index type and range
  • Column count
  • Column names and dtypes
  • Non-null counts
  • Memory usage

3. Memory Details

df.info(memory_usage='deep')
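Because info writes its summary instead of returning it, capturing the text means passing a buffer through the `buf` parameter. A minimal sketch, assuming a small made-up frame (`toy` and its `Ticker` column are illustrative only):

```python
import io
import pandas as pd

# Hypothetical toy data; the Ticker column is added only to show an object dtype
toy = pd.DataFrame({'Close': [10.0, None, 12.0],
                    'Ticker': ['WMT', 'WMT', 'WMT']})

buffer = io.StringIO()
result = toy.info(buf=buffer)      # summary goes into the buffer, not stdout
summary_text = buffer.getvalue()

print(result)         # info() has no return value
print(summary_text)
```

This can be useful when the summary should go to a log file rather than the console.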

describe

Generate descriptive statistics.

1. Numeric Columns

print(df.describe())
              Open         High          Low        Close
count   253.000000   253.000000   253.000000   253.000000
mean    120.456789   121.234567   119.876543   120.567890
std       8.123456     8.234567     8.012345     8.156789
min     102.345678   103.456789   101.234567   102.456789
25%     114.567890   115.678901   113.456789   114.678901
50%     120.123456   121.234567   119.012345   120.234567
75%     126.789012   127.890123   125.678901   126.890123
max     138.901234   140.012345   137.890123   139.012345

2. Include All Columns

print(df.describe(include='all'))

3. Returns DataFrame

stats = df.describe()
print(type(stats))  # DataFrame
print(stats.loc['mean', 'Close'])  # Access specific stat
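The quartiles shown by default can be swapped for other cut points with the `percentiles` parameter. A small sketch on made-up data (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({'Close': [10.0, 11.0, 12.0, 13.0, 14.0]})

# Ask for the 10th, 50th and 90th percentiles instead of the quartiles
custom_stats = toy.describe(percentiles=[0.1, 0.5, 0.9])

print(custom_stats)
print(custom_stats.loc['mean', 'Close'])
```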

copy

Create an independent copy of a DataFrame (a deep copy by default).

1. Deep Copy

df_copy = df.copy()

2. Why Copy Matters

# Without copy: plain assignment only binds a second name to the same object
df_alias = df
df_alias.iloc[0, 0] = 9999
print(df.iloc[0, 0])  # 9999 - original changed!

# With copy, original is safe
df_copy = df.copy()
df_copy.iloc[0, 0] = 9999
print(df.iloc[0, 0])  # Original unchanged

3. Deep vs Shallow

df_deep = df.copy(deep=True)      # Default: data and index are copied
df_shallow = df.copy(deep=False)  # New object that references the original's data and index
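One way to see the difference is to check whether each copy shares its underlying NumPy buffer with the original. The sketch below uses made-up data (`original` is hypothetical). Whether a write through a shallow copy reaches the original depends on the pandas version and its Copy-on-Write setting, so only memory sharing is inspected here:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
original = pd.DataFrame({'Close': [10.0, 11.0, 12.0]})

deep = original.copy()               # deep=True is the default
shallow = original.copy(deep=False)

# Compare each copy's underlying buffer with the original's
deep_shares = np.shares_memory(original['Close'].to_numpy(),
                               deep['Close'].to_numpy())
shallow_shares = np.shares_memory(original['Close'].to_numpy(),
                                  shallow['Close'].to_numpy())
print(deep_shares, shallow_shares)

# Writing to the deep copy leaves the original untouched
deep.iloc[0, 0] = 9999.0
print(original.iloc[0, 0])
```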

isna and isnull

Check for missing values.

1. Check Missing

import numpy as np

df.iloc[1, 1] = np.nan
df.iloc[2, 2] = np.nan

print(df.isna().head(3))

2. isna vs isnull

# isnull is an alias for isna, so the results are identical
print(df.isna().equals(df.isnull()))  # True

3. Count Missing

print(df.isnull().sum())  # Missing per column
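The same boolean mask can answer a few related questions: how many cells are missing overall, what share of each column is missing, and which rows contain a gap. A minimal sketch on made-up data (`toy` is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few gaps
toy = pd.DataFrame({'High': [1.0, np.nan, 3.0, 4.0],
                    'Low': [0.5, 0.6, np.nan, np.nan]})

total_missing = int(toy.isna().sum().sum())    # missing cells in the whole frame
missing_share = toy.isna().mean()              # fraction missing per column
rows_with_gaps = toy[toy.isna().any(axis=1)]   # rows with at least one NaN

print(total_missing)
print(missing_share)
print(rows_with_gaps)
```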

iterrows

Iterate over rows as (index, Series) pairs.

1. Basic Iteration

for date, row in df.iterrows():
    print(date)
    print(row)
    print('-' * 40)
    break  # Just show first

2. Access Values

for idx, row in df.iterrows():
    print(f"Date: {idx}, Close: {row['Close']}")

3. Performance Warning

# iterrows builds a new Series for every row, so it is slow on large DataFrames
# Prefer vectorized operations when possible
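As an illustration of the vectorized alternative, the sketch below computes a daily High minus Low range twice, once row by row and once as a single column operation. The `toy` frame is made up for the example:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({'High': [11.0, 12.5, 13.0],
                    'Low': [10.0, 11.5, 12.0]})

# Row-by-row version with iterrows
loop_ranges = []
for _, row in toy.iterrows():
    loop_ranges.append(row['High'] - row['Low'])

# Vectorized version: one subtraction over whole columns
vector_ranges = toy['High'] - toy['Low']

print(loop_ranges)
print(vector_ranges.tolist())
```

Both give the same numbers; the vectorized form avoids the Python-level loop entirely.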

itertuples

Faster iteration with named tuples.

1. Basic Usage

for row in df.itertuples():
    print(row.Index, row.Close)

2. Faster than iterrows

# itertuples is usually faster than iterrows: it yields lightweight
# namedtuples instead of building a Series for every row

3. Access by Name

for row in df.itertuples(index=False):
    print(row.Open, row.Close)
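Attribute access only works for column names that are valid Python identifiers. A name containing a space, such as 'Stock Splits', is replaced by a positional field name in the namedtuple. The sketch below uses a made-up frame (`toy` is hypothetical) to show the renaming and a positional fallback:

```python
import pandas as pd

# Hypothetical toy data with one column name that is not a valid identifier
toy = pd.DataFrame({'Close': [10.0, 11.0], 'Stock Splits': [0.0, 2.0]})

first = next(toy.itertuples(index=False))

print(first._fields)          # field names after renaming
print(first.Close, first[1])  # positional access works for any column name
```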

sample

Draw a random sample of rows.

1. Random Rows

print(df.sample(5))  # 5 random rows

2. Reproducible Sample

print(df.sample(5, random_state=42))

3. Fraction Sample

print(df.sample(frac=0.1))  # 10% of rows
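To check that a fixed random_state really reproduces a draw, two samples with the same seed can be compared directly. A minimal sketch on made-up data (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical toy data: 20 rows
toy = pd.DataFrame({'Close': range(20)})

first_draw = toy.sample(5, random_state=42)
second_draw = toy.sample(5, random_state=42)   # same seed, same rows
tenth = toy.sample(frac=0.1, random_state=0)   # 10% of 20 rows

print(first_draw.equals(second_draw))
print(len(tenth))
```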

min and max

Find minimum and maximum values.

1. Column Min/Max

print(df['Close'].min())
print(df['Close'].max())

2. All Columns

print(df.min())  # Min of each column
print(df.max())  # Max of each column

3. With Index

print(df['Close'].idxmin())  # Index of min
print(df['Close'].idxmax())  # Index of max
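The label returned by idxmax can be passed straight to loc to pull the whole row for that date. A small sketch, assuming a made-up frame with a DatetimeIndex (`toy` and its values are hypothetical):

```python
import pandas as pd

# Hypothetical toy data indexed by date
toy = pd.DataFrame(
    {'Close': [10.0, 14.0, 12.0], 'Volume': [100, 300, 200]},
    index=pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06']),
)

peak_date = toy['Close'].idxmax()   # index label of the highest close
peak_row = toy.loc[peak_date]       # full row for that date

print(peak_date)
print(peak_row)
```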

count

Count non-null values.

1. Per Column

print(df.count())

2. Single Column

print(df['Close'].count())

3. vs len()

print(len(df))       # Total rows
print(df.count())    # Non-null per column
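The two only differ when values are missing. A minimal sketch on made-up data (`toy` is hypothetical) makes the gap visible:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing Open value
toy = pd.DataFrame({'Open': [1.0, np.nan, 3.0],
                    'Close': [1.5, 2.5, 3.5]})

total_rows = len(toy)     # counts every row, missing or not
non_null = toy.count()    # counts only non-null values per column

print(total_rows)
print(non_null)
```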