Basic DataFrame Methods

Essential methods for inspecting and exploring DataFrame contents.

Mental Model

Inspection methods are your data microscope. head() and tail() show the first and last rows, info() gives a structural summary with dtypes and non-null counts, and describe() computes column-wise summary statistics. Run them right after loading any new dataset to build a mental picture before writing analysis code.

head and tail

View first or last rows.

1. head Method

```python
import pandas as pd
import yfinance as yf

df = yf.Ticker('WMT').history(start='2020-01-01', end='2020-12-31')
print(df.head())   # First 5 rows (default)
print(df.head(3))  # First 3 rows
```

2. tail Method

```python
print(df.tail())   # Last 5 rows (default)
print(df.tail(3))  # Last 3 rows
```

3. Quick Preview

```python
# Combine for overview
print(df.head(2))
print('...')
print(df.tail(2))
```

info

Display DataFrame summary.

1. Basic Info

```python
df.info()  # Prints the summary itself and returns None, so no print() is needed
```

```
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 253 entries, 2020-01-02 to 2020-12-31
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Open    253 non-null    float64
 1   High    253 non-null    float64
...
```

2. Information Provided

  • Index type and range
  • Column count
  • Column names and dtypes
  • Non-null counts
  • Memory usage
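Because info() writes its report to standard output instead of returning it, capturing the text means passing a buffer through the buf parameter. The following is a minimal sketch on a small made-up DataFrame; the name toy and its columns are illustrative assumptions, not part of the WMT data used above.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    'price': [10.5, np.nan, 12.0],
    'volume': [100, 200, 300],
})

buffer = io.StringIO()
result = toy.info(buf=buffer)  # Write the report into the buffer instead of stdout
report = buffer.getvalue()     # The same text info() would have printed

print(result)  # None: info() has no return value
print(report)
```

The captured string can then be logged or written to a file alongside other diagnostics.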

3. Memory Details

```python
df.info(memory_usage='deep')
```
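The 'deep' option matters mostly for object columns, where the default estimate counts only the pointers and not the Python objects they reference. As a rough sketch, the related DataFrame.memory_usage method shows the difference; the toy frame below is a made-up example, and the ticker column is forced to object dtype on purpose so the effect is visible.

```python
import pandas as pd

# Hypothetical toy data; 'ticker' is explicitly stored as Python objects
toy = pd.DataFrame({
    'ticker': pd.Series(['WMT', 'TGT', 'COST'] * 1000, dtype=object),
    'close': [1.0, 2.0, 3.0] * 1000,
})

shallow_bytes = toy.memory_usage().sum()        # Pointer-sized estimate for object columns
deep_bytes = toy.memory_usage(deep=True).sum()  # Also measures the referenced string objects

print(shallow_bytes, deep_bytes)
```

The exact byte counts depend on the platform and pandas version, so only the relative size is meaningful here.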

describe

Generate descriptive statistics.

1. Numeric Columns

```python
print(df.describe())
```

```
             Open        High         Low       Close
count  253.000000  253.000000  253.000000  253.000000
mean   120.456789  121.234567  119.876543  120.567890
std      8.123456    8.234567    8.012345    8.156789
min    102.345678  103.456789  101.234567  102.456789
25%    114.567890  115.678901  113.456789  114.678901
50%    120.123456  121.234567  119.012345  120.234567
75%    126.789012  127.890123  125.678901  126.890123
max    138.901234  140.012345  137.890123  139.012345
```

2. Include All Columns

```python
print(df.describe(include='all'))
```
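The WMT price data is entirely numeric, so include='all' changes little there. A sketch on a made-up frame with one text column (the names toy, sector, and close are assumptions for illustration) shows the extra rows it adds: unique, top, and freq are filled for the text column, while the numeric statistics are NaN for it.

```python
import pandas as pd

# Hypothetical toy data with one text column and one numeric column
toy = pd.DataFrame({
    'sector': ['retail', 'retail', 'tech', 'retail'],
    'close': [10.0, 12.0, 30.0, 11.0],
})

summary = toy.describe(include='all')
print(summary)

print(summary.loc['top', 'sector'])   # Most frequent value in the text column
print(summary.loc['freq', 'sector'])  # How many times that value occurs
```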

3. Returns DataFrame

```python
stats = df.describe()
print(type(stats))                 # DataFrame
print(stats.loc['mean', 'Close'])  # Access specific stat
```

copy

Create a deep copy of DataFrame.

1. Deep Copy

```python
df_copy = df.copy()
```

2. Why Copy Matters

```python
# Without copy, both names refer to the same object, so changes affect the original
df_view = df
df_view.iloc[0, 0] = 9999
print(df.iloc[0, 0])  # 9999 - original changed!

# With copy, original is safe
df_copy = df.copy()
df_copy.iloc[0, 0] = -1
print(df.iloc[0, 0])  # Still 9999 - the edit to the copy did not reach the original
```

3. Deep vs Shallow

```python
df_deep = df.copy(deep=True)      # Default: data and index are copied
df_shallow = df.copy(deep=False)  # New object that references the original's data and index
```
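One way to see the difference is to check whether the copies share memory with the original. This is a hedged sketch on a made-up single-column frame. Whether a write through a shallow copy reaches the original depends on the pandas version and on whether Copy-on-Write is enabled, so the example writes only to the deep copy and merely inspects the shallow one.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only for illustration
original = pd.DataFrame({'a': [1, 2, 3]})
deep = original.copy(deep=True)
shallow = original.copy(deep=False)

deep.loc[0, 'a'] = 100       # Modify the deep copy only
print(original.loc[0, 'a'])  # 1: the original is untouched

# A deep copy owns its own buffer, so this check is False
deep_shares = np.shares_memory(original['a'].to_numpy(), deep['a'].to_numpy())
print(deep_shares)

# For the shallow copy the result is printed without asserting it,
# since the internals may differ across pandas versions
print(np.shares_memory(original['a'].to_numpy(), shallow['a'].to_numpy()))
```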

isna and isnull

Check for missing values.

1. Check Missing

```python
import numpy as np

df.iloc[1, 1] = np.nan
df.iloc[2, 2] = np.nan

print(df.isna().head(3))
```

2. isna vs isnull

```python
# isnull is an alias of isna, so they are identical
print(df.isna().equals(df.isnull()))  # True
```

3. Count Missing

```python
print(df.isnull().sum())  # Missing per column
```
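The boolean mask returned by isna() supports a few other common summaries besides the per-column count. The sketch below uses a small made-up frame (toy and its column names are assumptions) to show the fraction missing, the total number of missing cells, and the rows that contain at least one gap.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with three missing cells
toy = pd.DataFrame({
    'open': [1.0, np.nan, 3.0, 4.0],
    'close': [np.nan, np.nan, 3.5, 4.5],
})

missing_per_column = toy.isna().sum()            # Count of missing values per column
missing_share = toy.isna().mean()                # Fraction of rows missing per column
total_missing = int(toy.isna().sum().sum())      # Missing cells in the whole frame
rows_with_missing = toy[toy.isna().any(axis=1)]  # Rows with at least one missing value

print(missing_per_column)
print(missing_share)
print(total_missing)
print(rows_with_missing)
```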

iterrows

Iterate over rows as (index, Series) pairs.

1. Basic Iteration

```python
for date, row in df.iterrows():
    print(date)
    print(row)
    print('-' * 40)
    break  # Just show first
```

2. Access Values

```python
for idx, row in df.iterrows():
    print(f"Date: {idx}, Close: {row['Close']}")
```

3. Performance Warning

```python
# iterrows is slow for large DataFrames
# Prefer vectorized operations when possible
```
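To make the advice concrete, here is a sketch that computes the same quantity twice: once with an iterrows loop and once as a single column-wise expression. The frame and the open/close column names are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    'open': [10.0, 11.0, 12.0],
    'close': [10.5, 10.0, 12.0],
})

# Row-by-row version: a Python-level loop that builds one Series per row
changes_loop = []
for _, row in toy.iterrows():
    changes_loop.append(row['close'] - row['open'])

# Vectorized version: one subtraction over whole columns
changes_vectorized = toy['close'] - toy['open']

print(changes_loop)
print(changes_vectorized.tolist())
```

Both versions produce the same numbers. The vectorized form avoids the per-row Python overhead, which is where the speed difference on large frames comes from.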

itertuples

Faster iteration with named tuples.

1. Basic Usage

```python
for row in df.itertuples():
    print(row.Index, row.Close)
```

2. Faster than iterrows

```python
# itertuples is faster than iterrows
```
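Part of the overhead in iterrows is that it builds a Series for every row. A side effect of this is that a row mixing integer and float columns is upcast to a single dtype. The sketch below, on a made-up frame with assumed column names shares and price, illustrates this without making timing claims, since timings vary by machine.

```python
import pandas as pd

# Hypothetical toy data: one integer column and one float column
toy = pd.DataFrame({'shares': [1, 2], 'price': [1.5, 2.5]})

# iterrows: each row becomes a Series with one common dtype, so the integer is upcast
_, first_series = next(toy.iterrows())
print(first_series.dtype)
print(type(first_series['shares']))

# itertuples: each row is a namedtuple, so the integer field stays an integer
first_tuple = next(toy.itertuples())
print(type(first_tuple.shares))
```

When the original dtypes matter inside a loop, this is another reason to reach for itertuples first.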

3. Access by Name

```python
for row in df.itertuples(index=False):
    print(row.Open, row.Close)
```

sample

Random sample of rows.

1. Random Rows

```python
print(df.sample(5))  # 5 random rows
```

2. Reproducible Sample

```python
print(df.sample(5, random_state=42))
```

3. Fraction Sample

```python
print(df.sample(frac=0.1))  # 10% of rows
```
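Two consequences of these options are worth seeing in code: the same random_state always selects the same rows, and frac=1 returns every row in random order, which is a common way to shuffle a frame. The sketch uses a made-up ten-row frame; toy and value are illustrative names.

```python
import pandas as pd

# Hypothetical toy data: ten rows with values 0..9
toy = pd.DataFrame({'value': range(10)})

first = toy.sample(3, random_state=42)
second = toy.sample(3, random_state=42)
print(first.equals(second))  # True: the same seed selects the same rows

# frac=1 samples all rows without replacement, which amounts to a shuffle
shuffled = toy.sample(frac=1, random_state=0).reset_index(drop=True)
print(shuffled)
```

reset_index(drop=True) is optional. It replaces the shuffled original labels with a fresh 0..n-1 index.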

min and max

Find minimum and maximum values.

1. Column Min/Max

```python
print(df['Close'].min())
print(df['Close'].max())
```

2. All Columns

```python
print(df.min())  # Min of each column
print(df.max())  # Max of each column
```

3. With Index

```python
print(df['Close'].idxmin())  # Index of min
print(df['Close'].idxmax())  # Index of max
```
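idxmin() and idxmax() return index labels, not integer positions, so with a date index the result is a Timestamp that can be passed straight to loc to retrieve the whole row. A minimal sketch, assuming a made-up three-day frame:

```python
import pandas as pd

# Hypothetical toy data with a DatetimeIndex
toy = pd.DataFrame(
    {'close': [101.0, 99.5, 104.2], 'volume': [500, 800, 650]},
    index=pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06']),
)

peak_date = toy['close'].idxmax()    # Index label of the highest close
trough_date = toy['close'].idxmin()  # Index label of the lowest close
peak_row = toy.loc[peak_date]        # Full row at that label

print(peak_date, trough_date)
print(peak_row)
```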

count

Count non-null values.

1. Per Column

```python
print(df.count())
```

2. Single Column

```python
print(df['Close'].count())
```

3. vs len()

```python
print(len(df))     # Total rows
print(df.count())  # Non-null per column
```
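The two numbers differ only when values are missing: len() counts every row, while count() skips NaN. In this sketch on a made-up frame, subtracting one from the other reproduces the per-column missing counts from isna().sum().

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in 'open'
toy = pd.DataFrame({'open': [1.0, np.nan, 3.0], 'close': [1.5, 2.5, 3.5]})

total_rows = len(toy)               # Counts rows regardless of missing values
non_null = toy.count()              # Counts only non-null entries per column
missing = total_rows - non_null     # Missing entries per column

print(total_rows)
print(non_null)
print(missing.equals(toy.isna().sum()))
```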


Exercises

Exercise 1. Create a DataFrame with 100 rows of random data. Use .head(10) and .tail(10) to view the first and last 10 rows. Then use .sample(5, random_state=0) to get a reproducible random sample.

Solution to Exercise 1

Use head, tail, and sample for quick inspection.

```python
import pandas as pd
import numpy as np

np.random.seed(42)
df = pd.DataFrame(np.random.randn(100, 3), columns=['A', 'B', 'C'])
print(df.head(10))
print(df.tail(10))
print(df.sample(5, random_state=0))
```

Exercise 2. Create a DataFrame with some NaN values. Use .info() to inspect the non-null counts per column, then use .isna().sum() to confirm the count of missing values in each column.

Solution to Exercise 2

Use info and isna to inspect missing data.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name': ['Alice', 'Bob', None, 'Dave'],
    'age': [25, np.nan, 35, np.nan],
    'score': [90, 85, 88, 92]
})
df.info()
print("\nMissing values per column:")
print(df.isna().sum())
```

Exercise 3. Create a DataFrame with a 'price' column. Use .min(), .max(), .idxmin(), and .idxmax() to find the minimum and maximum prices along with their row indices. Verify that df.loc[df['price'].idxmax(), 'price'] == df['price'].max().

Solution to Exercise 3

Find min/max values and their indices.

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D', 'E'],
    'price': [19.99, 5.49, 42.00, 12.75, 35.50]
})
print("Min price:", df['price'].min(), "at index", df['price'].idxmin())
print("Max price:", df['price'].max(), "at index", df['price'].idxmax())
assert df.loc[df['price'].idxmax(), 'price'] == df['price'].max()
print("Verification passed.")
```