Basic DataFrame Methods¶
Essential methods for inspecting and exploring DataFrame contents.
Mental Model
Inspection methods are your data microscope. head() and tail() show the edges, info() gives a structural summary with dtypes and null counts, and describe() computes column-wise statistics. Run these three right after loading any new dataset to build a mental picture before writing analysis code.
head and tail¶
View first or last rows.
1. head Method¶
```python import pandas as pd import yfinance as yf
df = yf.Ticker('WMT').history(start='2020-01-01', end='2020-12-31') print(df.head()) # First 5 rows (default) print(df.head(3)) # First 3 rows ```
2. tail Method¶
python
print(df.tail()) # Last 5 rows (default)
print(df.tail(3)) # Last 3 rows
3. Quick Preview¶
```python
Combine for overview¶
print(df.head(2)) print('...') print(df.tail(2)) ```
info¶
Display DataFrame summary.
1. Basic Info¶
python
print(df.info())
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 253 entries, 2020-01-02 to 2020-12-31
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Open 253 non-null float64
1 High 253 non-null float64
...
2. Information Provided¶
- Index type and range
- Column count
- Column names and dtypes
- Non-null counts
- Memory usage
3. Memory Details¶
python
df.info(memory_usage='deep')
describe¶
Generate descriptive statistics.
1. Numeric Columns¶
python
print(df.describe())
Open High Low Close
count 253.000000 253.000000 253.000000 253.000000
mean 120.456789 121.234567 119.876543 120.567890
std 8.123456 8.234567 8.012345 8.156789
min 102.345678 103.456789 101.234567 102.456789
25% 114.567890 115.678901 113.456789 114.678901
50% 120.123456 121.234567 119.012345 120.234567
75% 126.789012 127.890123 125.678901 126.890123
max 138.901234 140.012345 137.890123 139.012345
2. Include All Columns¶
python
print(df.describe(include='all'))
3. Returns DataFrame¶
python
stats = df.describe()
print(type(stats)) # DataFrame
print(stats.loc['mean', 'Close']) # Access specific stat
copy¶
Create a deep copy of DataFrame.
1. Deep Copy¶
python
df_copy = df.copy()
2. Why Copy Matters¶
```python
Without copy, changes affect original¶
df_view = df df_view.iloc[0, 0] = 9999 print(df.iloc[0, 0]) # 9999 - original changed!
With copy, original is safe¶
df_copy = df.copy() df_copy.iloc[0, 0] = 9999 print(df.iloc[0, 0]) # Original unchanged ```
3. Deep vs Shallow¶
python
df_deep = df.copy(deep=True) # Default
df_shallow = df.copy(deep=False)
isna and isnull¶
Check for missing values.
1. Check Missing¶
```python import numpy as np
df.iloc[1, 1] = np.nan df.iloc[2, 2] = np.nan
print(df.isna().head(3)) ```
2. isna vs isnull¶
```python
They are identical¶
print(df.isna().equals(df.isnull())) # True ```
3. Count Missing¶
python
print(df.isnull().sum()) # Missing per column
iterrows¶
Iterate over rows as (index, Series) pairs.
1. Basic Iteration¶
python
for date, row in df.iterrows():
print(date)
print(row)
print('-' * 40)
break # Just show first
2. Access Values¶
python
for idx, row in df.iterrows():
print(f"Date: {idx}, Close: {row['Close']}")
3. Performance Warning¶
```python
iterrows is slow for large DataFrames¶
Prefer vectorized operations when possible¶
```
itertuples¶
Faster iteration with named tuples.
1. Basic Usage¶
python
for row in df.itertuples():
print(row.Index, row.Close)
2. Faster than iterrows¶
```python
itertuples is faster than iterrows¶
```
3. Access by Name¶
python
for row in df.itertuples(index=False):
print(row.Open, row.Close)
sample¶
Random sample of rows.
1. Random Rows¶
python
print(df.sample(5)) # 5 random rows
2. Reproducible Sample¶
python
print(df.sample(5, random_state=42))
3. Fraction Sample¶
python
print(df.sample(frac=0.1)) # 10% of rows
min and max¶
Find minimum and maximum values.
1. Column Min/Max¶
python
print(df['Close'].min())
print(df['Close'].max())
2. All Columns¶
python
print(df.min()) # Min of each column
print(df.max()) # Max of each column
3. With Index¶
python
print(df['Close'].idxmin()) # Index of min
print(df['Close'].idxmax()) # Index of max
count¶
Count non-null values.
1. Per Column¶
python
print(df.count())
2. Single Column¶
python
print(df['Close'].count())
3. vs len()¶
python
print(len(df)) # Total rows
print(df.count()) # Non-null per column
Exercises¶
Exercise 1.
Create a DataFrame with 100 rows of random data. Use .head(10) and .tail(10) to view the first and last 10 rows. Then use .sample(5, random_state=0) to get a reproducible random sample.
Solution to Exercise 1
Use head, tail, and sample for quick inspection.
import pandas as pd
import numpy as np
np.random.seed(42)
df = pd.DataFrame(np.random.randn(100, 3), columns=['A', 'B', 'C'])
print(df.head(10))
print(df.tail(10))
print(df.sample(5, random_state=0))
Exercise 2.
Create a DataFrame with some NaN values. Use .info() to inspect the non-null counts per column, then use .isna().sum() to confirm the count of missing values in each column.
Solution to Exercise 2
Use info and isna to inspect missing data.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'name': ['Alice', 'Bob', None, 'Dave'],
'age': [25, np.nan, 35, np.nan],
'score': [90, 85, 88, 92]
})
df.info()
print("\nMissing values per column:")
print(df.isna().sum())
Exercise 3.
Create a DataFrame with a 'price' column. Use .min(), .max(), .idxmin(), and .idxmax() to find the minimum and maximum prices along with their row indices. Verify that df.loc[df['price'].idxmax(), 'price'] == df['price'].max().
Solution to Exercise 3
Find min/max values and their indices.
import pandas as pd
df = pd.DataFrame({
'product': ['A', 'B', 'C', 'D', 'E'],
'price': [19.99, 5.49, 42.00, 12.75, 35.50]
})
print("Min price:", df['price'].min(), "at index", df['price'].idxmin())
print("Max price:", df['price'].max(), "at index", df['price'].idxmax())
assert df.loc[df['price'].idxmax(), 'price'] == df['price'].max()
print("Verification passed.")