DataFrame Attributes¶

DataFrame attributes provide information about the structure and properties of your data.

Mental Model

Most attributes are metadata lookups rather than computations: .shape gives the dimensions, .dtypes gives the column types, and .columns and .index give the axis labels. The exceptions are .values and .T, which may build a new array or DataFrame. Attributes are your first stop after loading data: a quick structural X-ray before any analysis.
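As a minimal sketch of that first look, the snippet below builds a small toy DataFrame (the column names and values are made up for illustration) and reads its main structural attributes.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "price": [10.5, 20.0, 30.25],
    "volume": [100, 200, 300],
})

n_rows, n_cols = toy.shape                    # dimensions as a (rows, columns) tuple
col_names = toy.columns.tolist()              # column labels as a plain list
dtype_map = toy.dtypes.astype(str).to_dict()  # column name -> dtype name

print(n_rows, n_cols)
print(col_names)
print(dtype_map)
print(toy.index)
```

None of these calls changes the data, so they are safe to run at any point in an analysis.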

columns¶

Access column labels.

1. Get Columns¶

```python
import pandas as pd
import yfinance as yf

df = yf.Ticker('WMT').history(start='2020-01-01', end='2020-12-31')
print(df.columns)
```

```
Index(['Open', 'High', 'Low', 'Close', 'Volume', 'Dividends', 'Stock Splits'], dtype='object')
```

2. Access by Position¶

```python
print(df.columns[0])        # Open
print(type(df.columns[0]))  # <class 'str'>
```

3. Convert to List¶

```python
col_list = df.columns.tolist()
```

index¶

Access row labels.

1. Get Index¶

```python
print(df.index)
```

```
DatetimeIndex(['2020-01-02', '2020-01-03', ...], dtype='datetime64[ns]', name='Date', freq=None)
```

2. Access by Position¶

```python
# print() shows the string form; a timezone-aware index also shows a UTC offset
print(df.index[0])        # 2020-01-02 00:00:00
print(type(df.index[0]))  # <class 'pandas._libs.tslibs.timestamps.Timestamp'>
```

3. Index Properties¶

```python
print(df.index.name)   # Date
print(df.index.dtype)  # datetime64[ns]
```

shape¶

Get DataFrame dimensions.

1. Basic Shape¶

```python
url = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'
df = pd.read_csv(url)
print(df.shape)  # (891, 12)
```

2. After Selection¶

```python
df_subset = df[['Survived', 'Sex']]
print(df_subset.shape)  # (891, 2)
```

3. DataFrame vs Series¶

```python
df_col = df[['Survived']]  # DataFrame
print(df_col.shape)        # (891, 1)

series = df['Survived']    # Series
print(series.shape)        # (891,)
```
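Because .shape is an ordinary tuple, it can be unpacked directly. The sketch below uses a small made-up DataFrame (not the Titanic data) to show the two-element and one-element cases side by side.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})

n_rows, n_cols = toy.shape     # a DataFrame has two axes
(n_items,) = toy["a"].shape    # a Series has one axis, so a 1-tuple

print(n_rows, n_cols)
print(n_items)
```

Unpacking a Series shape into two names raises a ValueError, which makes it a quick way to catch an accidental single-bracket selection.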

values¶

Get underlying NumPy array.

1. Access Values¶

```python
x = df.values
print(type(x))  # <class 'numpy.ndarray'>
print(x.shape)  # Same as df.shape
```

2. Slicing Values¶

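```python
import numpy as np

print(x[1:2, 2:3].shape)  # (1, 1)
print(x[1:2, 2].shape)    # (1,)
# x is an object array here, so x[1, 2] may be a plain Python value with no .shape
print(np.shape(x[1, 2]))  # () scalar
```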

3. Prefer to_numpy()¶

```python
# Modern pandas recommends to_numpy()
arr = df.to_numpy()
```
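One practical difference is that to_numpy() accepts a dtype argument, while .values does not. The following sketch, on a made-up three-column DataFrame, shows how the resulting array dtype depends on which columns are included.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    "x": [1, 2, 3],
    "y": [0.5, 1.5, 2.5],
    "label": ["a", "b", "c"],
})

mixed = toy.to_numpy()                              # strings force a common object dtype
numeric = toy[["x", "y"]].to_numpy()                # int and float upcast to float64
as_f32 = toy[["x", "y"]].to_numpy(dtype="float32")  # request a dtype explicitly

print(mixed.dtype)
print(numeric.dtype)
print(as_f32.dtype)
```

Selecting only the numeric columns before converting avoids the slower object array.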

dtypes¶

Get data types of each column.

1. All dtypes¶

```python
print(df.dtypes)
```

```
PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
...
```

2. DataFrame vs Series¶

```python
# DataFrame has dtypes (plural)
print(df.dtypes)

# Series has dtype (singular)
print(df['Age'].dtype)  # float64
```

3. Common Error¶

```python
# This raises AttributeError
try:
    print(df.dtype)  # Wrong! Use dtypes
except AttributeError as e:
    print(e)
```
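It can help to remember that .dtypes returns a Series whose index holds the column names, so it can be iterated like any other Series. A minimal sketch on made-up data, collecting the float columns:

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({"id": [1, 2], "score": [0.5, 0.75], "ratio": [1.0, 2.0]})

dtypes = toy.dtypes  # Series: index = column names, values = dtypes
float_cols = [name for name, dtype in dtypes.items() if dtype == "float64"]

print(type(dtypes).__name__)
print(float_cols)
```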

size¶

Total number of elements.

1. Get Size¶

```python
print(df.size)  # rows × columns
```

2. Calculation¶

```python
# Equivalent to
print(df.shape[0] * df.shape[1])
```

3. vs len()¶

```python
print(len(df))  # Number of rows only
print(df.size)  # Total elements
```
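Note that .size counts every cell, including missing ones. The sketch below, on a made-up DataFrame with one missing value, contrasts it with len() and with count(), which skips missing values.

```python
import pandas as pd

# Hypothetical toy data with one missing value, used only for illustration
toy = pd.DataFrame({"a": [1, None, 3], "b": [4, 5, 6]})

total_cells = toy.size                # every cell, missing or not
n_rows = len(toy)                     # rows only
non_missing = int(toy.count().sum())  # count() ignores missing values

print(total_cells, n_rows, non_missing)
```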

ndim¶

Number of dimensions.

1. DataFrame ndim¶

```python
print(df.ndim)  # 2
```

2. Series ndim¶

```python
print(df['Age'].ndim)  # 1
```

3. Use Case¶

```python
data = df['Age']  # could be either a Series or a DataFrame
if data.ndim == 1:
    print("Series")
else:
    print("DataFrame")
```

empty¶

Check if DataFrame is empty.

1. Check Empty¶

```python
print(df.empty)  # False
```

2. Empty DataFrame¶

```python
empty_df = pd.DataFrame()
print(empty_df.empty)  # True
```

3. Conditional Logic¶

```python
if not df.empty:
    process_data(df)  # process_data stands in for your own function
```
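One subtlety: .empty only checks whether an axis has length zero. A DataFrame that holds nothing but missing values is not considered empty. The sketch below illustrates this with two made-up frames.

```python
import numpy as np
import pandas as pd

no_rows = pd.DataFrame(columns=["a", "b"])              # columns defined, zero rows
all_nan = pd.DataFrame({"a": [np.nan], "b": [np.nan]})  # one row of missing values

print(no_rows.empty)           # True
print(all_nan.empty)           # False
print(all_nan.dropna().empty)  # True
```

If missing-only data should count as empty in your workflow, drop the missing rows first, as in the last line.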

T (Transpose)¶

Transpose rows and columns.

1. Transpose¶

```python
df_t = df.T
print(df_t.shape)  # Swapped dimensions
```

2. Use Case¶

```python
# Useful for displaying wide DataFrames
print(df.head().T)
```

3. Method Alternative¶

```python
df_transposed = df.transpose()
```
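Transposing also swaps the axis labels: the old column names become the index and the old index becomes the columns. The sketch below uses a made-up two-column DataFrame; when columns have mixed types, the transposed columns are typically upcast to a common dtype such as object.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({"n": [1, 2], "label": ["a", "b"]}, index=["r1", "r2"])
flipped = toy.T

print(flipped.index.tolist())    # ['n', 'label']
print(flipped.columns.tolist())  # ['r1', 'r2']
print(flipped.dtypes)
```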

Exercises¶

Exercise 1. Create a DataFrame with columns 'name', 'age', and 'salary' (5 rows). Use .shape, .columns, and .dtypes to print the number of rows, the column names as a list, and the data type of each column.

Solution to Exercise 1

Use the attributes directly on the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve'],
    'age': [25, 30, 35, 40, 45],
    'salary': [50000, 60000, 70000, 80000, 90000]
})
print("Shape:", df.shape)
print("Columns:", df.columns.tolist())
print("Dtypes:\n", df.dtypes)
```

Exercise 2. Given a DataFrame df, use .size, len(df), and .ndim to show the difference between total elements, number of rows, and number of dimensions. Verify that df.size == df.shape[0] * df.shape[1].

Solution to Exercise 2

Compare size, len, and ndim.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])
print("size:", df.size)            # 40
print("len:", len(df))             # 10
print("ndim:", df.ndim)            # 2
assert df.size == df.shape[0] * df.shape[1]
print("size == rows * cols: True")
```

Exercise 3. Create a DataFrame from a dictionary, then use .values (or .to_numpy()) to extract the underlying NumPy array. Use .T to transpose the DataFrame and print the transposed shape. Confirm the transposed shape is the reverse of the original shape.

Solution to Exercise 3

Extract the NumPy array and transpose the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3],
    'y': [4, 5, 6]
})
arr = df.to_numpy()
print("Array:\n", arr)
print("Original shape:", df.shape)
print("Transposed shape:", df.T.shape)
assert df.shape == df.T.shape[::-1]
print("Transposed shape is reverse of original: True")
```