DataFrame Attributes

DataFrame attributes provide information about the structure and properties of your data.

columns

Access column labels.

1. Get Columns

import pandas as pd
import yfinance as yf

df = yf.Ticker('WMT').history(start='2020-01-01', end='2020-12-31')
print(df.columns)
Index(['Open', 'High', 'Low', 'Close', 'Volume', 'Dividends', 'Stock Splits'], dtype='object')

2. Access by Position

print(df.columns[0])  # 'Open'
print(type(df.columns[0]))  # <class 'str'>

3. Convert to List

col_list = df.columns.tolist()
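
Beyond reading labels, the columns attribute can also be tested for membership and reassigned. The sketch below uses a small made-up frame (prices and its numbers are illustrative, not real WMT data) so it runs without a network call.

```python
import pandas as pd

# Small synthetic frame standing in for the downloaded price data (values are made up)
prices = pd.DataFrame(
    {"Open": [10.0, 10.5], "Close": [10.4, 10.2], "Volume": [1000, 1200]}
)

# Membership test against the column labels
has_close = "Close" in prices.columns
print(has_close)  # True

# Assigning to .columns relabels every column at once; the new list
# must have exactly as many entries as there are columns
prices.columns = [name.lower() for name in prices.columns]
print(prices.columns.tolist())  # ['open', 'close', 'volume']
```

Assigning a list of the wrong length raises a ValueError, which makes this a quick sanity check when cleaning headers.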

index

Access row labels.

1. Get Index

print(df.index)
DatetimeIndex(['2020-01-02', '2020-01-03', ...], dtype='datetime64[ns]', name='Date', freq=None)

2. Access by Position

print(df.index[0])  # 2020-01-02 00:00:00 (a Timestamp; recent yfinance versions may append a UTC offset such as -05:00)
print(type(df.index[0]))  # <class 'pandas._libs.tslibs.timestamps.Timestamp'>

3. Index Properties

print(df.index.name)  # 'Date'
print(df.index.dtype)  # datetime64[ns]
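
A DatetimeIndex carries more than labels. As a minimal sketch, assuming a hand-built three-day index (toy and its values are illustrative), the example below reads Timestamp fields, checks two common index properties, and slices rows by date string.

```python
import pandas as pd

# Hand-built daily index named 'Date', mirroring the index name shown above
dates = pd.date_range("2020-01-02", periods=3, freq="D", name="Date")
toy = pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=dates)

# Each index entry is a Timestamp with date fields
first = toy.index[0]
print(first.year, first.month, first.day)  # 2020 1 2

# Properties worth checking before time-based operations
print(toy.index.is_unique)                # True
print(toy.index.is_monotonic_increasing)  # True

# Label-based slicing accepts date strings; the start label is included
from_second_day = toy.loc["2020-01-03":]
print(len(from_second_day))  # 2
```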

shape

Get DataFrame dimensions.

1. Basic Shape

url = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'
df = pd.read_csv(url)
print(df.shape)  # (891, 12)

2. After Selection

df_subset = df[['Survived', 'Sex']]
print(df_subset.shape)  # (891, 2)

3. DataFrame vs Series

df_col = df[['Survived']]  # DataFrame
print(df_col.shape)  # (891, 1)

series = df['Survived']  # Series
print(series.shape)  # (891,)
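
Because shape is a plain tuple, it can be unpacked directly. The sketch below uses a small synthetic frame (toy is illustrative) to show unpacking and how a row filter changes only the first element.

```python
import pandas as pd

toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Unpack rows and columns in one step
n_rows, n_cols = toy.shape
print(n_rows, n_cols)  # 3 2

# Boolean filtering drops rows but keeps every column
filtered = toy[toy["a"] > 1]
print(filtered.shape)  # (2, 2)
```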

values

Get underlying NumPy array.

1. Access Values

x = df.values
print(type(x))  # <class 'numpy.ndarray'>
print(x.shape)  # Same as df.shape

2. Slicing Values

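import numpy as np
print(x[1:2, 2:3].shape)   # (1, 1)
print(x[1:2, 2].shape)     # (1,)
# x is an object array here (mixed column types), so x[1, 2] is a plain Python value with no .shape
print(np.shape(x[1, 2]))   # () scalar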

3. Prefer to_numpy()

# Modern pandas recommends to_numpy()
arr = df.to_numpy()
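
One reason to prefer to_numpy() is that it accepts dtype and copy arguments. The sketch below, on a small synthetic frame (nums is illustrative), shows the common dtype chosen for mixed numeric columns and that a copy can be modified without touching the DataFrame.

```python
import pandas as pd

nums = pd.DataFrame({"i": [1, 2], "f": [0.5, 1.5]})

# One array needs one dtype: int64 and float64 columns are combined as float64
arr = nums.to_numpy()
print(arr.dtype)  # float64

# Request a specific dtype explicitly
arr32 = nums.to_numpy(dtype="float32")
print(arr32.dtype)  # float32

# copy=True guarantees the array does not share memory with the DataFrame
safe = nums.to_numpy(copy=True)
safe[0, 0] = 99.0
print(nums.loc[0, "i"])  # 1 (the DataFrame is unchanged)
```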

dtypes

Get data types of each column.

1. All dtypes

print(df.dtypes)
PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
...

2. DataFrame vs Series

# DataFrame has dtypes (plural)
print(df.dtypes)

# Series has dtype (singular)
print(df['Age'].dtype)  # float64

3. Common Error

# This raises AttributeError
try:
    print(df.dtype)  # Wrong! Use dtypes
except AttributeError as e:
    print(e)
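
Since dtypes is itself a Series indexed by column name, it can be queried like one. As a rough sketch on a synthetic frame (toy and its columns are illustrative), the example looks up one column's dtype, selects the numeric columns, and converts a dtype with astype.

```python
import pandas as pd

toy = pd.DataFrame(
    {"id": [1, 2], "age": [22.0, 38.0], "survived": [True, False]}
)

# dtypes is a Series, so a single column's dtype can be looked up by label
print(toy.dtypes["age"])  # float64

# Keep only numeric columns; boolean columns are not counted as 'number'
numeric = toy.select_dtypes(include="number")
print(numeric.columns.tolist())  # ['id', 'age']

# astype returns a new DataFrame with the requested dtype for the named column
compact = toy.astype({"id": "int32"})
print(compact.dtypes["id"])  # int32
```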

size

Total number of elements.

1. Get Size

print(df.size)  # 10692 (891 rows × 12 columns)

2. Calculation

# Equivalent to
print(df.shape[0] * df.shape[1])

3. vs len()

print(len(df))  # Number of rows only
print(df.size)  # Total elements
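
Note that size counts every cell, including missing ones. The sketch below uses a synthetic frame with two NaN values (toy is illustrative) to contrast size, len(), and the number of non-missing cells obtained from count().

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

print(toy.size)  # 6: all cells, missing or not
print(len(toy))  # 3: rows only

# count() gives non-missing cells per column; summing gives the total
non_missing = int(toy.count().sum())
print(non_missing)  # 4
```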

ndim

Number of dimensions.

1. DataFrame ndim

print(df.ndim)  # 2

2. Series ndim

print(df['Age'].ndim)  # 1

3. Use Case

data = df['Age']  # any pandas object: a Series here, but could be a DataFrame
if data.ndim == 1:
    print("Series")
else:
    print("DataFrame")
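
The same check can sit inside a function that accepts either type. The helper below, as_frame, is a hypothetical name introduced only for illustration; it promotes a Series to a one-column DataFrame and passes a DataFrame through unchanged.

```python
import pandas as pd

def as_frame(data):
    """Return data as a DataFrame, promoting a Series to a single column.

    as_frame is an illustrative helper, not part of pandas.
    """
    if data.ndim == 1:
        return data.to_frame()
    return data

s = pd.Series([1, 2, 3], name="x")
from_series = as_frame(s)
from_frame = as_frame(s.to_frame())

print(from_series.shape)  # (3, 1)
print(from_frame.ndim)    # 2
```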

empty

Check whether a DataFrame has no elements, that is, whether any of its axes has length 0.

1. Check Empty

print(df.empty)  # False

2. Empty DataFrame

empty_df = pd.DataFrame()
print(empty_df.empty)  # True

3. Conditional Logic

if not df.empty:
    print(df.head())  # stand-in for your own processing step
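
A common pitfall is expecting empty to be True for a frame that holds only missing values. The sketch below, with two synthetic frames (no_rows and all_nan are illustrative), shows the difference and one way to test for "nothing but NaN".

```python
import numpy as np
import pandas as pd

no_rows = pd.DataFrame({"a": []})        # has a column but zero rows
all_nan = pd.DataFrame({"a": [np.nan]})  # one row containing only NaN

print(no_rows.empty)  # True
print(all_nan.empty)  # False: NaN cells still count as elements

# Drop rows with missing values first if NaN-only data should count as empty
print(all_nan.dropna().empty)  # True
```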

T (Transpose)

Transpose rows and columns.

1. Transpose

df_t = df.T
print(df_t.shape)  # (12, 891): rows and columns swapped

2. Use Case

# Useful for displaying wide DataFrames
print(df.head().T)

3. Method Alternative

df_transposed = df.transpose()
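
Transposing swaps the labels as well as the dimensions, and columns of different dtypes end up sharing one. A minimal sketch on a synthetic frame (toy and its labels are illustrative):

```python
import pandas as pd

toy = pd.DataFrame(
    {"n": [1, 2, 3], "x": [0.5, 1.5, 2.5]}, index=["r1", "r2", "r3"]
)
flipped = toy.T

print(toy.shape, flipped.shape)  # (3, 2) (2, 3)

# Old column labels become the index, old index labels become the columns
print(flipped.index.tolist())    # ['n', 'x']
print(flipped.columns.tolist())  # ['r1', 'r2', 'r3']

# Each new column mixes an int and a float, so it is stored as float64
print(flipped.dtypes.tolist())
```

With mixed numeric and text columns, the transposed columns generally fall back to object dtype, so transposing is best kept for display or for homogeneous data.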