Skip to content

DataFrame Creation

DataFrames can be created from various data structures including dictionaries, lists, NumPy arrays, and other DataFrames.

From Dictionary of Lists

Column-oriented data with lists as values.

1. Basic Dictionary

import pandas as pd

data = {
    'temperature': [32, 35, 28],
    'windspeed': [6, 7, 2],
    'event': ['Rain', 'Sunny', 'Snow']
}

df = pd.DataFrame(data)
print(df)
   temperature  windspeed  event
0           32          6   Rain
1           35          7  Sunny
2           28          2   Snow

2. With Custom Index

day = ['1/1/2017', '1/2/2017', '1/3/2017']
df = pd.DataFrame(data, index=day)
print(df)
          temperature  windspeed  event
1/1/2017           32          6   Rain
1/2/2017           35          7  Sunny
1/3/2017           28          2   Snow

3. Access Attributes

print(df.index)    # Index(['1/1/2017', '1/2/2017', '1/3/2017'])
print(df.columns)  # Index(['temperature', 'windspeed', 'event'])

From Dictionary of Dictionaries

Row keys become the index automatically.

1. Nested Dictionaries

temp  = {'1/1/2017': 32, '1/2/2017': 35, '1/3/2017': 28}
wind  = {'1/1/2017': 6, '1/2/2017': 7, '1/3/2017': 2}
event = {'1/1/2017': 'Rain', '1/2/2017': 'Sunny', '1/3/2017': 'Snow'}

data = {'temperature': temp, 'windspeed': wind, 'event': event}
df = pd.DataFrame(data)
print(df)

2. Automatic Index

Keys from inner dictionaries become the DataFrame index.

3. Handling Missing Keys

# If inner dicts have different keys, NaN fills missing values

From List of Lists

Row-oriented data with each inner list as a row.

1. Basic List of Lists

data = [
    ['1/1/2017', 32, 6, 'Rain'],
    ['1/2/2017', 35, 7, 'Sunny'],
    ['1/3/2017', 28, 2, 'Snow']
]

columns = ['day', 'temperature', 'windspeed', 'event']
df = pd.DataFrame(data, columns=columns)
print(df)

2. Set Column as Index

df = df.set_index('day')

3. Direct Index Assignment

df = pd.DataFrame(data, columns=columns).set_index('day')

From List of Dictionaries

Each dictionary represents a row.

1. Row Dictionaries

data = [
    {'day': '1/1/2017', 'temperature': 32, 'windspeed': 6, 'event': 'Rain'},
    {'day': '1/2/2017', 'temperature': 35, 'windspeed': 7, 'event': 'Sunny'},
    {'day': '1/3/2017', 'temperature': 28, 'windspeed': 2, 'event': 'Snow'}
]

df = pd.DataFrame(data).set_index('day')

2. Automatic Column Detection

Column names are inferred from dictionary keys.

3. Missing Keys

# Missing keys in some dicts result in NaN values

From NumPy Array

Create DataFrame from 2D array.

1. Random Data

import numpy as np

np.random.seed(0)
data = np.random.normal(size=(3, 4))

index = ['Jenny', 'Frank', 'Wenfei']
columns = list('ABCD')

df = pd.DataFrame(data, index=index, columns=columns)
print(df)
               A         B         C         D
Jenny   1.764052  0.400157  0.978738  2.240893
Frank   1.867558 -0.977278  0.950088 -0.151357
Wenfei -0.103219  0.410599  0.144044  1.454274

2. Specify dtype

df = pd.DataFrame(data, dtype=float)

3. Shape Preservation

DataFrame shape matches array shape.

LeetCode Example

Create DataFrame from list of student data.

1. Sample Data

student_data = [
    [101, 20],
    [102, 22],
    [103, 21]
]

2. Create DataFrame

df = pd.DataFrame(student_data, columns=['student_id', 'age'])
print(df)
   student_id  age
0         101   20
1         102   22
2         103   21

3. Type Annotation

from typing import List

def createDataframe(student_data: List[List[int]]) -> pd.DataFrame:
    return pd.DataFrame(student_data, columns=['student_id', 'age'])