DataFrame Architecture¶
Columnar Design¶
1. Structure¶
A DataFrame is a dict-like container of Series objects: each column label maps to one Series, and all columns share the same row index:
import pandas as pd
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4.0, 5.0, 6.0],
    'C': ['x', 'y', 'z']
})
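The dict-like behaviour can be checked directly. The following is a minimal sketch that reuses the same three-column frame; the variable names (`column_labels`, `kinds`, and so on) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4.0, 5.0, 6.0],
    'C': ['x', 'y', 'z'],
})

# Column labels play the role of dict keys.
column_labels = list(df.keys())   # same labels as df.columns
has_a = 'A' in df                 # membership tests column labels
has_zero = 0 in df                # row labels are not keys

# Iterating over items yields (label, Series) pairs, one per column.
kinds = {label: type(col).__name__ for label, col in df.items()}

# Every column carries the frame's row index.
shares_index = all(col.index.equals(df.index) for _, col in df.items())

print(column_labels, has_a, has_zero, kinds, shares_index)
```

Note that `in` looks at column labels, not row labels, which mirrors how `in` checks the keys of a dict.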
2. Columns as Series¶
col_a = df['A']
print(type(col_a)) # <class 'pandas.core.series.Series'>
print(col_a.dtype) # int64
col_c = df['C']
print(col_c.dtype) # object (str on pandas 3.0 and later)
3. Heterogeneous Types¶
Each column has its own dtype, so a single DataFrame can hold integers, floats, and strings side by side:
print(df.dtypes)
# A      int64
# B    float64
# C     object
# dtype: object
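One consequence of per-column dtypes is that a row cut across mixed columns has to fall back to a common type. The sketch below illustrates this with the same frame; it is an illustration, not a statement about pandas internals.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4.0, 5.0, 6.0],
    'C': ['x', 'y', 'z'],
})

# A row spanning int, float, and string columns is returned as one Series,
# so its values are stored with the generic object dtype.
row = df.iloc[0]
row_dtype = row.dtype

# Restricting the frame to numeric columns lets the row use a numeric dtype:
# the int and float values are upcast to float64.
numeric_row = df[['A', 'B']].iloc[0]
numeric_row_dtype = numeric_row.dtype

print(row_dtype, numeric_row_dtype)
```

This is one reason column-wise operations tend to be the natural fit for a DataFrame, while row-by-row access gives up the per-column types.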
Construction¶
1. From Dict¶
data = {
    'name': ['Alice', 'Bob'],
    'age': [25, 30],
    'salary': [50000, 60000]
}
df = pd.DataFrame(data)
2. From Lists¶
data = [
    [1, 4, 'x'],
    [2, 5, 'y'],
    [3, 6, 'z']
]
df = pd.DataFrame(data, columns=['A', 'B', 'C'])
3. From Records¶
df = pd.DataFrame([
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 30}
])
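The three construction routes differ only in how the input is laid out: a dict is column-oriented, while lists and records are row-oriented. As a rough sketch, the same data fed through each route should produce equal frames. The last example, a record with a missing key, is an added illustration rather than part of the original material.

```python
import pandas as pd

# Column-oriented: one key per column.
from_dict = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['x', 'y', 'z']})

# Row-oriented: one inner list per row, column labels supplied separately.
from_lists = pd.DataFrame(
    [[1, 4, 'x'], [2, 5, 'y'], [3, 6, 'z']],
    columns=['A', 'B', 'C'],
)

# Row-oriented: one dict per row, column labels taken from the keys.
from_records = pd.DataFrame([
    {'A': 1, 'B': 4, 'C': 'x'},
    {'A': 2, 'B': 5, 'C': 'y'},
    {'A': 3, 'B': 6, 'C': 'z'},
])

same = from_dict.equals(from_lists) and from_dict.equals(from_records)

# A record that lacks a key produces a missing value (NaN) in that column,
# which also turns the integer column into float64.
partial = pd.DataFrame([{'name': 'Alice', 'age': 25}, {'name': 'Bob'}])
missing_flags = partial['age'].isna().tolist()

print(same, missing_flags)
```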
Indexing¶
1. Column Selection¶
# Assumes df is the A/B/C frame from the Columnar Design section
df['A'] # Single column (Series)
df[['A', 'B']] # Multiple columns (DataFrame)
2. Row Selection¶
df.loc[0] # By label (with the default index, labels are 0, 1, 2, ...)
df.iloc[0] # By position
df.loc[0:2] # Slice by label; both endpoints are included
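With the default integer index, labels and positions coincide, which hides the difference between `loc` and `iloc`. The sketch below uses a hypothetical index of `[10, 20, 30]` to separate the two, and then shows the slice-endpoint difference on a default index.

```python
import pandas as pd

# Hypothetical row labels chosen so they differ from positions 0, 1, 2.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.0, 6.0]}, index=[10, 20, 30])

by_label = df.loc[10]      # the row whose label is 10
by_position = df.iloc[0]   # the first row, whatever its label

label_slice = df.loc[10:20]    # label slice: includes both 10 and 20
position_slice = df.iloc[0:2]  # position slice: excludes position 2

# On a default index the same-looking slices return different row counts.
default = df.reset_index(drop=True)
n_loc = len(default.loc[0:2])    # labels 0, 1, 2
n_iloc = len(default.iloc[0:2])  # positions 0, 1

print(len(label_slice), len(position_slice), n_loc, n_iloc)
```

Here `df.loc[0]` on the relabeled frame would raise a `KeyError`, because no row carries the label 0.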
3. Boolean Indexing¶
df[df['age'] > 25] # Keep rows where the condition is True (assumes the name/age frame)
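A boolean filter works in two steps: the comparison builds a Series of True/False values, and indexing with that Series keeps the matching rows. The following sketch extends the name/age/salary data with a hypothetical third row (`'Carol'`) so that a combined condition has something to exclude.

```python
import pandas as pd

# 'Carol' and her values are made-up sample data for illustration.
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 55000],
})

# Step 1: the comparison yields a boolean Series aligned with the rows.
mask = df['age'] > 25

# Step 2: indexing with the mask keeps rows where it is True.
older = df[mask]

# Conditions combine with & (and) and | (or); each comparison needs its own
# parentheses because & and | bind more tightly than > and <.
older_and_cheaper = df[(df['age'] > 25) & (df['salary'] < 60000)]

print(mask.tolist(), older['name'].tolist(), older_and_cheaper['name'].tolist())
```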
Operations¶
1. Column-wise¶
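# Assumes df is the A/B/C frame from the Columnar Design section
df['D'] = df['A'] + df['B'] # New column, computed element-wise
df = df.drop(columns='D') # Remove column (returns a new DataFrame)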
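Column arithmetic is element-wise and aligned on the row index. As a sketch of a non-mutating alternative to assigning into `df` directly, `DataFrame.assign` returns a new frame and leaves the original untouched; the names `with_d` and `without_d` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.0, 6.0]})

# assign() returns a copy with the extra column; df itself is unchanged.
with_d = df.assign(D=df['A'] + df['B'])

# drop(columns=...) likewise returns a new frame without that column.
without_d = with_d.drop(columns='D')

print(with_d['D'].tolist(), list(df.columns), list(without_d.columns))
```

Adding an int64 column to a float64 column produces float64, so `D` holds floats here.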
2. Row-wise¶
df.loc[3] = [4, 7, 'w'] # Add a row labeled 3 (one value per column)
df = df.drop(3) # Remove the row labeled 3
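Assigning through `df.loc[new_label]` adds one row at a time. When several rows arrive together, one option is to build a second frame and join the two with `pd.concat`. The sketch below assumes the A/B/C layout used earlier, with made-up values for the new rows.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['x', 'y', 'z']})

# Hypothetical new rows with the same columns as df.
new_rows = pd.DataFrame({'A': [4, 5], 'B': [7, 8], 'C': ['w', 'v']})

# ignore_index=True renumbers the result 0..n-1 instead of repeating labels.
combined = pd.concat([df, new_rows], ignore_index=True)

# drop() with a list of labels removes several rows in one call.
trimmed = combined.drop([3, 4])

print(len(combined), list(combined.index), len(trimmed))
```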
3. Aggregation¶
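df.sum(numeric_only=True) # Sum of each numeric column
df.mean(numeric_only=True) # Mean of each numeric column; without numeric_only, a string column raises TypeError in pandas 2.0+
df.describe() # Summary statistics (numeric columns by default)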
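Aggregations reduce along an axis: by default each column collapses to a single value, and `axis=1` collapses each row instead. A minimal sketch on the A/B/C frame, restricting to numeric columns so that the string column `C` is skipped:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4.0, 5.0, 6.0],
    'C': ['x', 'y', 'z'],
})

# One value per numeric column (result is a Series indexed by column label).
col_sums = df.sum(numeric_only=True)
col_means = df.mean(numeric_only=True)

# One value per row (result is a Series indexed by row label).
row_sums = df.sum(axis=1, numeric_only=True)

# describe() on a mixed frame reports only the numeric columns by default.
summary = df.describe()

print(col_sums.to_dict(), col_means.to_dict(), row_sums.tolist())
print(list(summary.columns), list(summary.index))
```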