Keyword - axis
The axis parameter determines whether to concatenate along rows (vertically) or columns (horizontally).
Mental Model
axis=0 means "grow downward" -- new rows are appended below existing ones. axis=1 means "grow rightward" -- new columns are placed beside existing ones. The axis number tells you which dimension gets longer: 0 for row-count, 1 for column-count.
axis=0 Vertical
Stack DataFrames on top of each other.
1. Default Behavior

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))

df = pd.concat([df1, df2])
print("axis = 0 (default): append top-down using index")
print(df)
```

```
   A  B
0  1  2
1  3  4
0  5  6
1  7  8
```
2. Row Count Increases

```python
print(len(df))  # 4 (2 + 2)
```
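3. Index Concatenation
The row labels of both DataFrames are kept as they are, so the index of the result above is 0, 1, 0, 1 and contains duplicates. Pass ignore_index=True to get a fresh 0 to n-1 index instead.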
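Because the original labels are kept, a stacked result can carry duplicate index labels. The sketch below redefines the two small frames from the example so that it runs on its own, then shows the duplicated index and two `pd.concat` options for handling it: `ignore_index=True` for a fresh integer index, and `keys=` for an outer level that records the source. The key names `'first'` and `'second'` are illustrative choices, not part of the original example.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))

# Default: original labels are kept, so 0 and 1 each appear twice
stacked = pd.concat([df1, df2])
print(stacked.index.tolist())     # [0, 1, 0, 1]
print(stacked.index.is_unique)    # False

# ignore_index=True discards the old labels and renumbers from 0
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# keys= adds an outer index level naming each source frame
labelled = pd.concat([df1, df2], keys=['first', 'second'])
print(labelled.loc['second'])     # the rows that came from df2
```

Duplicate labels matter mostly for label-based lookups: `stacked.loc[0]` returns two rows here, while `renumbered.loc[0]` returns one.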
axis=1 Horizontal
Stack DataFrames side by side.
1. Horizontal Concatenation

```python
dg = pd.concat([df1, df2], axis=1)
print("axis = 1: append left-right using columns")
print(dg)
```

```
   A  B  A  B
0  1  2  5  6
1  3  4  7  8
```
2. Column Count Increases

```python
print(len(dg.columns))  # 4 (2 + 2)
```
3. Index Alignment
Rows are aligned by index label, not by position. A label that is missing from one DataFrame produces NaN in that DataFrame's columns.
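To make the label-based matching concrete, here is a small sketch with two frames that hold the same labels in a different order. The frame names `left` and `right` and their values are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({'A': [10, 20, 30]}, index=['x', 'y', 'z'])
# Same labels as `left`, but stored in a different order
right = pd.DataFrame({'B': [3, 1, 2]}, index=['z', 'x', 'y'])

# Rows are matched by label, so B=1 ends up beside label 'x'
aligned = pd.concat([left, right], axis=1)
print(aligned)

# To pair rows purely by position, drop the labels first
by_position = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)], axis=1
)
print(by_position)
```

If the two frames are known to be in matching row order but carry unrelated labels, resetting the index as in the second call is one way to avoid a result full of NaN.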
Index Alignment Behavior
How indices are handled differs by axis.
1. Different Row Indices

```python
df1 = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
df2 = pd.DataFrame({'B': [3, 4]}, index=[1, 2])

# axis=1 aligns on index
result = pd.concat([df1, df2], axis=1)
print(result)
```

```
     A    B
0  1.0  NaN
1  2.0  3.0
2  NaN  4.0
```
2. Different Columns

```python
df1 = pd.DataFrame({'A': [1], 'B': [2]})
df2 = pd.DataFrame({'B': [3], 'C': [4]})

# axis=0 creates union of columns
result = pd.concat([df1, df2])
print(result)
```

```
     A  B    C
0  1.0  2  NaN
0  NaN  3  4.0
```
3. Use join Parameter

```python
pd.concat([df1, df2], axis=0, join='inner')  # Only common columns
pd.concat([df1, df2], axis=1, join='inner')  # Only common indices
```
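The two calls above return nothing visible on their own, so the following sketch compares the default `join='outer'` with `join='inner'` on both axes. It uses its own pair of frames, chosen here so that both the columns and the row labels only partly overlap; the variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=[1, 2])

# axis=0: join decides which columns survive
outer_rows = pd.concat([df1, df2], axis=0)                # columns A, B, C
inner_rows = pd.concat([df1, df2], axis=0, join='inner')  # shared column B only

# axis=1: join decides which row labels survive
outer_cols = pd.concat([df1, df2], axis=1)                # labels 0, 1, 2
inner_cols = pd.concat([df1, df2], axis=1, join='inner')  # shared label 1 only

print(outer_rows.columns.tolist(), inner_rows.columns.tolist())
print(outer_cols.index.tolist(), inner_cols.index.tolist())
```

In both cases `join` applies to the axis that is not being concatenated: the columns when stacking rows, the row index when placing columns side by side.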
Practical Examples
Common concatenation patterns.
1. Stack Multiple Files

```python
# Load and stack CSV files
dfs = [pd.read_csv(f'data_{year}.csv') for year in range(2020, 2024)]
combined = pd.concat(dfs, axis=0, ignore_index=True)
```
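The snippet above needs the CSV files to exist on disk. As a runnable stand-in, the sketch below replaces them with in-memory `io.StringIO` buffers whose contents are made up, and tags each frame with its year before stacking so the source of every row stays identifiable. The `fake_files` dictionary and the `id`, `value`, and `year` column names are assumptions for illustration.

```python
import io
import pandas as pd

# Stand-ins for data_2020.csv and data_2021.csv (contents are made up)
fake_files = {
    2020: io.StringIO("id,value\n1,10\n2,20\n"),
    2021: io.StringIO("id,value\n1,11\n2,21\n"),
}

# Add a 'year' column to each frame, then stack them with a fresh index
frames = [pd.read_csv(buf).assign(year=year) for year, buf in fake_files.items()]
combined = pd.concat(frames, axis=0, ignore_index=True)
print(combined)
```

With real files, the same pattern works by passing file paths to `pd.read_csv` instead of buffers.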
2. Add Calculated Columns

```python
original = pd.DataFrame({'A': [1, 2, 3]})
calculated = pd.DataFrame({'B': [10, 20, 30]})

result = pd.concat([original, calculated], axis=1)
```
3. Build Wide DataFrame

```python
# Combine time series side by side
prices = pd.concat([aapl, msft, googl], axis=1)
prices.columns = ['AAPL', 'MSFT', 'GOOGL']
```
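The example above leaves `aapl`, `msft`, and `googl` undefined. The sketch below assumes they are pandas Series indexed by date, fills them with made-up numbers, and uses the `keys=` argument of `pd.concat` to name the columns in the same call instead of assigning `prices.columns` afterwards. One series is deliberately missing a date to show the alignment.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=3, freq='D')

# Made-up values, for illustration only
aapl = pd.Series([1.0, 1.1, 1.2], index=dates)
msft = pd.Series([2.0, 2.1, 2.2], index=dates)
# This series has no value for the first date
googl = pd.Series([3.1, 3.2], index=dates[1:])

# keys= supplies the column names; rows are aligned on the date index
prices = pd.concat([aapl, msft, googl], axis=1, keys=['AAPL', 'MSFT', 'GOOGL'])
print(prices)
```

The missing first date for `googl` shows up as NaN in the `GOOGL` column rather than shifting the later values up.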
Exercises
Exercise 1. Write code that concatenates two DataFrames along axis=0 (rows) and then along axis=1 (columns). Print the shape of each result.
Solution to Exercise 1
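```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# axis=0: rows are stacked, so the row count doubles
rows = pd.concat([df1, df2], axis=0, ignore_index=True)
print(rows.shape)  # (4, 2)

# axis=1: columns are placed side by side, so the column count doubles
cols = pd.concat([df1, df2], axis=1)
print(cols.shape)  # (2, 4)
```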
Exercise 2. Predict the output shape when concatenating a (3, 2) DataFrame with a (4, 2) DataFrame along axis=0, and along axis=1.
Solution to Exercise 2
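Along axis=0 the row counts add up: 3 + 4 = 7. Assuming both DataFrames share the same two column names, the result has shape (7, 2). Along axis=1 the column counts add up to 2 + 2 = 4, and the rows are the union of the two indices. With default integer indices (0 to 2 and 0 to 3) that union has 4 labels, so the result has shape (4, 4), and the columns from the shorter DataFrame hold NaN in the last row.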
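The prediction can be checked directly. This is a minimal sketch, assuming both DataFrames use the default integer index and the same column names (`x` and `y` here, chosen for illustration):

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.zeros((3, 2)), columns=['x', 'y'])  # shape (3, 2)
b = pd.DataFrame(np.ones((4, 2)), columns=['x', 'y'])   # shape (4, 2)

rows = pd.concat([a, b], axis=0)
cols = pd.concat([a, b], axis=1)

print(rows.shape)  # (7, 2)
print(cols.shape)  # (4, 4)
```

With different column names along axis=0, or non-overlapping row labels along axis=1, the shapes would be larger because the non-concatenated axis becomes a union.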
Exercise 3. Write code that concatenates two DataFrames with different column names along axis=1. Show the resulting column names.
Solution to Exercise 3
```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'C': [5, 6], 'D': [7, 8]})
result = pd.concat([df1, df2], axis=1)
print(result.columns.tolist())  # ['A', 'B', 'C', 'D']
```
Exercise 4. Create three DataFrames and concatenate them along axis=0 using a list comprehension inside pd.concat().
Solution to Exercise 4
```python
import pandas as pd

# Build three DataFrames in a list comprehension passed directly to pd.concat()
result = pd.concat(
    [pd.DataFrame({'A': [2 * i + 1, 2 * i + 2]}, index=[2 * i, 2 * i + 1]) for i in range(3)],
    axis=0,
)
print(result)
```