
concat Method

The pd.concat() function joins pandas objects (Series or DataFrames) along one axis: axis=0, the default, stacks them vertically, and axis=1 places them side by side.

Basic Usage

Concatenate DataFrames vertically.

1. Vertical Concatenation

import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
print(df1)

df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
print(df2)

df = pd.concat([df1, df2])
print(df)
   A  B
0  1  2
1  3  4
0  5  6
1  7  8

2. List of DataFrames

# df3 and df4 stand for any further DataFrames with the same columns
pd.concat([df1, df2, df3, df4])

3. Index Preserved

Each input keeps its original row labels, so the result can contain duplicate index values (0, 1, 0, 1 in the output above).
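If the duplicate labels are not wanted, the ignore_index parameter of pd.concat renumbers the rows. The following is a minimal sketch that reuses the two frames from the vertical example; the names kept and renumbered are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))

# Default behaviour: each frame keeps its own labels
kept = pd.concat([df1, df2])
print(kept.index.tolist())        # [0, 1, 0, 1]
print(kept.index.is_unique)       # False

# ignore_index=True discards the old labels and numbers rows 0..n-1
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

With the default labels, kept.loc[0] returns two rows (one from each input), which is often the first sign that the index needs resetting.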

LeetCode Example: Friend Requests

Concatenate two columns into one Series so that each id can be counted no matter which column it appears in.

1. Sample Data

request_accepted = pd.DataFrame({
    'requester_id': [1, 2, 3, 4, 1, 2],
    'accepter_id': [2, 3, 4, 1, 3, 4]
})

2. Concatenate Columns

combined_ids = pd.concat([
    request_accepted['requester_id'],
    request_accepted['accepter_id']
])
print(combined_ids)
0    1
1    2
2    3
3    4
4    1
5    2
0    2
1    3
2    4
3    1
4    3
5    4
dtype: int64

3. Count Occurrences

# Number of times each id appears in either column
friend_counts = combined_ids.value_counts()
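To see what the count step produces, the three steps can be run end to end on the sample data. This is only a sketch: ignore_index=True is an addition here to avoid duplicate labels, and the sample happens to give every id the same count, so it does not single out one id.

```python
import pandas as pd

request_accepted = pd.DataFrame({
    'requester_id': [1, 2, 3, 4, 1, 2],
    'accepter_id': [2, 3, 4, 1, 3, 4]
})

# Stack both id columns into one Series of 12 values
combined_ids = pd.concat([
    request_accepted['requester_id'],
    request_accepted['accepter_id']
], ignore_index=True)

# Count how often each id appears, then order the result by id
friend_counts = combined_ids.value_counts().sort_index()
print(friend_counts)  # ids 1 to 4, each with a count of 3
```

Because all four ids tie at 3 in this sample, picking a single "most friends" id from it would depend on tie-breaking rather than on the data.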

Horizontal Concatenation

Place DataFrames side by side.

1. axis=1

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('CD'))

dg = pd.concat([df1, df2], axis=1)
print(dg)
   A  B  C  D
0  1  2  5  6
1  3  4  7  8

2. Index Alignment

# With axis=1, rows are aligned on index labels; where a label is missing from one input, its cells are filled with NaN
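A small example makes the alignment visible. The frames left and right below are hypothetical and were chosen so that only the label 1 is shared.

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
right = pd.DataFrame({'C': [5, 6]}, index=[1, 2])

# Default join='outer': the result index is the union of both indices
outer = pd.concat([left, right], axis=1)
print(outer.index.tolist())       # [0, 1, 2]

# Label 0 has no value for C and label 2 has no value for A
print(outer['C'].isna().tolist())  # [True, False, False]
print(outer['A'].isna().tolist())  # [False, False, True]
```

The NaN fill also turns both integer columns into float64, since NaN cannot be stored in a plain int64 column.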

3. Use join Parameter

pd.concat([df1, df2], axis=1, join='inner')  # Keep only index labels present in every input
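Since df1 and df2 above share the same index, join='inner' changes nothing for them. The effect shows up with partially overlapping indices, as in this sketch with two hypothetical frames, left and right.

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
right = pd.DataFrame({'C': [5, 6]}, index=[1, 2])

# join='inner' keeps the intersection of the indices, so no NaN is introduced
inner = pd.concat([left, right], axis=1, join='inner')
print(inner.index.tolist())      # [1]
print(inner.isna().any().any())  # False
```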

Handling Mismatched Columns

Concatenate DataFrames whose columns differ.

1. Union of Columns

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

result = pd.concat([df1, df2])
# Columns: A, B, C (the union); cells with no source value are NaN
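The shape of the union result is easy to check directly. This sketch repeats the two frames above and inspects the columns, the missing values, and the dtypes.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

result = pd.concat([df1, df2])
print(result.columns.tolist())     # ['A', 'B', 'C']

# A is missing for the rows from df2, C for the rows from df1
print(result['A'].isna().tolist())  # [False, False, True, True]
print(result['C'].isna().tolist())  # [True, True, False, False]

# Columns that received NaN are upcast to float64; B stays integer
print(result.dtypes.astype(str).to_dict())
```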

2. join='inner'

result = pd.concat([df1, df2], join='inner')
# Only column B (common to both)
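For comparison, here is the inner join on the same two frames as a self-contained sketch; the name common is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

# Only columns present in every input survive; A and C are dropped
common = pd.concat([df1, df2], join='inner')
print(common.columns.tolist())  # ['B']
print(common['B'].tolist())     # [3, 4, 5, 6]
```

No NaN values are introduced, so B keeps its integer dtype, at the cost of silently discarding the data in A and C.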

3. Specify Columns

# Give df2 exactly df1's columns before concat:
# missing columns (A) are added as NaN, extra columns (C) are dropped
df2 = df2.reindex(columns=df1.columns)
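The effect of the reindex step can be checked before concatenating. In this sketch the reindexed frame is stored under a new, hypothetical name (aligned) so that the original df2 stays available for comparison.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

# Conform df2 to df1's column layout
aligned = df2.reindex(columns=df1.columns)
print(aligned.columns.tolist())   # ['A', 'B']
print(aligned['A'].isna().all())  # True

# The concatenated result now has only df1's columns
result = pd.concat([df1, aligned], ignore_index=True)
print(result.columns.tolist())    # ['A', 'B']
```

Note that column C is lost in this approach, so it suits cases where df1 defines the schema that the output must follow.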

keys Parameter

Add a hierarchical index that records which input each row came from.

1. Label Sources

result = pd.concat([df1, df2], keys=['first', 'second'])

2. MultiIndex Result

print(result.index)
# MultiIndex([('first', 0), ('first', 1), ...])

3. Access by Key

result.loc['first']  # Rows that came from the first DataFrame
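Put together, the three steps look like the following sketch. It uses two fresh frames with identical columns (an assumption made here so that the output is free of NaN) rather than the mismatched frames from the previous section.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))

# Each key becomes the outer index level for the rows of its frame
result = pd.concat([df1, df2], keys=['first', 'second'])
print(result.index.nlevels)                       # 2
print(result.index.get_level_values(0).tolist())  # ['first', 'first', 'second', 'second']

# Selecting by key returns that frame's rows with their original labels
second_rows = result.loc['second']
print(second_rows['A'].tolist())                  # [5, 7]

# A full (key, label) tuple selects a single row
print(result.loc[('first', 1), 'B'])              # 4
```

Because the original labels live on in the inner level, keys is an alternative to ignore_index when the duplicate labels need to stay traceable to their source.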