concat vs append vs merge¶

Pandas provides multiple functions for combining DataFrames, and choosing the wrong one can produce incorrect results or unnecessary performance overhead. The three main options — pd.concat, DataFrame.append, and pd.merge — serve different purposes. This page clarifies when to use each and how they differ in behavior, performance, and API design.

import pandas as pd

Quick Comparison¶

Feature	`pd.concat`	`DataFrame.append`	`pd.merge`
Purpose	Stack along an axis	Shortcut for vertical concat	Join on columns/index
Axis	Row or column	Row only	N/A (column-based join)
Input	List of DataFrames	Single DataFrame or dict	Two DataFrames
Join type	Outer/inner (on the other axis)	Outer/inner	Left/right/outer/inner
Key columns	No (uses index alignment)	No	Yes (`on`, `left_on`, `right_on`)
Status	Active	Deprecated since pandas 1.4	Active

pd.concat — Stacking DataFrames¶

pd.concat stacks multiple DataFrames along a specified axis. It aligns on the other axis using index or column labels.

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

# Vertical stacking (axis=0, the default)
result = pd.concat([df1, df2], ignore_index=True)
print(result)

Use pd.concat when you need to stack two or more DataFrames that share the same columns (vertical) or the same index (horizontal).

DataFrame.append — Deprecated Shortcut¶

DataFrame.append was a convenience wrapper around pd.concat for vertical stacking of a single DataFrame. It was deprecated in pandas 1.4 and removed in pandas 2.0.

# Before pandas 2.0 (deprecated)
# result = df1.append(df2, ignore_index=True)

# Equivalent using pd.concat (recommended)
result = pd.concat([df1, df2], ignore_index=True)

append Is Removed in pandas 2.0

If you encounter DataFrame.append in existing code, replace it with pd.concat. The behavior is identical for single-DataFrame appends: df1.append(df2) is equivalent to pd.concat([df1, df2]).

Why append Was Deprecated¶

Each call to append created a new DataFrame by copying all data. In a loop, this leads to quadratic time complexity because each iteration copies all previously appended rows.

# Slow: O(n²) due to repeated copying
result = pd.DataFrame()
for chunk in data_chunks:
    result = pd.concat([result, chunk])  # still copies each time

# Fast: O(n) — collect first, concat once
result = pd.concat(data_chunks, ignore_index=True)

pd.merge — Joining on Keys¶

pd.merge performs database-style joins on column values or index labels. Unlike pd.concat, it matches rows based on shared key values rather than stacking by position.

orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "product_id": [101, 102, 103]
})

products = pd.DataFrame({
    "product_id": [101, 102, 104],
    "name": ["Widget", "Gadget", "Doohickey"]
})

# Inner join on product_id
result = pd.merge(orders, products, on="product_id")
print(result)

   order_id  product_id    name
0         1         101  Widget
1         2         102  Gadget

Use pd.merge when you need to combine DataFrames based on matching values in one or more key columns, similar to SQL JOIN operations.

Side-by-Side Example¶

The following example uses the same two DataFrames to show how concat and merge produce different results.

df_left = pd.DataFrame({
    "key": ["a", "b", "c"],
    "value_left": [1, 2, 3]
})

df_right = pd.DataFrame({
    "key": ["b", "c", "d"],
    "value_right": [4, 5, 6]
})

concat (vertical stack)¶

stacked = pd.concat([df_left, df_right], ignore_index=True)
print(stacked)

  key  value_left  value_right
0   a         1.0          NaN
1   b         2.0          NaN
2   c         3.0          NaN
3   b         NaN          4.0
4   c         NaN          5.0
5   d         NaN          6.0

Concat stacks the rows without matching on the key column. Missing columns are filled with NaN.

merge (join on key)¶

joined = pd.merge(df_left, df_right, on="key")
print(joined)

  key  value_left  value_right
0   b           2            4
1   c           3            5

Merge matches rows where key values agree, discarding non-matching rows (inner join by default).

Decision Guide¶

Use the following rules to choose between the three operations.

Stacking rows with the same columns — use pd.concat with axis=0
Stacking columns with the same index — use pd.concat with axis=1
Joining on shared key columns — use pd.merge
DataFrame.append — do not use; replace with pd.concat

Performance Rule

When combining many DataFrames in a loop, collect them in a list first and call pd.concat once at the end. Repeated concatenation inside a loop copies all previous data on every iteration.

Summary¶

Operation	When to Use
`pd.concat([df1, df2])`	Stack DataFrames vertically or horizontally by position
`pd.merge(df1, df2, on=...)`	Join DataFrames on matching key column values
`df1.append(df2)`	Never — deprecated and removed; use `pd.concat` instead

Key Takeaways:

pd.concat stacks by position along an axis; pd.merge matches rows by key values
DataFrame.append was removed in pandas 2.0 — always use pd.concat instead
Collect DataFrames in a list and concat once to avoid quadratic copying overhead
Choose merge for database-style joins and concat for simple stacking operations