concat vs append vs merge¶

Pandas provides multiple functions for combining DataFrames, and choosing the wrong one can produce incorrect results or unnecessary performance overhead. The three main options — pd.concat, DataFrame.append, and pd.merge — serve different purposes. This page clarifies when to use each and how they differ in behavior, performance, and API design.

Mental Model

concat is glue -- it stacks DataFrames by position along an axis. merge is a database join -- it matches rows by key values. append was just a thin shortcut for vertical concat and is now deprecated. Ask yourself: "Am I stacking or matching?" Stacking means concat; matching means merge.

python import pandas as pd

Quick Comparison¶

Feature	`pd.concat`	`DataFrame.append`	`pd.merge`
Purpose	Stack along an axis	Shortcut for vertical concat	Join on columns/index
Axis	Row or column	Row only	N/A (column-based join)
Input	List of DataFrames	Single DataFrame or dict	Two DataFrames
Join type	Outer/inner (on the other axis)	Outer/inner	Left/right/outer/inner
Key columns	No (uses index alignment)	No	Yes (`on`, `left_on`, `right_on`)
Status	Active	Deprecated since pandas 1.4	Active

pd.concat — Stacking DataFrames¶

pd.concat stacks multiple DataFrames along a specified axis. It aligns on the other axis using index or column labels.

```python df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}) df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

Vertical stacking (axis=0, the default)¶

result = pd.concat([df1, df2], ignore_index=True) print(result) ```

A B 0 1 3 1 2 4 2 5 7 3 6 8

Use pd.concat when you need to stack two or more DataFrames that share the same columns (vertical) or the same index (horizontal).

DataFrame.append — Deprecated Shortcut¶

DataFrame.append was a convenience wrapper around pd.concat for vertical stacking of a single DataFrame. It was deprecated in pandas 1.4 and removed in pandas 2.0.

```python

Before pandas 2.0 (deprecated)¶

result = df1.append(df2, ignore_index=True)¶

Equivalent using pd.concat (recommended)¶

result = pd.concat([df1, df2], ignore_index=True) ```

append Is Removed in pandas 2.0

If you encounter DataFrame.append in existing code, replace it with pd.concat. The behavior is identical for single-DataFrame appends: df1.append(df2) is equivalent to pd.concat([df1, df2]).

Why append Was Deprecated¶

Each call to append created a new DataFrame by copying all data. In a loop, this leads to quadratic time complexity because each iteration copies all previously appended rows.

```python

Slow: O(n²) due to repeated copying¶

result = pd.DataFrame() for chunk in data_chunks: result = pd.concat([result, chunk]) # still copies each time

Fast: O(n) — collect first, concat once¶

result = pd.concat(data_chunks, ignore_index=True) ```

pd.merge — Joining on Keys¶

pd.merge performs database-style joins on column values or index labels. Unlike pd.concat, it matches rows based on shared key values rather than stacking by position.

```python orders = pd.DataFrame({ "order_id": [1, 2, 3], "product_id": [101, 102, 103] })

products = pd.DataFrame({ "product_id": [101, 102, 104], "name": ["Widget", "Gadget", "Doohickey"] })

Inner join on product_id¶

result = pd.merge(orders, products, on="product_id") print(result) ```

order_id product_id name 0 1 101 Widget 1 2 102 Gadget

Use pd.merge when you need to combine DataFrames based on matching values in one or more key columns, similar to SQL JOIN operations.

Side-by-Side Example¶

The following example uses the same two DataFrames to show how concat and merge produce different results.

```python df_left = pd.DataFrame({ "key": ["a", "b", "c"], "value_left": [1, 2, 3] })

df_right = pd.DataFrame({ "key": ["b", "c", "d"], "value_right": [4, 5, 6] }) ```

concat (vertical stack)¶

python stacked = pd.concat([df_left, df_right], ignore_index=True) print(stacked)

key value_left value_right 0 a 1.0 NaN 1 b 2.0 NaN 2 c 3.0 NaN 3 b NaN 4.0 4 c NaN 5.0 5 d NaN 6.0

Concat stacks the rows without matching on the key column. Missing columns are filled with NaN.

merge (join on key)¶

python joined = pd.merge(df_left, df_right, on="key") print(joined)

key value_left value_right 0 b 2 4 1 c 3 5

Merge matches rows where key values agree, discarding non-matching rows (inner join by default).

Decision Guide¶

Use the following rules to choose between the three operations.

Stacking rows with the same columns — use pd.concat with axis=0
Stacking columns with the same index — use pd.concat with axis=1
Joining on shared key columns — use pd.merge
DataFrame.append — do not use; replace with pd.concat

Performance Rule

When combining many DataFrames in a loop, collect them in a list first and call pd.concat once at the end. Repeated concatenation inside a loop copies all previous data on every iteration.

Summary¶

Operation	When to Use
`pd.concat([df1, df2])`	Stack DataFrames vertically or horizontally by position
`pd.merge(df1, df2, on=...)`	Join DataFrames on matching key column values
`df1.append(df2)`	Never — deprecated and removed; use `pd.concat` instead

Key Takeaways:

pd.concat stacks by position along an axis; pd.merge matches rows by key values
DataFrame.append was removed in pandas 2.0 — always use pd.concat instead
Collect DataFrames in a list and concat once to avoid quadratic copying overhead
Choose merge for database-style joins and concat for simple stacking operations

Exercises¶

Exercise 1. Explain the difference between pd.concat(), pd.merge(), and DataFrame.join(). When would you use each?

Solution to Exercise 1

```python import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}) df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]}) result = pd.concat([df1, df2], ignore_index=True) print(result) ```

Exercise 2. Write code that demonstrates the same result using both pd.concat([df1, df2], axis=1) and df1.join(df2).

Solution to Exercise 2

See the explanation in the main content. The key concept involves understanding how pd.concat() aligns data along the specified axis and handles mismatched indices or columns.

Exercise 3. Create two DataFrames and show that pd.merge() performs a join on a common column while pd.concat() simply stacks rows or columns.

Solution to Exercise 3

```python import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}) df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]}) result = pd.concat([df1, df2], axis=0) print(result) ```

Exercise 4. Write code that combines rows from two DataFrames. First use pd.concat(), then achieve the same result using pd.merge() with how='outer'.

Solution to Exercise 4

```python import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]}, index=[0, 1]) df2 = pd.DataFrame({'A': [3, 4]}, index=[2, 3]) result = pd.concat([df1, df2]) print(result) ```