How do you merge two DataFrames in pandas?

Use pd.merge(left_df, right_df) to merge two DataFrames. By default, pandas joins on every column name the two DataFrames share and performs an inner join, returning only rows whose keys match in both DataFrames.

What is the difference between inner join and outer join in pandas?

An inner join (how='inner') returns only rows with matching keys in both DataFrames. An outer join (how='outer') returns all rows from both DataFrames, filling NaN where there is no match.

How do you merge on columns with different names in pandas?

Use the left_on and right_on parameters: pd.merge(left, right, left_on='col_in_left', right_on='col_in_right'). This tells pandas which column from each DataFrame to use as the join key.

What does the indicator parameter do in pd.merge?

Setting indicator=True adds a '_merge' column to the result showing the source of each row: 'left_only', 'right_only', or 'both'. This is useful for finding anti-joins — rows that exist in only one DataFrame.

What is an anti-join in pandas?

An anti-join returns rows from one DataFrame that have no matching key in the other. In pandas, perform a left join with indicator=True and filter for _merge == 'left_only', or perform a right join with indicator=True and filter for _merge == 'right_only'.

How to Merge Data with Pandas — merge, join, inner, left, right, outer

Merging is one of the most common operations in data analysis. Whether you're combining customer records with orders, joining station details with fuel prices, or linking any two related datasets — pd.merge() is the tool you'll reach for.

In this article we'll walk through the main join types in pandas using two small DataFrames: inner, left, right, and outer joins, plus the anti-joins you can build with the indicator parameter.

1. Create the DataFrames

We'll work with two related DataFrames about Australian petrol stations. The first, stations_df, holds station details. The second, prices_df, holds recent fuel prices. They share a common key — station_name — but the lists don't perfectly overlap, which is what makes merge interesting.

Python — editable

import pandas as pd
import numpy as np

# Station details
stations_df = pd.DataFrame({
    'station_name': ['Caltex Bondi', 'BP Southbank', 'Shell Fortitude Valley', 'Ampol Parramatta', '7-Eleven St Kilda'],
    'state': ['NSW', 'VIC', 'QLD', 'NSW', 'VIC'],
    'owner': ['Chevron', 'BP plc', 'Shell plc', 'Ampol Ltd', 'Seven & i']
})

# Fuel prices (note: some stations differ)
prices_df = pd.DataFrame({
    'station_name': ['Caltex Bondi', 'BP Southbank', 'Shell Fortitude Valley', 'United Coogee', 'Metro Petroleum CBD'],
    'fuel_type': ['Unleaded 91', 'Premium 98', 'Diesel', 'Unleaded 91', 'E10'],
    'price_per_litre': [1.89, 2.15, 1.95, 1.82, 1.79]
})

print("stations_df:")
print(stations_df.to_string(index=False))
print("\nprices_df:")
print(prices_df.to_string(index=False))

Figure 1: Two DataFrames with partially overlapping station names.

Notice that Caltex Bondi, BP Southbank, and Shell Fortitude Valley appear in both DataFrames. Ampol Parramatta and 7-Eleven St Kilda exist only in stations_df, while United Coogee and Metro Petroleum CBD exist only in prices_df. This mismatch is exactly what different join types are designed to handle.

2. Simple Merge

The simplest form of pd.merge() takes two DataFrames and automatically detects common column names. By default it performs an inner join — keeping only rows that match in both DataFrames.

Python — editable

# Simple merge — auto-detects 'station_name' as the join key
pd.merge(stations_df, prices_df)

Figure 2: Simple merge returns only the 3 stations that appear in both DataFrames.

Pandas found station_name in both DataFrames and used it as the join key. The result has 3 rows — the stations that appear in both tables. The 4 stations unique to one table were dropped.
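To make the auto-detection less magical, here is a minimal sketch that computes the shared column names by hand and checks that passing them explicitly gives the same result. The variable names common_cols, implicit, and explicit are ours, chosen for illustration.

```python
import pandas as pd

stations_df = pd.DataFrame({
    'station_name': ['Caltex Bondi', 'BP Southbank', 'Shell Fortitude Valley',
                     'Ampol Parramatta', '7-Eleven St Kilda'],
    'state': ['NSW', 'VIC', 'QLD', 'NSW', 'VIC'],
    'owner': ['Chevron', 'BP plc', 'Shell plc', 'Ampol Ltd', 'Seven & i'],
})
prices_df = pd.DataFrame({
    'station_name': ['Caltex Bondi', 'BP Southbank', 'Shell Fortitude Valley',
                     'United Coogee', 'Metro Petroleum CBD'],
    'fuel_type': ['Unleaded 91', 'Premium 98', 'Diesel', 'Unleaded 91', 'E10'],
    'price_per_litre': [1.89, 2.15, 1.95, 1.82, 1.79],
})

# Column names present in both frames: a bare pd.merge() joins on these
common_cols = list(stations_df.columns.intersection(prices_df.columns))
print(common_cols)

# Merging with and without an explicit `on` should give the same frame
implicit = pd.merge(stations_df, prices_df)
explicit = pd.merge(stations_df, prices_df, on=common_cols)
print(implicit.equals(explicit))
```

If the two DataFrames ever share more than one column name, a bare merge joins on all of them, so spelling out on= is usually the safer habit.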

3. Inner Join

An inner join is the default. You can make it explicit with how='inner'. If the key columns have different names in each DataFrame, use left_on and right_on.

Python — editable

# Explicit inner join
pd.merge(stations_df, prices_df,
         how='inner', on='station_name')

Figure 3: Explicit inner join — identical result to the simple merge.

Let's also demonstrate left_on / right_on. We'll rename the key in prices_df to simulate columns with different names.

Python — editable

# Rename key in prices to simulate different column names
prices_renamed = prices_df.rename(columns={'station_name': 'station'})

# Merge with left_on / right_on
pd.merge(stations_df, prices_renamed,
         how='inner', left_on='station_name', right_on='station')

Figure 4: Inner join using left_on and right_on for differently named columns.

When you use left_on/right_on, both key columns appear in the result. You can drop the duplicate with .drop(columns='station').
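As a short sketch of that clean-up step, the block below repeats the left_on / right_on merge and then drops the redundant key. The names merged and tidy are ours, chosen for illustration.

```python
import pandas as pd

stations_df = pd.DataFrame({
    'station_name': ['Caltex Bondi', 'BP Southbank', 'Shell Fortitude Valley',
                     'Ampol Parramatta', '7-Eleven St Kilda'],
    'state': ['NSW', 'VIC', 'QLD', 'NSW', 'VIC'],
    'owner': ['Chevron', 'BP plc', 'Shell plc', 'Ampol Ltd', 'Seven & i'],
})
prices_df = pd.DataFrame({
    'station_name': ['Caltex Bondi', 'BP Southbank', 'Shell Fortitude Valley',
                     'United Coogee', 'Metro Petroleum CBD'],
    'fuel_type': ['Unleaded 91', 'Premium 98', 'Diesel', 'Unleaded 91', 'E10'],
    'price_per_litre': [1.89, 2.15, 1.95, 1.82, 1.79],
})

# Simulate a differently named key on the right-hand side
prices_renamed = prices_df.rename(columns={'station_name': 'station'})

# Both 'station_name' and 'station' survive the merge
merged = pd.merge(stations_df, prices_renamed,
                  how='inner', left_on='station_name', right_on='station')

# For an inner join the two key columns hold identical values, so drop one
tidy = merged.drop(columns='station')
print(list(tidy.columns))
```

With a left, right, or outer join the two key columns can differ, because one of them is NaN for unmatched rows. In that case, check which column you want to keep before dropping the other.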

4. Left Join

A left join keeps all rows from the left DataFrame and fills NaN where there's no match in the right. This is useful when you want to enrich a master table without losing any records.

Python — editable

# Left join — keep all stations, fill NaN for missing prices
pd.merge(stations_df, prices_df,
         how='left', on='station_name')

Figure 5: Left join keeps all 5 stations. Ampol Parramatta and 7-Eleven St Kilda have NaN prices.

All 5 rows from stations_df are present. The two stations with no matching price data (Ampol Parramatta and 7-Eleven St Kilda) have NaN in the fuel_type and price_per_litre columns.
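One way to pull out those unmatched stations is to test the right-hand columns for NaN. The sketch below assumes prices_df itself contains no missing prices. If it did, a NaN could mean either "no match" or "matched, but the price was missing", and the indicator parameter covered later would be the more reliable test.

```python
import pandas as pd

stations_df = pd.DataFrame({
    'station_name': ['Caltex Bondi', 'BP Southbank', 'Shell Fortitude Valley',
                     'Ampol Parramatta', '7-Eleven St Kilda'],
    'state': ['NSW', 'VIC', 'QLD', 'NSW', 'VIC'],
    'owner': ['Chevron', 'BP plc', 'Shell plc', 'Ampol Ltd', 'Seven & i'],
})
prices_df = pd.DataFrame({
    'station_name': ['Caltex Bondi', 'BP Southbank', 'Shell Fortitude Valley',
                     'United Coogee', 'Metro Petroleum CBD'],
    'fuel_type': ['Unleaded 91', 'Premium 98', 'Diesel', 'Unleaded 91', 'E10'],
    'price_per_litre': [1.89, 2.15, 1.95, 1.82, 1.79],
})

left = pd.merge(stations_df, prices_df, how='left', on='station_name')

# Rows where the right-hand side contributed nothing
no_price = left[left['price_per_litre'].isna()]
print(no_price['station_name'].tolist())
```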

5. Right Join

A right join is the mirror of a left join — it keeps all rows from the right DataFrame and fills NaN where there's no match in the left.

Python — editable

# Right join — keep all prices, fill NaN for missing station details
pd.merge(stations_df, prices_df,
         how='right', on='station_name')

Figure 6: Right join keeps all 5 price records. United Coogee and Metro Petroleum CBD have NaN for state and owner.

All 5 rows from prices_df are retained. United Coogee and Metro Petroleum CBD have no station details, so state and owner are NaN.
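To see the mirror relationship concretely, the sketch below compares a right join with a left join whose arguments are swapped. The normalise function is a hypothetical helper of ours that aligns column order and row order before comparing, since the two calls lay out their columns differently.

```python
import pandas as pd

stations_df = pd.DataFrame({
    'station_name': ['Caltex Bondi', 'BP Southbank', 'Shell Fortitude Valley',
                     'Ampol Parramatta', '7-Eleven St Kilda'],
    'state': ['NSW', 'VIC', 'QLD', 'NSW', 'VIC'],
    'owner': ['Chevron', 'BP plc', 'Shell plc', 'Ampol Ltd', 'Seven & i'],
})
prices_df = pd.DataFrame({
    'station_name': ['Caltex Bondi', 'BP Southbank', 'Shell Fortitude Valley',
                     'United Coogee', 'Metro Petroleum CBD'],
    'fuel_type': ['Unleaded 91', 'Premium 98', 'Diesel', 'Unleaded 91', 'E10'],
    'price_per_litre': [1.89, 2.15, 1.95, 1.82, 1.79],
})

right_join = pd.merge(stations_df, prices_df, how='right', on='station_name')
swapped_left = pd.merge(prices_df, stations_df, how='left', on='station_name')


def normalise(df, columns):
    # Hypothetical helper: same column order, rows sorted by key, fresh index
    return df[list(columns)].sort_values('station_name').reset_index(drop=True)


cols = right_join.columns
is_mirror = normalise(right_join, cols).equals(normalise(swapped_left, cols))
print(is_mirror)
```

In practice this means you can pick whichever form reads more naturally. Many people stick to left joins and simply choose which DataFrame goes first.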

6. Outer Join

An outer join (also called a full outer join) keeps all rows from both DataFrames. Where there's no match, the missing side gets NaN.

Python — editable

# Outer join — keep everything from both DataFrames
pd.merge(stations_df, prices_df,
         how='outer', on='station_name')

Figure 7: Outer join returns 7 rows — all stations from both DataFrames with NaN where data is missing.

The result has 7 rows: 3 matched + 2 left-only + 2 right-only. This gives you the complete picture of both datasets combined.
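The 3 + 2 + 2 arithmetic can be checked with plain Python sets. This sketch assumes station_name is unique within each DataFrame, as it is here. With duplicate keys, a merge produces one row per matching pair and the simple sum no longer holds.

```python
import pandas as pd

stations_df = pd.DataFrame({
    'station_name': ['Caltex Bondi', 'BP Southbank', 'Shell Fortitude Valley',
                     'Ampol Parramatta', '7-Eleven St Kilda'],
    'state': ['NSW', 'VIC', 'QLD', 'NSW', 'VIC'],
    'owner': ['Chevron', 'BP plc', 'Shell plc', 'Ampol Ltd', 'Seven & i'],
})
prices_df = pd.DataFrame({
    'station_name': ['Caltex Bondi', 'BP Southbank', 'Shell Fortitude Valley',
                     'United Coogee', 'Metro Petroleum CBD'],
    'fuel_type': ['Unleaded 91', 'Premium 98', 'Diesel', 'Unleaded 91', 'E10'],
    'price_per_litre': [1.89, 2.15, 1.95, 1.82, 1.79],
})

left_keys = set(stations_df['station_name'])
right_keys = set(prices_df['station_name'])

matched = left_keys & right_keys      # keys in both frames
left_only = left_keys - right_keys    # keys only in stations_df
right_only = right_keys - left_keys   # keys only in prices_df

expected_rows = len(matched) + len(left_only) + len(right_only)
outer = pd.merge(stations_df, prices_df, how='outer', on='station_name')
print(expected_rows, len(outer))
```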

7. The `indicator` Parameter

Adding indicator=True to any merge appends a _merge column that tells you where each row came from: left_only, right_only, or both. This is incredibly useful for diagnosing data quality and building anti-joins.

Python — editable

# Outer join with indicator
pd.merge(stations_df, prices_df,
         how='outer', on='station_name', indicator=True)

Figure 8: The _merge column shows the source of each row.

The _merge column makes it easy to filter for specific subsets — which is exactly what the next three sections do.
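Before filtering, a quick tally of the _merge column gives a one-glance summary of how well the two tables line up. This is a small sketch. The conversion to str is our choice, made only so the result prints as a plain dictionary.

```python
import pandas as pd

stations_df = pd.DataFrame({
    'station_name': ['Caltex Bondi', 'BP Southbank', 'Shell Fortitude Valley',
                     'Ampol Parramatta', '7-Eleven St Kilda'],
    'state': ['NSW', 'VIC', 'QLD', 'NSW', 'VIC'],
    'owner': ['Chevron', 'BP plc', 'Shell plc', 'Ampol Ltd', 'Seven & i'],
})
prices_df = pd.DataFrame({
    'station_name': ['Caltex Bondi', 'BP Southbank', 'Shell Fortitude Valley',
                     'United Coogee', 'Metro Petroleum CBD'],
    'fuel_type': ['Unleaded 91', 'Premium 98', 'Diesel', 'Unleaded 91', 'E10'],
    'price_per_litre': [1.89, 2.15, 1.95, 1.82, 1.79],
})

merged = pd.merge(stations_df, prices_df,
                  how='outer', on='station_name', indicator=True)

# Count how many rows came from each side (or from both)
counts = merged['_merge'].astype(str).value_counts().to_dict()
print(counts)
```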

8. Left-Only Anti-Join

A left-only anti-join finds rows in the left DataFrame that have no match in the right. In SQL terms, this is a LEFT JOIN ... WHERE right.key IS NULL. In pandas, combine a left join with indicator=True and filter for left_only.

Python — editable

# Stations with no price data
left_anti = pd.merge(stations_df, prices_df,
                     how='left', on='station_name', indicator=True)

left_anti.query('_merge == "left_only"')

Figure 9: Stations that have no matching price record.

Ampol Parramatta and 7-Eleven St Kilda appear in our station list but have no price data — they're "orphans" in the left table.
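After filtering, the right-hand columns are all NaN and the _merge column has done its job, so you will often want to keep only the left table's columns. The sketch below does that and, as a cross-check, compares the result with a different technique: a boolean mask built with isin. The isin version is an alternative we are adding for comparison, not the approach this article is built around, and it only works as written for a single key column.

```python
import pandas as pd

stations_df = pd.DataFrame({
    'station_name': ['Caltex Bondi', 'BP Southbank', 'Shell Fortitude Valley',
                     'Ampol Parramatta', '7-Eleven St Kilda'],
    'state': ['NSW', 'VIC', 'QLD', 'NSW', 'VIC'],
    'owner': ['Chevron', 'BP plc', 'Shell plc', 'Ampol Ltd', 'Seven & i'],
})
prices_df = pd.DataFrame({
    'station_name': ['Caltex Bondi', 'BP Southbank', 'Shell Fortitude Valley',
                     'United Coogee', 'Metro Petroleum CBD'],
    'fuel_type': ['Unleaded 91', 'Premium 98', 'Diesel', 'Unleaded 91', 'E10'],
    'price_per_litre': [1.89, 2.15, 1.95, 1.82, 1.79],
})

left_anti = pd.merge(stations_df, prices_df,
                     how='left', on='station_name', indicator=True)

# Keep unmatched rows, then only the columns that came from stations_df
via_indicator = (left_anti.query('_merge == "left_only"')[list(stations_df.columns)]
                 .reset_index(drop=True))

# Alternative technique: mask out stations whose key appears in prices_df
has_price = stations_df['station_name'].isin(prices_df['station_name'])
via_isin = stations_df[~has_price].reset_index(drop=True)

print(via_indicator.equals(via_isin))
```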

9. Right-Only Anti-Join

The mirror image — find rows in the right DataFrame that have no match in the left.

Python — editable

# Price records with no station details
right_anti = pd.merge(stations_df, prices_df,
                      how='right', on='station_name', indicator=True)

right_anti.query('_merge == "right_only"')

Figure 10: Price records for stations not in the station details table.

United Coogee and Metro Petroleum CBD have price data but no station details — they're "orphans" in the right table.

10. Exclusive Outer Join

The exclusive outer join (sometimes called a full anti-join) combines both anti-joins — it returns all rows that exist in only one DataFrame.

Python — editable

# All non-matching rows from both DataFrames
exclusive = pd.merge(stations_df, prices_df,
                     how='outer', on='station_name', indicator=True)

exclusive.query('_merge != "both"')

Figure 11: All rows that exist in only one DataFrame — the complete set of mismatches.

This is a powerful data quality check. In one query you can see every record that doesn't have a counterpart in the other table.
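If you run this check often, it can be wrapped in a small function. The sketch below is one possible shape. unmatched_keys is a hypothetical helper of ours, not a pandas function, and it assumes a single key column.

```python
import pandas as pd

stations_df = pd.DataFrame({
    'station_name': ['Caltex Bondi', 'BP Southbank', 'Shell Fortitude Valley',
                     'Ampol Parramatta', '7-Eleven St Kilda'],
    'state': ['NSW', 'VIC', 'QLD', 'NSW', 'VIC'],
    'owner': ['Chevron', 'BP plc', 'Shell plc', 'Ampol Ltd', 'Seven & i'],
})
prices_df = pd.DataFrame({
    'station_name': ['Caltex Bondi', 'BP Southbank', 'Shell Fortitude Valley',
                     'United Coogee', 'Metro Petroleum CBD'],
    'fuel_type': ['Unleaded 91', 'Premium 98', 'Diesel', 'Unleaded 91', 'E10'],
    'price_per_litre': [1.89, 2.15, 1.95, 1.82, 1.79],
})


def unmatched_keys(left, right, key):
    """Hypothetical helper: list the keys that appear on only one side."""
    # Merge just the key columns; the other columns are not needed here
    merged = pd.merge(left[[key]], right[[key]],
                      how='outer', on=key, indicator=True)
    return {
        'left_only': sorted(merged.loc[merged['_merge'] == 'left_only', key]),
        'right_only': sorted(merged.loc[merged['_merge'] == 'right_only', key]),
    }


report = unmatched_keys(stations_df, prices_df, 'station_name')
print(report)
```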

Join Type Visual Summary

Here's a quick reference for the join types covered in this article:

Inner join (how='inner') — only matching rows from both sides
Left join (how='left') — all rows from left + matching from right
Right join (how='right') — all rows from right + matching from left
Outer join (how='outer') — all rows from both sides
Left anti-join — left join + indicator=True + filter left_only
Right anti-join — right join + indicator=True + filter right_only
Exclusive outer — outer join + indicator=True + filter != both

Python — editable

# Summary: row counts for each join type
join_types = {
    'inner':  len(pd.merge(stations_df, prices_df, how='inner', on='station_name')),
    'left':   len(pd.merge(stations_df, prices_df, how='left', on='station_name')),
    'right':  len(pd.merge(stations_df, prices_df, how='right', on='station_name')),
    'outer':  len(pd.merge(stations_df, prices_df, how='outer', on='station_name')),
}

pd.DataFrame.from_dict(join_types, orient='index', columns=['row_count'])

Figure 12: Row counts for each join type — a quick sanity check.

Summary

You now know how to use the main join types in pandas:

Simple merge: pd.merge(left, right) auto-detects keys and performs an inner join
Inner join: how='inner' keeps only matching rows
Left join: how='left' keeps all left rows, NaN for missing right data
Right join: how='right' keeps all right rows, NaN for missing left data
Outer join: how='outer' keeps everything from both sides
Anti-joins: combine a left, right, or outer join with indicator=True, then filter on the _merge column to find non-matching rows

Try editing the code blocks above: add new stations, change prices, or combine several of these techniques to build more complex queries.


References

Original article: How to merge data with Pandas? — Medium
pandas documentation: pandas.merge
pandas documentation: Merge, join, concatenate

Suhith Illesinghe

Curiosity is the first step to make a difference. I hope to inspire others to explore, build and champion collaborative growth.

