Merging is one of the most common operations in data analysis. Whether you're combining customer records with orders, joining station details with fuel prices, or linking any two related datasets — pd.merge() is the tool you'll reach for.
In this article we'll walk through every type of merge in pandas using two small DataFrames. Each code block is interactive — edit the code and click Run (or press Ctrl+Enter) to see the result.
1. Create the DataFrames
We'll work with two related DataFrames about Australian petrol stations. The first, stations_df, holds station details. The second, prices_df, holds recent fuel prices. They share a common key — station_name — but the lists don't perfectly overlap, which is what makes merge interesting.
Notice that Caltex Bondi, BP Southbank, and Shell Fortitude Valley appear in both DataFrames. Ampol Parramatta and 7-Eleven St Kilda exist only in stations_df, while United Coogee and Metro Petroleum CBD exist only in prices_df. This mismatch is exactly what different join types are designed to handle.
2. Simple Merge
The simplest form of pd.merge() takes two DataFrames and automatically detects common column names. By default it performs an inner join — keeping only rows that match in both DataFrames.
Pandas found station_name in both DataFrames and used it as the join key. The result has 3 rows — the stations that appear in both tables. The 4 stations unique to one table were dropped.
3. Inner Join
An inner join is the default. You can make it explicit with how='inner'. If the key columns have different names in each DataFrame, use left_on and right_on.
Let's also demonstrate left_on / right_on. We'll rename the key in prices_df to simulate columns with different names.
When you use left_on/right_on, both key columns appear in the result. You can drop the duplicate with .drop(columns='station').
4. Left Join
A left join keeps all rows from the left DataFrame and fills NaN where there's no match in the right. This is useful when you want to enrich a master table without losing any records.
All 5 rows from stations_df are present. The two stations with no matching price data (Ampol Parramatta and 7-Eleven St Kilda) have NaN in the fuel_type and price_per_litre columns.
5. Right Join
A right join is the mirror of a left join — it keeps all rows from the right DataFrame and fills NaN where there's no match in the left.
All 5 rows from prices_df are retained. United Coogee and Metro Petroleum CBD have no station details, so state and owner are NaN.
6. Outer Join
An outer join (also called a full outer join) keeps all rows from both DataFrames. Where there's no match, the missing side gets NaN.
The result has 7 rows: 3 matched + 2 left-only + 2 right-only. This gives you the complete picture of both datasets combined.
7. The indicator Parameter
Adding indicator=True to any merge appends a _merge column that tells you where each row came from: left_only, right_only, or both. This is incredibly useful for diagnosing data quality and building anti-joins.
The _merge column makes it easy to filter for specific subsets — which is exactly what the next three sections do.
8. Left-Only Anti-Join
A left-only anti-join finds rows in the left DataFrame that have no match in the right. In SQL terms, this is a LEFT JOIN ... WHERE right.key IS NULL. In pandas, combine a left join with indicator=True and filter for left_only.
Ampol Parramatta and 7-Eleven St Kilda appear in our station list but have no price data — they're "orphans" in the left table.
9. Right-Only Anti-Join
The mirror image — find rows in the right DataFrame that have no match in the left.
United Coogee and Metro Petroleum CBD have price data but no station details — they're "orphans" in the right table.
10. Exclusive Outer Join
The exclusive outer join (sometimes called a full anti-join) combines both anti-joins — it returns all rows that exist in only one DataFrame.
This is a powerful data quality check. In one query you can see every record that doesn't have a counterpart in the other table.
Join Type Visual Summary
Here's a quick reference for the join types covered in this article:
- Inner join (
how='inner') — only matching rows from both sides - Left join (
how='left') — all rows from left + matching from right - Right join (
how='right') — all rows from right + matching from left - Outer join (
how='outer') — all rows from both sides - Left anti-join — left join +
indicator=True+ filterleft_only - Right anti-join — right join +
indicator=True+ filterright_only - Exclusive outer — outer join +
indicator=True+ filter!= both
Summary
You now know how to use every type of merge in pandas:
- Simple merge —
pd.merge(left, right)auto-detects keys and performs an inner join - Inner join —
how='inner'keeps only matching rows - Left join —
how='left'keeps all left rows, NaN for missing right data - Right join —
how='right'keeps all right rows, NaN for missing left data - Outer join —
how='outer'keeps everything from both sides - Anti-joins — combine any join with
indicator=Trueto find non-matching rows
Try editing the code blocks above — add new stations, change prices, or combine all three concepts to build more complex queries.
References
- Original article: How to merge data with Pandas? — Medium
- pandas documentation: pandas.merge
- pandas documentation: Merge, join, concatenate