Data Science Tutorial

How to Merge Data with Pandas

Learn how to combine DataFrames using pd.merge — inner, left, right, outer joins and anti-joins with interactive Python examples you can edit and run in your browser.

By Suhith Illesinghe · 8 Apr 2026 · 10 min read
Ad Advertisement — 728 x 90

Merging is one of the most common operations in data analysis. Whether you're combining customer records with orders, joining station details with fuel prices, or linking any two related datasets — pd.merge() is the tool you'll reach for.

In this article we'll walk through every type of merge in pandas using two small DataFrames. Each code block is interactive — edit the code and click Run (or press Ctrl+Enter) to see the result.

1. Create the DataFrames

We'll work with two related DataFrames about Australian petrol stations. The first, stations_df, holds station details. The second, prices_df, holds recent fuel prices. They share a common key — station_name — but the lists don't perfectly overlap, which is what makes merge interesting.

Python — editable
Figure 1: Two DataFrames with partially overlapping station names.

Notice that Caltex Bondi, BP Southbank, and Shell Fortitude Valley appear in both DataFrames. Ampol Parramatta and 7-Eleven St Kilda exist only in stations_df, while United Coogee and Metro Petroleum CBD exist only in prices_df. This mismatch is exactly what different join types are designed to handle.

2. Simple Merge

The simplest form of pd.merge() takes two DataFrames and automatically detects common column names. By default it performs an inner join — keeping only rows that match in both DataFrames.

Python — editable
Figure 2: Simple merge returns only the 3 stations that appear in both DataFrames.

Pandas found station_name in both DataFrames and used it as the join key. The result has 3 rows — the stations that appear in both tables. The 4 stations unique to one table were dropped.

Ad Advertisement — 728 x 90

3. Inner Join

An inner join is the default. You can make it explicit with how='inner'. If the key columns have different names in each DataFrame, use left_on and right_on.

Python — editable
Figure 3: Explicit inner join — identical result to the simple merge.

Let's also demonstrate left_on / right_on. We'll rename the key in prices_df to simulate columns with different names.

Python — editable
Figure 4: Inner join using left_on and right_on for differently named columns.

When you use left_on/right_on, both key columns appear in the result. You can drop the duplicate with .drop(columns='station').

4. Left Join

A left join keeps all rows from the left DataFrame and fills NaN where there's no match in the right. This is useful when you want to enrich a master table without losing any records.

Python — editable
Figure 5: Left join keeps all 5 stations. Ampol Parramatta and 7-Eleven St Kilda have NaN prices.

All 5 rows from stations_df are present. The two stations with no matching price data (Ampol Parramatta and 7-Eleven St Kilda) have NaN in the fuel_type and price_per_litre columns.

Ad Advertisement — 728 x 90

5. Right Join

A right join is the mirror of a left join — it keeps all rows from the right DataFrame and fills NaN where there's no match in the left.

Python — editable
Figure 6: Right join keeps all 5 price records. United Coogee and Metro Petroleum CBD have NaN for state and owner.

All 5 rows from prices_df are retained. United Coogee and Metro Petroleum CBD have no station details, so state and owner are NaN.

6. Outer Join

An outer join (also called a full outer join) keeps all rows from both DataFrames. Where there's no match, the missing side gets NaN.

Python — editable
Figure 7: Outer join returns 7 rows — all stations from both DataFrames with NaN where data is missing.

The result has 7 rows: 3 matched + 2 left-only + 2 right-only. This gives you the complete picture of both datasets combined.

7. The indicator Parameter

Adding indicator=True to any merge appends a _merge column that tells you where each row came from: left_only, right_only, or both. This is incredibly useful for diagnosing data quality and building anti-joins.

Python — editable
Figure 8: The _merge column shows the source of each row.

The _merge column makes it easy to filter for specific subsets — which is exactly what the next three sections do.

Ad Advertisement — 728 x 90

8. Left-Only Anti-Join

A left-only anti-join finds rows in the left DataFrame that have no match in the right. In SQL terms, this is a LEFT JOIN ... WHERE right.key IS NULL. In pandas, combine a left join with indicator=True and filter for left_only.

Python — editable
Figure 9: Stations that have no matching price record.

Ampol Parramatta and 7-Eleven St Kilda appear in our station list but have no price data — they're "orphans" in the left table.

9. Right-Only Anti-Join

The mirror image — find rows in the right DataFrame that have no match in the left.

Python — editable
Figure 10: Price records for stations not in the station details table.

United Coogee and Metro Petroleum CBD have price data but no station details — they're "orphans" in the right table.

10. Exclusive Outer Join

The exclusive outer join (sometimes called a full anti-join) combines both anti-joins — it returns all rows that exist in only one DataFrame.

Python — editable
Figure 11: All rows that exist in only one DataFrame — the complete set of mismatches.

This is a powerful data quality check. In one query you can see every record that doesn't have a counterpart in the other table.

Ad Advertisement — 728 x 90

Join Type Visual Summary

Here's a quick reference for the join types covered in this article:

Python — editable
Figure 12: Row counts for each join type — a quick sanity check.

Summary

You now know how to use every type of merge in pandas:

  1. Simple mergepd.merge(left, right) auto-detects keys and performs an inner join
  2. Inner joinhow='inner' keeps only matching rows
  3. Left joinhow='left' keeps all left rows, NaN for missing right data
  4. Right joinhow='right' keeps all right rows, NaN for missing left data
  5. Outer joinhow='outer' keeps everything from both sides
  6. Anti-joins — combine any join with indicator=True to find non-matching rows

Try editing the code blocks above — add new stations, change prices, or combine all three concepts to build more complex queries.

Ad Advertisement — 728 x 90

References

Suhith Illesinghe
Curiosity is the first step to make a difference. I hope to inspire others to explore, build and champion collaborative growth.
Follow on Medium ↗