Data Science

How to sort data with Pandas?

Learn how to sort data with pandas using sort_values(). Covers ascending/descending, NaN handling, mixed case, dates, multi-column sort, sort_index, nlargest/nsmallest, and custom Categorical ordering.
Suhith Illesinghe · Apr 7, 2026 · 12 min read
Ad Advertisement — 728 x 90

Sorting data is one of the most fundamental operations in data analysis. Whether you are ranking stations by revenue, ordering transactions by date, or finding the top-selling fuel type, pandas sort_values() is the workhorse. But sorting comes with gotchas — NaN placement, mixed-case strings, dates stored as text, and multi-column sort behaviour that can trip you up.

In this article you will learn:

  1. How to do a simple generic sort
  2. How sorting works with missing information
  3. How to sort string columns with mixed uppercase and lowercase
  4. How to sort date columns correctly
  5. How to sort using multiple columns
  6. How to use sort_index() and nlargest() / nsmallest()
  7. How to define a custom sort order with pd.Categorical

The dataset

We will use an Australian petrol station dataset. Each row represents a fuel transaction with the station name, state, fuel_type, litres sold, price_per_litre, and a transaction_date. Some values are intentionally missing and some strings have mixed case to demonstrate sorting edge cases.

Python — editable
Figure 1: Fuel transactions — 15 rows, 6 columns. Note the mixed case in fuel_type and string dates.

1. How to do a simple generic sort?

Pandas provides the sort_values() method. Let's sort the DataFrame by station name in ascending order (A to Z), which is the default.

Python — editable
Figure 2: DataFrame sorted by station (ascending). Note the original index is preserved.

The data is sorted alphabetically by station. Notice that the original row index has been carried along. To sort in descending order (Z to A), pass ascending=False.

Python — editable

If the old index is distracting, use ignore_index=True to reset it to 0, 1, 2...

Python — editable
Figure 3: Descending sort with a clean reset index.
Ad Advertisement — 300 x 250

2. How does sorting work with missing information?

Our litres column has some NaN values. Let's sort by it and see where the missing values end up.

Python — editable
Figure 4: NaN values are placed at the end by default.

By default, NaN values are pushed to the bottom regardless of sort direction. To move them to the top, use na_position='first'.

Python — editable
Figure 5: NaN values moved to the beginning with na_position='first'.

This is useful when you want to quickly identify rows with missing data at the top of a report.

3. How to sort mixed uppercase and lowercase strings?

Look closely at the fuel_type column — some entries start with uppercase (Diesel, Premium) and others with lowercase (diesel, premium). Let's sort and see what happens.

Python — editable
Figure 6: Uppercase entries sort before lowercase — 'D' comes before 'd'.

All uppercase entries appear first because sort_values uses ASCII ordering where capital letters (A-Z) have lower values than lowercase (a-z). In real-world data this is almost never what you want. The fix is to use the key parameter to normalise case during sorting.

Ad Advertisement — 300 x 250
Python — editable
Figure 7: True alphabetical sort using key=str.lower().

Now all diesel entries are grouped together regardless of case, followed by premium and unleaded. The key parameter applies a function to the column before sorting — the original values in the DataFrame are unchanged.

4. How to sort date columns correctly?

Our transaction_date column is stored as strings. Let's try sorting it directly.

Python — editable
Figure 8: String dates sort lexicographically — '2026-1-22' and '2026-2-3' are out of order.

Notice something odd? 2026-1-22 sorts before 2026-01-15 because as a string, the character '1' comes before '0' in '01'. String sorting compares character by character, not date values. The fix is to convert to datetime first.

Python — editable
Figure 9: Dates sorted chronologically after converting to datetime.

Now the dates are in correct chronological order. This is one of the most common mistakes in data analysis — always check the dtype of date columns before sorting.

Ad Advertisement — 300 x 250

5. How to sort using multiple columns?

You can sort by multiple columns by passing a list. Let's sort by station (ascending) and litres (descending) to see the highest volume transactions for each station first.

Python — editable
Figure 10: Sorted by station (A-Z), then by litres (highest first) within each station.

Each station's transactions now appear with the largest volume at the top. The ascending parameter takes a list matching the column list — True for station (A-Z), False for litres (highest first).

A useful pattern is to combine multi-column sorting with groupby().first() to extract the top row per group.

Python — editable
Figure 11: Highest-volume transaction per station using sort + groupby().first().

6. sort_index() and nlargest() / nsmallest()

While sort_values() sorts by column values, sort_index() sorts by the row index. This is useful after a groupby or when you've shuffled the DataFrame and want to restore the original order.

Python — editable
Figure 12: sort_index() restores the original row order.

For the common pattern of "sort and take the top N", pandas provides nlargest() and nsmallest() — which are faster than sort_values().head() because they use a partial sort internally.

Python — editable
Figure 13: nlargest and nsmallest — fast shortcuts for top/bottom N rows.
Ad Advertisement — 300 x 250

7. Custom sort order with pd.Categorical

Sometimes alphabetical order is not what you want. For example, the business might want fuel types ordered by price tier: Premium first, then Unleaded, then Diesel. You can define this custom order using pd.Categorical.

Python — editable
Figure 14: Custom business order — Premium first, then Unleaded, then Diesel.

The ordered=True flag tells pandas that this is not just a set of categories but a sequence with a meaningful order. Now sort_values respects your custom ranking instead of the alphabet. This is powerful for reporting where the sort order has business meaning — priority levels, severity rankings, product tiers, etc.

Python — editable
Figure 15: Station (A-Z) then fuel type in custom business order.

Summary

You now have seven sorting techniques in your toolkit:

  1. Basic sortsort_values('column') with ascending and ignore_index
  2. NaN handlingna_position='first' to surface missing data
  3. Mixed casekey=lambda col: col.str.lower() for true alphabetical sorting
  4. Date columns — convert to datetime with pd.to_datetime() before sorting
  5. Multi-column — pass lists to sort by multiple columns with independent directions
  6. sort_index + nlargest/nsmallest — restore index order and fast top-N shortcuts
  7. Custom orderpd.Categorical with ordered=True for business-defined sort sequences

Try editing the code blocks above to experiment — sort by price_per_litre, find the 3 cheapest transactions per station, or define your own custom fuel type order.

Ad Advertisement — 728 x 90

References

Suhith Illesinghe
Curiosity is the first step to make a difference. I hope to inspire others to explore, build and champion collaborative growth.
Follow on Medium ↗