Sorting data is one of the most fundamental operations in data analysis. Whether you are ranking stations by revenue, ordering transactions by date, or finding the top-selling fuel type, pandas sort_values() is the workhorse. But sorting comes with gotchas — NaN placement, mixed-case strings, dates stored as text, and multi-column sort behaviour that can trip you up.
In this article you will learn:
- How to do a simple generic sort
- How sorting works with missing information
- How to sort string columns with mixed uppercase and lowercase
- How to sort date columns correctly
- How to sort using multiple columns
- How to use
sort_index()andnlargest()/nsmallest() - How to define a custom sort order with
pd.Categorical
The dataset
We will use an Australian petrol station dataset. Each row represents a fuel transaction with the station name, state, fuel_type, litres sold, price_per_litre, and a transaction_date. Some values are intentionally missing and some strings have mixed case to demonstrate sorting edge cases.
1. How to do a simple generic sort?
Pandas provides the sort_values() method. Let's sort the DataFrame by station name in ascending order (A to Z), which is the default.
The data is sorted alphabetically by station. Notice that the original row index has been carried along. To sort in descending order (Z to A), pass ascending=False.
If the old index is distracting, use ignore_index=True to reset it to 0, 1, 2...
2. How does sorting work with missing information?
Our litres column has some NaN values. Let's sort by it and see where the missing values end up.
By default, NaN values are pushed to the bottom regardless of sort direction. To move them to the top, use na_position='first'.
This is useful when you want to quickly identify rows with missing data at the top of a report.
3. How to sort mixed uppercase and lowercase strings?
Look closely at the fuel_type column — some entries start with uppercase (Diesel, Premium) and others with lowercase (diesel, premium). Let's sort and see what happens.
All uppercase entries appear first because sort_values uses ASCII ordering where capital letters (A-Z) have lower values than lowercase (a-z). In real-world data this is almost never what you want. The fix is to use the key parameter to normalise case during sorting.
Now all diesel entries are grouped together regardless of case, followed by premium and unleaded. The key parameter applies a function to the column before sorting — the original values in the DataFrame are unchanged.
4. How to sort date columns correctly?
Our transaction_date column is stored as strings. Let's try sorting it directly.
Notice something odd? 2026-1-22 sorts before 2026-01-15 because as a string, the character '1' comes before '0' in '01'. String sorting compares character by character, not date values. The fix is to convert to datetime first.
Now the dates are in correct chronological order. This is one of the most common mistakes in data analysis — always check the dtype of date columns before sorting.
5. How to sort using multiple columns?
You can sort by multiple columns by passing a list. Let's sort by station (ascending) and litres (descending) to see the highest volume transactions for each station first.
Each station's transactions now appear with the largest volume at the top. The ascending parameter takes a list matching the column list — True for station (A-Z), False for litres (highest first).
A useful pattern is to combine multi-column sorting with groupby().first() to extract the top row per group.
6. sort_index() and nlargest() / nsmallest()
While sort_values() sorts by column values, sort_index() sorts by the row index. This is useful after a groupby or when you've shuffled the DataFrame and want to restore the original order.
For the common pattern of "sort and take the top N", pandas provides nlargest() and nsmallest() — which are faster than sort_values().head() because they use a partial sort internally.
7. Custom sort order with pd.Categorical
Sometimes alphabetical order is not what you want. For example, the business might want fuel types ordered by price tier: Premium first, then Unleaded, then Diesel. You can define this custom order using pd.Categorical.
The ordered=True flag tells pandas that this is not just a set of categories but a sequence with a meaningful order. Now sort_values respects your custom ranking instead of the alphabet. This is powerful for reporting where the sort order has business meaning — priority levels, severity rankings, product tiers, etc.
Summary
You now have seven sorting techniques in your toolkit:
- Basic sort —
sort_values('column')withascendingandignore_index - NaN handling —
na_position='first'to surface missing data - Mixed case —
key=lambda col: col.str.lower()for true alphabetical sorting - Date columns — convert to
datetimewithpd.to_datetime()before sorting - Multi-column — pass lists to sort by multiple columns with independent directions
- sort_index + nlargest/nsmallest — restore index order and fast top-N shortcuts
- Custom order —
pd.Categoricalwithordered=Truefor business-defined sort sequences
Try editing the code blocks above to experiment — sort by price_per_litre, find the 3 cheapest transactions per station, or define your own custom fuel type order.
References
- Original article: How to sort data with Pandas? — Medium
- pandas documentation: pandas.DataFrame.sort_values
- pandas documentation: pandas.DataFrame.sort_index
- pandas documentation: pandas.DataFrame.nlargest
- pandas documentation: pandas.CategoricalDtype