How do you sort a DataFrame in pandas?

Use df.sort_values('column_name') to sort by a column in ascending order. Pass ascending=False for descending. Use inplace=True to modify the DataFrame directly, or assign the result to a new variable.

Where do NaN values go when sorting in pandas?

By default, NaN values are placed at the end of the sorted result regardless of ascending or descending order. Use na_position='first' in sort_values() to move them to the beginning.

What is the difference between sort_values and sort_index in pandas?

sort_values() sorts by column values while sort_index() sorts by the row index labels. Use sort_values when you want to order data by a specific column, and sort_index when you want to restore or reorder the index.

How do you get the top N rows by value in pandas?

Use df.nlargest(n, 'column') for the top N rows or df.nsmallest(n, 'column') for the bottom N. These are usually faster than sort_values().head(n) when N is small relative to the number of rows, because they use a partial sort algorithm internally.

How do you sort by a custom order in pandas?

Convert the column to pd.Categorical with a custom categories list and ordered=True. For example: df['col'] = pd.Categorical(df['col'], categories=['Premium','Unleaded','Diesel'], ordered=True). Then sort_values will follow your custom order.

How to Sort Data with Pandas — sort_values, sort

Sorting data is one of the most fundamental operations in data analysis. Whether you are ranking stations by revenue, ordering transactions by date, or finding the top-selling fuel type, pandas sort_values() is the workhorse. But sorting comes with gotchas — NaN placement, mixed-case strings, dates stored as text, and multi-column sort behaviour that can trip you up.

In this article you will learn:

How to do a simple generic sort
How sorting works with missing information
How to sort string columns with mixed uppercase and lowercase
How to sort date columns correctly
How to sort using multiple columns
How to use sort_index() and nlargest() / nsmallest()
How to define a custom sort order with pd.Categorical

The dataset

We will use an Australian petrol station dataset. Each row represents a fuel transaction with the station name, state, fuel_type, litres sold, price_per_litre, and a transaction_date. Some values are intentionally missing and some strings have mixed case to demonstrate sorting edge cases.

Python — editable

import pandas as pd
import numpy as np

data = {
    "station": [
        'Caltex Bondi','Caltex Bondi','Caltex Bondi',
        'BP Southbank','BP Southbank','BP Southbank','BP Southbank',
        'Shell Fortitude Valley','Shell Fortitude Valley','Shell Fortitude Valley',
        'Caltex Bondi','Caltex Bondi',
        'BP Southbank','BP Southbank','BP Southbank'
    ],
    "state": [
        'NSW','NSW','NSW',
        'VIC','VIC','VIC','VIC',
        'QLD','QLD',np.nan,
        'NSW','NSW',
        'VIC','VIC','VIC'
    ],
    "fuel_type": [
        'Unleaded','diesel','Premium',
        'Unleaded','Unleaded','diesel','premium',
        'Diesel','unleaded',np.nan,
        'Diesel','Unleaded',
        'diesel','Premium','unleaded'
    ],
    "litres": [
        45.2, 60.0, 38.5,
        52.1, 47.8, np.nan, 41.0,
        55.3, 44.9, np.nan,
        58.7, 40.1,
        63.2, 35.6, 49.0
    ],
    "price_per_litre": [
        1.89, 1.95, 2.12,
        1.85, 1.85, 1.92, 2.09,
        1.93, np.nan, 2.15,
        1.95, 1.89,
        1.92, 2.09, 1.85
    ],
    "transaction_date": [
        '2026-01-15','2026-01-15','2026-01-20',
        '2026-02-03','2026-02-03','2026-2-3','2026-02-10',
        '2026-03-01','2026-03-01','2026-03-01',
        '2026-01-22','2026-1-22',
        '2026-02-14','2026-02-14','2026-02-14'
    ]
}

fuel_df = pd.DataFrame(data)
fuel_df

Figure 1: Fuel transactions — 15 rows, 6 columns. Note the mixed case in fuel_type and string dates.

1. How to do a simple generic sort?

Pandas provides the sort_values() method. Let's sort the DataFrame by station name in ascending order (A to Z), which is the default.

Python — editable

fuel_df.sort_values('station')

Figure 2: DataFrame sorted by station (ascending). Note the original index is preserved.

The data is sorted alphabetically by station. Notice that the original row index has been carried along. To sort in descending order (Z to A), pass ascending=False.

Python — editable

fuel_df.sort_values('station', ascending=False)

If the old index is distracting, use ignore_index=True to reset it to 0, 1, 2...

Python — editable

fuel_df.sort_values('station', ascending=False, ignore_index=True)

Figure 3: Descending sort with a clean reset index.
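
One detail worth seeing in isolation: sort_values() returns a new DataFrame and leaves the original untouched unless you assign the result back (or pass inplace=True). The sketch below uses a small hypothetical three-row frame, not the article's dataset, to show this.

```python
import pandas as pd

# Hypothetical three-row frame, used only for this illustration
df = pd.DataFrame({
    "station": ["Shell", "BP", "Caltex"],
    "litres": [55.3, 52.1, 45.2],
})

# sort_values returns a new DataFrame; df keeps its original row order
sorted_df = df.sort_values("station")
print(df["station"].tolist())         # original order
print(sorted_df["station"].tolist())  # alphabetical order, old index labels kept

# To keep the sorted result, assign it back (here with a fresh 0..n-1 index)
df = df.sort_values("station", ignore_index=True)
print(df)
```

Assigning the result back tends to be easier to follow in chained code than inplace=True, though both are available.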

2. How does sorting work with missing information?

Our litres column has some NaN values. Let's sort by it and see where the missing values end up.

Python — editable

fuel_df.sort_values('litres')

Figure 4: NaN values are placed at the end by default.

By default, NaN values are pushed to the bottom regardless of sort direction. To move them to the top, use na_position='first'.

Python — editable

fuel_df.sort_values('litres', na_position='first')

Figure 5: NaN values moved to the beginning with na_position='first'.

This is useful when you want to quickly identify rows with missing data at the top of a report.
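
To check the "regardless of sort direction" claim, here is a minimal sketch on a hypothetical four-row frame. It sorts ascending, descending, and descending with na_position='first', and prints the resulting index order each time.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing litres value (index 1)
df = pd.DataFrame({
    "station": ["A", "B", "C", "D"],
    "litres": [45.2, np.nan, 60.0, 38.5],
})

asc = df.sort_values("litres")                    # NaN row ends up last
desc = df.sort_values("litres", ascending=False)  # NaN row is still last
desc_first = df.sort_values("litres", ascending=False, na_position="first")

print(asc.index.tolist())
print(desc.index.tolist())
print(desc_first.index.tolist())
```

The NaN row only moves when na_position is changed; flipping ascending does not affect it.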

3. How to sort mixed uppercase and lowercase strings?

Look closely at the fuel_type column — some entries start with uppercase (Diesel, Premium) and others with lowercase (diesel, premium). Let's sort and see what happens.

Python — editable

fuel_df.sort_values('fuel_type', ignore_index=True)

Figure 6: Uppercase entries sort before lowercase — 'D' comes before 'd'.

All uppercase entries appear first because sort_values compares strings by character code, and capital letters (A-Z) have lower codes than lowercase letters (a-z). In real-world data this is almost never what you want. The fix is to use the key parameter to normalise case during sorting.

Python — editable

fuel_df.sort_values('fuel_type', key=lambda col: col.str.lower(), ignore_index=True)

Figure 7: True alphabetical sort using key=lambda col: col.str.lower().

Now all diesel entries are grouped together regardless of case, followed by premium and unleaded. The key parameter applies a function to the column before sorting — the original values in the DataFrame are unchanged.
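
A point that is easy to miss: the key function receives the whole column as a Series and must return a Series of the same length, which is why the lambda calls col.str.lower() rather than Python's str.lower. The sketch below, on a hypothetical five-row frame, compares the default order with the case-folded order and confirms the stored values keep their original case.

```python
import pandas as pd

# Hypothetical column with mixed-case fuel types
df = pd.DataFrame({"fuel_type": ["Unleaded", "diesel", "Premium", "Diesel", "premium"]})

# Default sort: every capitalised value comes before every lowercase one
default_order = df.sort_values("fuel_type")["fuel_type"].tolist()

# key is called with the column (a Series) and must return a Series
folded = df.sort_values("fuel_type", key=lambda col: col.str.lower())
folded_order = folded["fuel_type"].tolist()

print(default_order)
print(folded_order)  # grouped by fuel type, original spelling preserved
```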

4. How to sort date columns correctly?

Our transaction_date column is stored as strings. Let's try sorting it directly.

Python — editable

fuel_df.sort_values('transaction_date', ignore_index=True)

Figure 8: String dates sort lexicographically — '2026-1-22' and '2026-2-3' are out of order.

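Notice something odd? 2026-1-22 and 2026-2-3 land at the very bottom, even after 2026-03-01. String sorting compares character by character, not date values: at the sixth character, the '1' in '2026-1-22' and the '2' in '2026-2-3' are greater than the '0' that starts every zero-padded month, so the unpadded strings sort after all of them. The fix is to convert to datetime first.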

Python — editable

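# format='mixed' (pandas 2.0+) parses each string individually,
# so '2026-2-3' and '2026-02-03' are both handled
fuel_df['transaction_date'] = pd.to_datetime(fuel_df['transaction_date'], format='mixed')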

fuel_df.sort_values('transaction_date', ignore_index=True)

Figure 9: Dates sorted chronologically after converting to datetime.

Now the dates are in correct chronological order. This is one of the most common mistakes in data analysis — always check the dtype of date columns before sorting.
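
The difference between text order and date order can be reproduced on a handful of values. This is a minimal sketch with four hypothetical date strings; it assumes pandas 2.0 or later, since format='mixed' is not available in earlier versions.

```python
import pandas as pd

# Hypothetical date strings, two of them without zero padding
dates = pd.Series(["2026-03-01", "2026-1-22", "2026-01-15", "2026-2-3"])

# Sorted as text: compared character by character
as_text = dates.sort_values().tolist()

# Sorted as dates: parse first, then sort
parsed = pd.to_datetime(dates, format="mixed")
as_dates = parsed.sort_values().dt.strftime("%Y-%m-%d").tolist()

print(dates.dtype, parsed.dtype)  # check dtypes before sorting
print(as_text)
print(as_dates)
```

Printing the dtype first is a cheap way to catch date columns that were loaded as text.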

5. How to sort using multiple columns?

You can sort by multiple columns by passing a list. Let's sort by station (ascending) and litres (descending) to see the highest volume transactions for each station first.

Python — editable

fuel_df.sort_values(['station', 'litres'],
                    ascending=[True, False], ignore_index=True)

Figure 10: Sorted by station (A-Z), then by litres (highest first) within each station.

Each station's transactions now appear with the largest volume at the top. The ascending parameter takes a list matching the column list — True for station (A-Z), False for litres (highest first).

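A useful pattern is to combine multi-column sorting with groupby().first() to extract the top row per group. Be aware that first() returns the first non-null value in each column, so when a group's top row contains NaN the result can mix values from different rows; groupby().head(1) returns the actual first row instead.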

Python — editable

sorted_df = fuel_df.sort_values(['station', 'litres'], ascending=[True, False])

# Extract the highest-volume transaction per station
sorted_df.groupby('station').first().reset_index()

Figure 11: Highest-volume transaction per station using sort + groupby().first().
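
One caveat with this pattern when data is missing: groupby().first() picks the first non-null value per column, which is not always the same as the first row. The sketch below uses a hypothetical four-row frame where station A's highest-volume row has no price, and compares first() with head(1).

```python
import numpy as np
import pandas as pd

# Hypothetical data: station A's largest transaction has a missing price
df = pd.DataFrame({
    "station": ["A", "A", "B", "B"],
    "litres": [60.0, 45.0, 50.0, 30.0],
    "price_per_litre": [np.nan, 1.89, 1.95, 1.85],
})
sorted_df = df.sort_values(["station", "litres"], ascending=[True, False])

# first(): first non-null value per column, so A gets 60.0 litres
# paired with the 1.89 price from a different row
via_first = sorted_df.groupby("station").first().reset_index()

# head(1): the actual first row of each group, NaN included
via_head = sorted_df.groupby("station").head(1).reset_index(drop=True)

print(via_first)
print(via_head)
```

On the article's dataset the two approaches happen to agree, because each station's top row has no missing values.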

6. sort_index() and nlargest() / nsmallest()

While sort_values() sorts by column values, sort_index() sorts by the row index. This is useful after a groupby or when you've shuffled the DataFrame and want to restore the original order.

Python — editable

# After sorting by litres, the index is shuffled
shuffled = fuel_df.sort_values('litres')

# Restore original row order with sort_index
shuffled.sort_index()

Figure 12: sort_index() restores the original row order.
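
sort_index() is not limited to the default integer index. As a small sketch, assuming a hypothetical frame indexed by station name (the kind of index you get after set_index or a groupby), it orders the rows by label in either direction.

```python
import pandas as pd

# Hypothetical frame whose index holds station names
df = pd.DataFrame(
    {"litres": [55.3, 52.1, 45.2]},
    index=["Shell Fortitude Valley", "BP Southbank", "Caltex Bondi"],
)

by_label = df.sort_index()                      # A to Z by index label
by_label_desc = df.sort_index(ascending=False)  # Z to A by index label

print(by_label)
print(by_label_desc)
```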

For the common pattern of "sort and take the top N", pandas provides nlargest() and nsmallest(), which are usually faster than sort_values().head() when N is small relative to the number of rows because they use a partial sort internally.

Python — editable

# Top 5 transactions by litres
print("=== Top 5 by litres ===")
print(fuel_df.nlargest(5, 'litres').to_string())
print("\n=== Bottom 3 by price_per_litre ===")
print(fuel_df.nsmallest(3, 'price_per_litre').to_string())

Figure 13: nlargest and nsmallest — fast shortcuts for top/bottom N rows.
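
Two things are worth checking here: that nlargest() selects the same rows as sort-then-head, and what happens with ties. The sketch below uses a hypothetical five-row frame where three stations share the lowest price, and shows the keep='all' option, which returns every row tied for the last place even if that exceeds n.

```python
import pandas as pd

# Hypothetical prices: stations B, D and E are tied at 1.85
df = pd.DataFrame({
    "station": ["A", "B", "C", "D", "E"],
    "price_per_litre": [1.89, 1.85, 2.12, 1.85, 1.85],
})

# Same rows as sorting descending and taking the head
top2 = df.nlargest(2, "price_per_litre")
same = df.sort_values("price_per_litre", ascending=False).head(2)

# Default keep='first' cuts ties at n rows; keep='all' keeps every tied row
cheapest2 = df.nsmallest(2, "price_per_litre")
cheapest_all = df.nsmallest(2, "price_per_litre", keep="all")

print(top2)
print(cheapest2)
print(cheapest_all)
```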

7. Custom sort order with pd.Categorical

Sometimes alphabetical order is not what you want. For example, the business might want fuel types ordered by price tier: Premium first, then Unleaded, then Diesel. You can define this custom order using pd.Categorical.

Python — editable

# First normalise the case
fuel_df['fuel_type_clean'] = fuel_df['fuel_type'].str.capitalize()

# Define custom order
custom_order = ['Premium', 'Unleaded', 'Diesel']
fuel_df['fuel_type_clean'] = pd.Categorical(
    fuel_df['fuel_type_clean'],
    categories=custom_order,
    ordered=True
)

fuel_df.sort_values('fuel_type_clean', ignore_index=True)

Figure 14: Custom business order — Premium first, then Unleaded, then Diesel.

The ordered=True flag tells pandas that this is not just a set of categories but a sequence with a meaningful order. Now sort_values respects your custom ranking instead of the alphabet. This is powerful for reporting where the sort order has business meaning — priority levels, severity rankings, product tiers, etc.

Python — editable

# Combine custom fuel order with station sort
columns = ['station', 'fuel_type_clean', 'litres', 'price_per_litre']
fuel_df.sort_values(['station', 'fuel_type_clean'],
                    ascending=[True, True], ignore_index=True)[columns]

Figure 15: Station (A-Z) then fuel type in custom business order.
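
The case normalisation step matters more than it might look: any value that is not in the categories list becomes NaN when the column is converted. A minimal sketch on a hypothetical four-value Series shows the difference between converting the raw strings and converting the capitalised ones.

```python
import pandas as pd

# Hypothetical raw values; 'diesel' does not match the category 'Diesel'
raw = pd.Series(["Unleaded", "diesel", "Premium", "Diesel"])
custom_order = ["Premium", "Unleaded", "Diesel"]

# Without normalising, the unmatched value silently becomes NaN
unnormalised = pd.Categorical(raw, categories=custom_order, ordered=True)

# After capitalising, every value matches a category
normalised = pd.Categorical(raw.str.capitalize(), categories=custom_order, ordered=True)
sorted_vals = pd.Series(normalised).sort_values().tolist()

print(pd.isna(unnormalised).sum())  # count of values lost to NaN
print(sorted_vals)                  # follows custom_order, not the alphabet
```

Counting NaN values before and after the conversion is a quick check that no category was misspelled or left out.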

Summary

You now have seven sorting techniques in your toolkit:

Basic sort — sort_values('column') with ascending and ignore_index
NaN handling — na_position='first' to surface missing data
Mixed case — key=lambda col: col.str.lower() for true alphabetical sorting
Date columns — convert to datetime with pd.to_datetime() before sorting
Multi-column — pass lists to sort by multiple columns with independent directions
sort_index + nlargest/nsmallest — restore index order and fast top-N shortcuts
Custom order — pd.Categorical with ordered=True for business-defined sort sequences

Try editing the code blocks above to experiment — sort by price_per_litre, find the 3 cheapest transactions per station, or define your own custom fuel type order.


References

Original article: How to sort data with Pandas? — Medium
pandas documentation: pandas.DataFrame.sort_values
pandas documentation: pandas.DataFrame.sort_index
pandas documentation: pandas.DataFrame.nlargest
pandas documentation: pandas.CategoricalDtype

Suhith Illesinghe

Curiosity is the first step to make a difference. I hope to inspire others to explore, build and champion collaborative growth.
