Grouping is one of the most common operations in data analysis. Whether you are counting transactions, averaging prices, or summarising volumes, pandas groupby is almost always the starting point. But the output of a groupby call often comes back as an unnamed Series with the grouping column buried in the index — not the tidy DataFrame you need for downstream merges, charts, or exports.
This article shows you how to group data in pandas and, crucially, how to name and reshape the result into a clean DataFrame. We cover three approaches, each demonstrated on an interactive dataset you can edit and run directly in your browser:
- to_frame(): convert the Series to a DataFrame and name the column explicitly
- as_index=False: prevent the group column from becoming the index in the first place
- reset_index(name=): the most concise one-liner that does both in a single call
The dataset
We will use a small petrol station dataset. Each row represents a fuel transaction recorded at stations across Australia. The columns capture the station name, fuel type, litres sold, price per litre, and the state where the station is located. Some values are intentionally missing to mirror real-world data quality issues.
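As a rough stand-in for the dataset described above, the sketch below builds a 15-row DataFrame with the same five columns. Every value, the column names (station, fuel_type, litres, price_per_litre, state), and the split of rows between stations are invented for illustration; they are assumptions, not the article's original data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: all values and column names are invented
# for illustration and are not the article's original dataset.
df = pd.DataFrame(
    {
        "station": ["Caltex Bondi"] * 6
        + ["BP Southbank"] * 5
        + ["Shell Fortitude Valley"] * 4,
        "fuel_type": [
            "Unleaded 91", "Diesel", "Premium 95", "Unleaded 91", "E10",
            "Diesel", "Unleaded 91", "Premium 98", "Diesel", "E10",
            "Unleaded 91", "Diesel", "Unleaded 91", "Premium 95", "E10",
        ],
        # np.nan marks the intentionally missing readings
        "litres": [
            42.5, 60.0, np.nan, 35.2, 50.1,
            48.0, 38.7, 55.3, np.nan, 41.0,
            62.4, 45.9, 30.5, 52.8, 47.2,
        ],
        "price_per_litre": [
            1.89, 2.05, 2.01, np.nan, 1.85,
            2.07, 1.91, 2.15, 2.04, 1.84,
            1.90, np.nan, 1.88, 2.02, 1.86,
        ],
        "state": ["NSW"] * 6 + ["VIC"] * 5 + ["QLD"] * 4,
    }
)

print(df.shape)         # rows, columns
print(df.isna().sum())  # missing values per column
```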
The dataset has 15 transactions spread across three Australian petrol stations: Caltex Bondi (NSW), BP Southbank (VIC) and Shell Fortitude Valley (QLD). A natural question is: how many transactions were recorded at each station? Let's find out using groupby.
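A minimal sketch of that first groupby call follows. To stay self-contained it uses a reduced stand-in frame with only a station column, and the number of rows per station is an invented assumption.

```python
import pandas as pd

# Hypothetical stand-in: only the station column, with invented row counts.
stations = (
    ["Caltex Bondi"] * 6
    + ["BP Southbank"] * 5
    + ["Shell Fortitude Valley"] * 4
)
df = pd.DataFrame({"station": stations})

# size() counts the rows in each group
counts = df.groupby("station").size()

print(type(counts).__name__)
print(counts)
```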
Notice two things about the result. First, it is a Series, not a DataFrame — there is no column header for the counts. Second, the station column has become the index rather than a regular column. Both of these quirks make the output harder to work with if you need to merge, export, or chart the data. Let's fix that.
Method 1: to_frame()
The to_frame() method converts a Series into a DataFrame and accepts a string argument that becomes the column name.
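One way this might look is sketched below, again on a reduced stand-in frame with invented row counts. The column name "transactions" is an assumption chosen for illustration.

```python
import pandas as pd

# Hypothetical stand-in: only the station column, with invented row counts.
stations = (
    ["Caltex Bondi"] * 6
    + ["BP Southbank"] * 5
    + ["Shell Fortitude Valley"] * 4
)
df = pd.DataFrame({"station": stations})

# Convert the unnamed Series into a one-column DataFrame.
# "transactions" is an illustrative column name.
counts_df = df.groupby("station").size().to_frame("transactions")

print(counts_df)
```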
We now have a proper DataFrame with a meaningful column name. However, station is still sitting in the index. If you want it back as a regular column, chain reset_index().
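A sketch of the chained version, under the same assumptions (stand-in data, illustrative "transactions" column name):

```python
import pandas as pd

# Hypothetical stand-in: only the station column, with invented row counts.
stations = (
    ["Caltex Bondi"] * 6
    + ["BP Southbank"] * 5
    + ["Shell Fortitude Valley"] * 4
)
df = pd.DataFrame({"station": stations})

result = (
    df.groupby("station")
    .size()
    .to_frame("transactions")  # name the counts column
    .reset_index()             # move station from the index to a column
)

print(result)
```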
This is a clean, merge-ready result. The downside is the two-step chain — to_frame() then reset_index(). The next method achieves the same thing differently.
Method 2: as_index=False
Passing as_index=False directly inside groupby() tells pandas to keep the grouping column as a regular column instead of promoting it to the index.
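The sketch below shows the idea on the same kind of stand-in frame (invented row counts). It assumes a reasonably recent pandas, since size() with as_index=False has returned a DataFrame only since version 1.1.

```python
import pandas as pd

# Hypothetical stand-in: only the station column, with invented row counts.
stations = (
    ["Caltex Bondi"] * 6
    + ["BP Southbank"] * 5
    + ["Shell Fortitude Valley"] * 4
)
df = pd.DataFrame({"station": stations})

# as_index=False keeps station as a regular column
result = df.groupby("station", as_index=False).size()

print(result)
```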
This already returns a DataFrame with a normal integer index — no reset_index() needed. The trade-off is that the aggregated column is automatically named size, which may not be the label you want. A quick rename() fixes that.
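A possible version of that rename step is sketched here; the target name "transactions" and the stand-in data are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in: only the station column, with invented row counts.
stations = (
    ["Caltex Bondi"] * 6
    + ["BP Southbank"] * 5
    + ["Shell Fortitude Valley"] * 4
)
df = pd.DataFrame({"station": stations})

result = (
    df.groupby("station", as_index=False)
    .size()
    .rename(columns={"size": "transactions"})  # replace the default label
)

print(result)
```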
Same result, different route. Whether you prefer this over Method 1 is largely a matter of taste.
Method 3: reset_index(name=)
The name parameter on Series.reset_index() lets you name the values column and move the index back to a regular column, all in one call. It works here because size() returns a Series; DataFrame.reset_index() does not accept name.
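As a minimal sketch, with the same stand-in data and the illustrative column name "transactions":

```python
import pandas as pd

# Hypothetical stand-in: only the station column, with invented row counts.
stations = (
    ["Caltex Bondi"] * 6
    + ["BP Southbank"] * 5
    + ["Shell Fortitude Valley"] * 4
)
df = pd.DataFrame({"station": stations})

# size() returns a Series; reset_index(name=...) names its values
# and turns the station index into a regular column.
result = df.groupby("station").size().reset_index(name="transactions")

print(result)
```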
Method 3 is the most concise. A single chained call after size() handles both the naming and the index reset. When readability and brevity both matter, this is usually the best choice.
Which method should you use?
All three methods produce an identical DataFrame. The decision comes down to context:
- Method 1 (to_frame + reset_index): clearest when you want to separate the "convert to DataFrame" step from the "fix the index" step, which can be useful in longer chains.
- Method 2 (as_index=False): best when you know upfront that you never want the group column in the index. Pair it with rename() if the default name isn't suitable.
- Method 3 (reset_index(name=)): fewest characters, least cognitive overhead. Ideal for quick exploratory work and scripts.
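To check the claim that the three routes agree, the sketch below builds all three results on a stand-in frame (invented row counts, illustrative "transactions" column name) and compares them with DataFrame.equals(). It assumes pandas 1.1 or later for the as_index=False variant.

```python
import pandas as pd

# Hypothetical stand-in: only the station column, with invented row counts.
stations = (
    ["Caltex Bondi"] * 6
    + ["BP Southbank"] * 5
    + ["Shell Fortitude Valley"] * 4
)
df = pd.DataFrame({"station": stations})

# Method 1: to_frame() followed by reset_index()
m1 = df.groupby("station").size().to_frame("transactions").reset_index()

# Method 2: as_index=False followed by rename()
m2 = (
    df.groupby("station", as_index=False)
    .size()
    .rename(columns={"size": "transactions"})
)

# Method 3: reset_index(name=...)
m3 = df.groupby("station").size().reset_index(name="transactions")

print(m1.equals(m2), m2.equals(m3))
```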
Pick the one that fits your workflow, and try editing the code blocks above to experiment — change the grouping column to fuel type or state, swap size() for mean() on the litres column, or add your own stations to see how each method behaves.
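As one example of such an experiment, the sketch below swaps size() for mean() on a litres column. The data, the column name litres, and the output name avg_litres are all invented for illustration. Note that mean() skips missing values by default.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with one missing litres reading.
df = pd.DataFrame(
    {
        "station": [
            "BP Southbank", "BP Southbank", "BP Southbank",
            "Caltex Bondi", "Caltex Bondi",
        ],
        "litres": [40.0, 50.0, np.nan, 30.0, 60.0],
    }
)

# Select the litres column, average it per station, then name the result.
avg = df.groupby("station")["litres"].mean().reset_index(name="avg_litres")

print(avg)
```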
References
- Original article: How to name grouped data in Pandas? — Medium
- pandas documentation: pandas.DataFrame.groupby
- pandas documentation: pandas.Series.to_frame
- pandas documentation: pandas.DataFrame.reset_index