Data Science Tutorial

How to Create Columns in Pandas

Learn how to create column variables, categorical columns with pd.cut, and one-hot encoded columns with pd.get_dummies, with Python examples.

By Suhith Illesinghe · 9 Apr 2026 · 10 min read

Creating new columns is one of the most important skills in data analysis. Whether you're computing a derived metric, binning continuous data into categories, or preparing features for a machine learning model, you'll be adding columns to a DataFrame constantly.

In this article you'll learn three things:

  1. How to create and assign values to a column
  2. How to create a categorical column with pd.cut()
  3. How to create one-hot encodings with pd.get_dummies()

1. The Dataset

We'll use a petrol station dataset with fuel transactions across Australia. Each row has a station, fuel type, litres sold, price per litre, and the state. Some values are intentionally missing.

Figure 1: Fuel transactions — 15 rows, 5 columns.
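To follow along without the original data, here is a minimal stand-in DataFrame. The values are made up, and while litres, price_per_litre and fuel_type appear later in the article, the column names station and state are assumptions. The later sketches each build their own small frame so they can run independently.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the article's dataset: all values are invented,
# and the 'station' and 'state' column names are assumptions
df = pd.DataFrame({
    "station": ["Alpha", "Bravo", "Charlie", "Delta", "Echo"],
    "fuel_type": ["Unleaded 91", "Diesel", "Premium 98", "Unleaded 91", "Diesel"],
    "litres": [40.0, np.nan, 35.5, 52.0, 60.0],
    "price_per_litre": [1.89, 1.95, np.nan, 1.85, 1.99],
    "state": ["NSW", "VIC", "QLD", "NSW", None],
})

print(df.shape)         # (5, 5)
print(df.isna().sum())  # count of missing values in each column
```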

2. Create a Column with Bracket Assignment

The simplest way to create a new column is bracket assignment. Let's add a revenue column initialised to zero:

Figure 2: New 'revenue' column filled with zeros.
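A minimal sketch of bracket assignment on a small made-up frame: assigning a scalar broadcasts it to every row and creates the column if it doesn't exist yet.

```python
import numpy as np
import pandas as pd

# Small invented frame using the column names assumed throughout these sketches
df = pd.DataFrame({
    "litres": [40.0, np.nan, 35.5],
    "price_per_litre": [1.89, 1.95, np.nan],
})

# A scalar on the right-hand side is broadcast to every row
df["revenue"] = 0.0

print(df)
```

Initialising with 0.0 rather than 0 is a small design choice: the column starts as float64, which matches the decimal revenue values it will hold later.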

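You can also use .loc to create a column; here it adds a discount column alongside revenue. .loc is especially useful when assigning to a subset of rows: a single call such as df.loc[mask, 'col'] = value writes to the original DataFrame and avoids the SettingWithCopyWarning that chained indexing like df[mask]['col'] = value can trigger. If you want an independent filtered DataFrame, take a .copy() before adding columns to it: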

Figure 3: Both 'revenue' and 'discount' columns created.
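The sketch below contrasts the .loc patterns described above on an invented frame. The state column and the discount value of 5.0 are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "litres": [40.0, np.nan, 35.5, 52.0],
    "state": ["NSW", "VIC", "NSW", "QLD"],
})

# Same effect as bracket assignment: create the column for every row
df.loc[:, "discount"] = 0.0

# Filtering returns a new object; .copy() makes it explicitly independent,
# so adding or changing columns on it won't raise SettingWithCopyWarning
nsw = df[df["state"] == "NSW"].copy()
nsw.loc[:, "discount"] = 5.0

# To change rows of the ORIGINAL frame, select rows and column in one .loc call
df.loc[df["state"] == "NSW", "discount"] = 5.0

print(df)
```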

3. Populate with Random Values

Zeros aren't very interesting. Let's populate the discount column with random integers from 0 to 15 (cents off per litre) using np.random.randint(). Its upper bound is exclusive, so pass 16 as the high value if a 15-cent discount should be possible:

Figure 4: Random discount values between 0 and 15.
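A minimal sketch of filling a column with random integers. Seeding the generator is an addition here so the output is reproducible; the prices are invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price_per_litre": [1.89, 1.95, 1.85, 1.99, 2.05]})

# Seed the global generator so the random discounts are the same on every run
np.random.seed(42)

# high=16 is exclusive, so discounts range from 0 to 15 inclusive;
# size=len(df) produces exactly one value per row
df["discount"] = np.random.randint(0, 16, size=len(df))

print(df)
```

If you prefer NumPy's newer Generator API, np.random.default_rng(42).integers(0, 16, size=len(df)) produces values in the same range, with the same exclusive upper bound.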

4. Compute a Derived Column

Now let's compute actual values for the revenue column: revenue = litres x price per litre. There's a catch: some rows have NaN in litres or price_per_litre, and any arithmetic involving NaN returns NaN. Use fillna(0) first so those rows get a revenue of 0 instead:

Figure 5: Revenue computed from litres x price. NaN rows produce 0.00.
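The sketch below, on invented values, shows both the NaN propagation and the fillna(0) fix described above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "litres": [40.0, np.nan, 35.5, 50.0],
    "price_per_litre": [2.0, 1.95, np.nan, 1.5],
})

# Without filling, any row with a missing operand yields NaN
naive = df["litres"] * df["price_per_litre"]

# Treat missing values as 0 so those rows contribute zero revenue
df["revenue"] = (df["litres"].fillna(0) * df["price_per_litre"].fillna(0)).round(2)

print(naive.isna().sum())      # 2
print(df["revenue"].tolist())  # [80.0, 0.0, 0.0, 75.0]
```

Filling with 0 is a modelling choice: it turns "unknown" into "zero revenue". If you would rather keep the gap visible, skip fillna and let those rows stay NaN.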

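Let's also compute the discounted price by subtracting the discount from the price per litre. The discount is in cents, so convert it to the same unit as price_per_litre first (for example, divide by 100 if prices are in dollars):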

Figure 6: Discounted price derived from price minus discount.
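A minimal sketch of the subtraction, assuming price_per_litre is in dollars and discount is in cents. If your prices are already in cents, drop the / 100.

```python
import pandas as pd

# Assumption: price_per_litre is in dollars, discount is in cents
df = pd.DataFrame({
    "price_per_litre": [1.89, 2.00, 1.75],
    "discount": [10, 0, 15],
})

# Convert cents to dollars before subtracting; round to tame float noise
df["discounted_price"] = (df["price_per_litre"] - df["discount"] / 100).round(3)

print(df)
```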

Let's clean up the extra columns and keep the ones we need going forward:

Figure 7: Cleaned DataFrame with the revenue column retained.
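One way to drop helper columns is DataFrame.drop with the columns= keyword. Which columns the original example removed isn't shown, so the names below are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "station": ["Alpha", "Bravo"],
    "revenue": [80.0, 0.0],
    "discount": [10, 0],
    "discounted_price": [1.79, 2.00],
})

# drop() returns a new DataFrame by default, so reassign the result
df = df.drop(columns=["discount", "discounted_price"])

print(df.columns.tolist())  # ['station', 'revenue']
```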

5. Create Categorical Columns with pd.cut()

You have a continuous revenue column. What if the business wants to categorise transactions into groups — say Low, Medium, and High revenue? This is where pd.cut() comes in. It takes a continuous column and bins it into discrete categories.

First, let's calculate the bins dynamically from the data:

Figure 8: Dynamically calculated bin edges.
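A sketch of one way to compute bin edges from the data: equal-width bins spanning the observed minimum and maximum. The choice of three equal-width bins is an assumption, not necessarily how the original figure was produced.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"revenue": [0.0, 45.0, 80.0, 120.0, 150.0]})

# n_bins intervals need n_bins + 1 edges
n_bins = 3
bins = np.linspace(df["revenue"].min(), df["revenue"].max(), n_bins + 1)

print(bins.tolist())  # [0.0, 50.0, 100.0, 150.0]
```

Equal-width edges can leave some bins nearly empty when the data is skewed. If you want roughly equal counts per bin instead, pd.qcut bins on quantiles.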

Now create labels for each bin range and apply pd.cut():

Figure 9: Revenue categorised into bins.
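The sketch below applies pd.cut with one label per interval. It also shows a detail that matters when the lowest edge equals the column minimum: intervals are right-closed by default, so the minimum itself falls outside every bin unless include_lowest=True is passed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"revenue": [0.0, 45.0, 80.0, 120.0, 150.0]})
bins = np.linspace(df["revenue"].min(), df["revenue"].max(), 4)  # edges 0, 50, 100, 150

# Exactly one label per interval: len(labels) == len(bins) - 1
labels = ["Low", "Medium", "High"]

# Without include_lowest=True, the 0.0 row would not fall in (0, 50] and would become NaN
df["revenue_category"] = pd.cut(
    df["revenue"], bins=bins, labels=labels, include_lowest=True
)

print(df["revenue_category"].tolist())  # ['Low', 'Low', 'Medium', 'High', 'High']
```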

Now you can group by the category to see how many transactions fall into each bucket:

Figure 10: Transaction volume per revenue category.
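A minimal sketch of the grouping step on invented data, counting transactions and summing litres per category. Passing observed=False explicitly keeps every category in the result, including ones with no rows, and avoids pandas' warning about the changing default for categorical groupers.

```python
import pandas as pd

df = pd.DataFrame({
    "litres": [10.0, 30.0, 40.0, 55.0, 60.0],
    "revenue_category": pd.Categorical(
        ["Low", "Low", "Medium", "High", "High"],
        categories=["Low", "Medium", "High"],
        ordered=True,
    ),
})

grouped = df.groupby("revenue_category", observed=False)

# size() counts rows per group; sum() totals the litres sold per group
transactions = grouped.size()
total_litres = grouped["litres"].sum()

print(transactions)
print(total_litres)
```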

6. One-Hot Encoding with pd.get_dummies()

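Categorical columns are great for analysis, but machine learning models typically need numerical inputs. One-hot encoding converts each category into its own indicator column, which is 1 when the row belongs to that category and 0 otherwise. pd.get_dummies() does this in one line. Since pandas 2.0 it returns True/False columns by default, so pass dtype=int if you want 1/0: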

Figure 11: One-hot encoded columns — one per category plus NaN.
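A sketch of one-hot encoding a categorical column that contains a missing value. The prefix "revenue" is an illustrative choice, and dtype=int is used so the indicators print as 1/0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "revenue_category": pd.Categorical(
        ["Low", "High", np.nan, "Medium"],
        categories=["Low", "Medium", "High"],
    ),
})

# dummy_na=True adds an extra indicator column for missing values;
# dtype=int yields 1/0 instead of the bool default in pandas 2.x
dummies = pd.get_dummies(
    df["revenue_category"], prefix="revenue", dummy_na=True, dtype=int
)

print(dummies)
```

Each row has exactly one 1 across the indicator columns, with the missing row flagged in the NaN column.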

The dummy_na=True parameter adds a column for NaN values, which is useful if missing data is meaningful. Now combine the dummies with the original DataFrame column-wise:

Figure 12: Original data combined with one-hot encoded columns.
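One way to combine the two is pd.concat along axis=1, sketched below on invented data. It aligns rows on the index, so the dummies must share the original DataFrame's index, which they do when created directly from one of its columns.

```python
import pandas as pd

df = pd.DataFrame({
    "station": ["Alpha", "Bravo", "Charlie"],
    "revenue_category": ["Low", "High", "Low"],
})

dummies = pd.get_dummies(df["revenue_category"], prefix="revenue", dtype=int)

# axis=1 places the frames side by side, matching rows by index label
combined = pd.concat([df, dummies], axis=1)

print(combined)
```

df.join(dummies) gives the same result here. If the model shouldn't see the text column, drop revenue_category afterwards.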

The DataFrame now has binary indicator columns that can be used as features in a machine learning pipeline or statistical model.

Summary

You've learned three essential column creation techniques:

  1. Direct assignment: df['col'] = value or df.loc[:, 'col'] = value to create new columns with constants, random values, or computed expressions
  2. Categorical binning: pd.cut() to bin continuous values into labelled categories with dynamic or manual bins
  3. One-hot encoding: pd.get_dummies() to convert categorical columns into binary indicators for ML models

Try experimenting with the examples: change the number of bins, one-hot encode fuel_type instead of revenue_category, or compute new derived columns such as revenue per litre.



Suhith Illesinghe
Curiosity is the first step to make a difference. I hope to inspire others to explore, build and champion collaborative growth.