Blog Post 10 Min Read

Customer Segmentation

Hai Bao

Data Science Enthusiast • May 10, 2026

Customer segmentation is the strategic process of dividing a customer base into groups that share similar characteristics, such as demographics, behaviors, or needs, to enable targeted marketing, improve retention, and increase ROI. One way to achieve this is cluster analysis, which groups similar data points together.

Using a dataset provided by Walmart, which contains anonymized purchase data from an unnamed store, we will attempt to create clusters that each represent a different customer profile, based on the purchase data.


Fig 1.1: Sample of dataset provided by Walmart.

Interpreting the raw data

We have the following data field information provided by Walmart:

  • VisitNumber - an id corresponding to a single trip by a single customer
  • Weekday - the weekday of the trip
  • Upc - the UPC number of the product purchased
  • ScanCount - the number of the given item that was purchased. A negative value indicates a product return.
  • DepartmentDescription - a high-level description of the item's department
  • FinelineNumber - a more refined category for each of the products, created by Walmart


We will ignore the FinelineNumber and Upc fields, as they are not relevant to our analysis.
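The raw data has one row per product line, while the segmentation is about trips, so the rows need to be collapsed to one row per VisitNumber before any clustering. The post does not list its exact features; the sketch below is one plausible feature set, assuming pandas. Encoding Weekday as a Saturday/Sunday weekend flag, and the names `build_visit_features`, `IsWeekend` and `TotalItems`, are illustrative assumptions, and the sample rows are hypothetical.

```python
import pandas as pd


def build_visit_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Collapse item-level rows into one feature row per VisitNumber (hypothetical helper)."""
    # Drop the fields excluded from the analysis (ignored if already absent).
    df = raw.drop(columns=["Upc", "FinelineNumber"], errors="ignore")
    by_visit = df.groupby("VisitNumber")
    # Weekday is the same on every row of a visit, so the first value is enough.
    # Assumption: only Saturday and Sunday count as the weekend.
    is_weekend = by_visit["Weekday"].first().isin(["Saturday", "Sunday"]).astype(int)
    # Net items per visit: returns (negative ScanCount) reduce the total.
    total_items = by_visit["ScanCount"].sum()
    # Net quantity bought from each department, one column per department.
    dept_counts = df.pivot_table(
        index="VisitNumber",
        columns="DepartmentDescription",
        values="ScanCount",
        aggfunc="sum",
        fill_value=0,
    )
    return pd.concat(
        [is_weekend.rename("IsWeekend"), total_items.rename("TotalItems"), dept_counts],
        axis=1,
    )


# Hypothetical rows in the shape of the Walmart data (not real records).
raw = pd.DataFrame({
    "VisitNumber": [1, 1, 2, 2, 2],
    "Weekday": ["Saturday", "Saturday", "Monday", "Monday", "Monday"],
    "Upc": [111, 222, 333, 444, 555],
    "ScanCount": [2, 1, 1, 3, -1],
    "DepartmentDescription": ["DAIRY", "PRODUCE", "DAIRY", "DAIRY", "PRODUCE"],
    "FinelineNumber": [10, 20, 10, 10, 20],
})
features = build_visit_features(raw)
print(features)
```

Summing ScanCount rather than counting rows keeps returns visible: visit 2 above ends up with a net of -1 PRODUCE items.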


Fig 1.2: The elbow method suggests that the optimal number of clusters is 3.
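The elbow method fits K-Means for a range of k values and records the within-cluster sum of squares (scikit-learn exposes it as `inertia_`). Inertia always falls as k grows, so the goal is the k after which the drop flattens out. The sketch below assumes scikit-learn and standardizes the features first, which the post does not mention; `elbow_inertias` is a hypothetical helper, and the data is synthetic with three deliberately separated groups, not the Walmart features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def elbow_inertias(X, k_values, random_state=0):
    """Within-cluster sum of squares (K-Means inertia) for each candidate k."""
    # Standardize so that large-count features do not dominate the distances.
    X_scaled = StandardScaler().fit_transform(X)
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X_scaled)
        inertias.append(model.inertia_)
    return inertias


# Synthetic stand-in for a per-visit feature matrix: three well-separated groups.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=center, scale=0.5, size=(50, 2)) for center in (0.0, 5.0, 10.0)])
k_values = list(range(1, 7))
inertias = elbow_inertias(X, k_values)
for k, inertia in zip(k_values, inertias):
    print(f"k={k}: inertia={inertia:.1f}")
```

On this synthetic data the inertia drops sharply up to k=3 and only marginally afterwards, which is the "elbow" shape a plot like Fig 1.2 looks for.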

Analyzing the clustered data

After applying the K-Means clustering algorithm with 3 clusters and analyzing the results with respect to the Weekday, ScanCount, and DepartmentDescription fields, we arrive at the following:

Cluster ID | By Weekday | Items per Visit (ScanCount) | Top 5 Departments
--- | --- | --- | ---
0 | Slight preference for weekend | 10-32 | Grocery Dry Goods, DSD Grocery, Produce, Dairy, Personal Care
1 | Indifferent | 1-10 | DSD Grocery, Impulse Merchandise, Personal Care, Grocery Dry Goods, Produce
2 | Strong preference for weekend | 32-100+ | Grocery Dry Goods, DSD Grocery, Dairy, Produce, Frozen Foods

Table 1.1: Cluster characteristics by Weekday, ScanCount, and DepartmentDescription
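A summary like Table 1.1 can be produced by fitting K-Means with the chosen k and then aggregating each cluster's visits. The sketch below is not the post's code; it assumes pandas and scikit-learn, per-visit columns named `IsWeekend` and `TotalItems` plus one column per department (hypothetical names), and synthetic baskets of three rough sizes. `assign_clusters` and `profile_clusters` are illustrative helpers.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def assign_clusters(features, k=3, random_state=0):
    """Standardize the per-visit features, fit K-Means, and return one label per visit."""
    X = StandardScaler().fit_transform(features.to_numpy(dtype=float))
    model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
    return pd.Series(model.fit_predict(X), index=features.index, name="Cluster")


def profile_clusters(features, labels, dept_columns, n_top=5):
    """Per cluster: weekend share, item-count range and the n_top departments by items bought."""
    grouped = features.groupby(labels)
    summary = grouped.agg(
        Visits=("TotalItems", "count"),
        WeekendShare=("IsWeekend", "mean"),
        MinItems=("TotalItems", "min"),
        MaxItems=("TotalItems", "max"),
    )
    dept_totals = grouped[dept_columns].sum()
    # One comma-separated string of top departments per cluster, aligned on the cluster label.
    summary["TopDepartments"] = dept_totals.apply(
        lambda row: ", ".join(row.nlargest(n_top).index), axis=1
    )
    return summary


# Synthetic per-visit features (not the Walmart data): small, medium and large baskets.
rng = np.random.default_rng(0)
departments = ["DAIRY", "PRODUCE", "IMPULSE MERCHANDISE"]
rows = []
for lo, hi, weekend_p in [(1, 8, 0.3), (15, 40, 0.5), (60, 120, 0.8)]:
    for _ in range(40):
        total = int(rng.integers(lo, hi))
        # Split each basket across the departments at random.
        split = rng.multinomial(total, [1 / 3] * 3)
        rows.append({"IsWeekend": int(rng.random() < weekend_p), "TotalItems": total,
                     **dict(zip(departments, split))})
features = pd.DataFrame(rows)
labels = assign_clusters(features, k=3)
summary = profile_clusters(features, labels, departments, n_top=2)
print(summary)
```

K-Means numbers its clusters arbitrarily, so the IDs in a run like this need not match the Cluster IDs in Table 1.1; the profiles are identified by their characteristics, not their numbers.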

Building the profile

Based on the cluster analysis, here are the final customer profiles:

Cluster ID | Customer Profile
--- | ---
0 | Smaller household without children, given the lower item count per visit than Cluster 2.
1 | Individuals who need only a few items per trip and are more inclined towards impulse merchandise.
2 | Larger household, given the highest item count per visit.

Bonus Insights


Fig 1.3: The customer profile associated with Cluster 1 has a stronger preference for Impulse Merchandise, Men's Wear and Financial Services, compared with the other two customer profiles.
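The post does not say how the preferences in Fig 1.3 were measured. One way to quantify a "stronger preference" is a lift ratio: a cluster's share of items in a department divided by that department's share across all other clusters combined, so values above 1 mean the cluster over-indexes on that department. The sketch below assumes pandas, a per-visit table of department item counts, and a label per visit; `department_lift` and the sample numbers are hypothetical.

```python
import pandas as pd


def department_lift(dept_counts: pd.DataFrame, labels: pd.Series) -> pd.DataFrame:
    """Each cluster's department share divided by the same share in all other clusters."""
    totals = dept_counts.groupby(labels).sum()
    lifts = {}
    for cluster in totals.index:
        own = totals.loc[cluster]
        rest = totals.drop(index=cluster).sum()
        # Shares of items per department; a zero share in the other clusters yields inf.
        lifts[cluster] = (own / own.sum()) / (rest / rest.sum())
    # Rows: clusters, columns: departments.
    return pd.DataFrame(lifts).T


# Hypothetical per-visit department counts and cluster labels.
dept_counts = pd.DataFrame(
    {
        "IMPULSE MERCHANDISE": [4, 2, 0, 1, 0, 1],
        "DAIRY": [0, 1, 3, 2, 5, 4],
        "PRODUCE": [1, 1, 2, 3, 4, 5],
    },
    index=pd.Index([1, 2, 3, 4, 5, 6], name="VisitNumber"),
)
labels = pd.Series([1, 1, 0, 0, 2, 2], index=dept_counts.index, name="Cluster")
lift = department_lift(dept_counts, labels)
print(lift.round(2))
```

Comparing shares rather than raw counts keeps large-basket clusters from dominating every department simply because they buy more items overall.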