Cluster Analysis


I will show how we can use retail transaction data to identify customer types.

The dataset used for this analysis is a publicly shared dataset from Walmart.
These are the first 5 rows of the dataset. The dataset itself contains 647,054 rows.

This is a graph of items purchased by department. It is not surprising that grocery items are the most common purchase at Walmart.

Again, it is no surprise that most customers go to Walmart on Friday, Saturday, and Sunday.
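As a rough illustration of how the two breakdowns above can be computed, here is a minimal sketch using only the Python standard library. The row layout (visit id, weekday, department, scan count) and the sample values are my own assumptions for illustration, not taken from the actual dataset.

```python
from collections import Counter

# Hypothetical rows: (visit_id, weekday, department, scan_count).
# The layout and values are illustrative assumptions, not real data.
rows = [
    (1, "Friday", "GROCERY DRY GOODS", 2),
    (1, "Friday", "DAIRY", 1),
    (2, "Sunday", "GROCERY DRY GOODS", 3),
    (3, "Sunday", "IMPULSE MERCHANDISE", 1),
    (4, "Tuesday", "MENS WEAR", 1),
]

# Items purchased per department: sum the scan counts of each row.
items_by_department = Counter()
for _visit, _day, department, scan_count in rows:
    items_by_department[department] += scan_count

# Visits per weekday: count each visit once, not once per row.
weekday_of_visit = {visit: day for visit, day, _dept, _n in rows}
visits_by_weekday = Counter(weekday_of_visit.values())

print(items_by_department.most_common())
print(visits_by_weekday.most_common())
```

Counting visits rather than rows matters here, because a single trip with many items would otherwise be counted many times in the weekday chart.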

Knowing all of this, let's try and apply cluster analysis to see if we can create different customer profiles.
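Before clustering, each visit has to be turned into a row of numbers. The sketch below shows one possible way to do that: total items, a one-hot weekday flag, and the share of items per department. The function name `visit_features`, the row layout, and the choice of features are all assumptions made for illustration; the analysis here may have used a different feature set.

```python
from collections import defaultdict

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def visit_features(rows, departments):
    """Build one numeric feature vector per visit (hypothetical helper).

    rows: iterable of (visit_id, weekday, department, scan_count).
    Returns {visit_id: [total_items, 7 weekday flags, department shares]}.
    """
    totals = defaultdict(int)
    dept_items = defaultdict(lambda: defaultdict(int))
    weekday = {}
    for visit, day, dept, n in rows:
        totals[visit] += n
        dept_items[visit][dept] += n
        weekday[visit] = day

    features = {}
    for visit, total in totals.items():
        day_flags = [1.0 if weekday[visit] == d else 0.0 for d in WEEKDAYS]
        # Share of the visit's items that fall in each department.
        shares = [dept_items[visit][d] / total if total else 0.0
                  for d in departments]
        features[visit] = [float(total)] + day_flags + shares
    return features

# Illustrative data only.
rows = [
    (1, "Friday", "GROCERY DRY GOODS", 2),
    (1, "Friday", "DAIRY", 1),
    (2, "Sunday", "GROCERY DRY GOODS", 3),
]
departments = ["GROCERY DRY GOODS", "DAIRY"]
features = visit_features(rows, departments)
print(features)
```

In practice the item count would usually be scaled before clustering, since K-Means is distance-based and a large raw count can dominate the 0/1 flags and shares.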

Applying K-Means, the results suggest that 3 clusters is the best fit for this data.
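K-Means needs the number of clusters as an input, so a common way to choose it is to run the algorithm for several values of k and look for the point where the within-cluster sum of squares (inertia) stops dropping quickly, often called the elbow method. The sketch below is a plain standard-library version of Lloyd's algorithm on made-up two-dimensional points. It is only an illustration of the idea, and I am not claiming it is the exact procedure or library behind the result above.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia

# Made-up data: three loose groups in two hypothetical features.
rng = random.Random(42)
centers = [(3.0, 0.0), (20.0, 0.0), (40.0, 1.0)]
points = [[cx + rng.gauss(0, 1.5), cy + rng.gauss(0, 0.3)]
          for cx, cy in centers for _ in range(30)]

# Inertia for k = 1..6; the "elbow" is where the drop levels off.
inertias = {k: kmeans(points, k)[2] for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(value, 1))
```

The elbow is a judgment call rather than a hard rule, so it is worth checking that the resulting clusters are also interpretable, which is what the rest of this analysis does.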

When we look at the different clusters, we can see some differences. Cluster 0's breakdown by day is similar to that of customers overall. Cluster 1 customers, however, are more indifferent to the day of the week. Cluster 2 customers have a much stronger preference for Sunday and, to a lesser degree, the rest of the weekend.
The real differences start to show when we look at the number of items purchased by customers in the different clusters. Cluster 1 has a very low item count, whereas Cluster 2 has a very high item count. Since Cluster 0 falls in between, I would label Cluster 0 as average.
For the most part, Cluster 0 and Cluster 2 have very similar purchases by department. Cluster 1 has a strong preference for Impulse Merchandise and Men's Wear when compared to the other clusters.
To summarize:
Cluster 1: Biggest group of customers. These customers come in and purchase between 1 and 10 items and have a stronger preference for Impulse Merchandise and Men's Wear than the other clusters.
Cluster 0: Second biggest group of customers. These are your average customers. They come in and purchase between 16 and 30 items, with no strong preference when it comes to items.
Cluster 2: Smallest group of customers. Similar to Cluster 0, except that they have a strong preference for shopping on the weekend and they purchase more items per visit.
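Profiles like the ones summarized above come from grouping visits by their cluster label and comparing a few statistics per group. Here is a minimal sketch of that step using only the standard library. The record layout, the `profile` helper, and the sample visits are hypothetical and chosen only to mirror the kind of comparison made in this post.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical per-visit records:
# (cluster_label, weekday, item_count, {department: items}).
visits = [
    (1, "Tuesday", 2, {"IMPULSE MERCHANDISE": 2}),
    (1, "Friday", 4, {"MENS WEAR": 3, "DAIRY": 1}),
    (1, "Monday", 1, {"IMPULSE MERCHANDISE": 1}),
    (0, "Saturday", 18, {"GROCERY DRY GOODS": 12, "DAIRY": 6}),
    (0, "Friday", 22, {"GROCERY DRY GOODS": 15, "PRODUCE": 7}),
    (2, "Sunday", 45, {"GROCERY DRY GOODS": 30, "PRODUCE": 15}),
]

def profile(visits):
    """Summarize size, item counts, weekdays and departments per cluster."""
    by_cluster = defaultdict(list)
    for label, day, items, depts in visits:
        by_cluster[label].append((day, items, depts))

    summary = {}
    for label, members in by_cluster.items():
        item_counts = [items for _day, items, _depts in members]
        day_counts = Counter(day for day, _items, _depts in members)
        dept_totals = Counter()
        for _day, _items, depts in members:
            dept_totals.update(depts)  # adds the per-department item counts
        total_items = sum(dept_totals.values())
        summary[label] = {
            "visits": len(members),
            "median_items": median(item_counts),
            "weekday_share": {d: n / len(members)
                              for d, n in day_counts.items()},
            "department_share": {d: n / total_items
                                 for d, n in dept_totals.most_common()},
        }
    return summary

summary = profile(visits)
for label in sorted(summary):
    print(label, summary[label])
```

Comparing shares rather than raw counts keeps the clusters comparable even though they differ a lot in size.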