Market basket analysis: Example using grocery data
Idea
Our market basket analysis is based on the purchase data collected from one month of operation at a real-world grocery store. The data contains 9,835 transactions or about 327 transactions per day (roughly 30 transactions per hour in a 12-hour business day), suggesting that the retailer is not particularly large, nor is it particularly small.
Given the moderate size of the retailer, we will assume that they are not terribly concerned with finding rules that apply only to a specific brand of milk or detergent. With this in mind, all brand names can be removed from the purchases. This reduces the number of groceries to a more manageable 169 types, using broad categories such as chicken, frozen meals, margarine, and soda.
General considerations
The analysis has two main steps:
- Obtain the data and create the model
- Analyze the model
Technical implementation
Get the data & create the model
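library(arules)
## Step 1. Get data
# Each line of groceries.csv is one transaction, with items separated by commas
groceries <- read.transactions("groceries.csv", sep = ",")
## Step 2. Analyzing data
summary(groceries)               # overview of transactions and items
inspect(groceries[1:3])          # items in the first three transactions
itemFrequency(groceries[, 1:3])  # support of the first three items
## Step 3. Creating the model
# Keep rules with support >= 0.006, confidence >= 0.25, and at least two items
groceryrules <- apriori(groceries, parameter = list(support = 0.006, confidence = 0.25, minlen = 2))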
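The support and confidence thresholds passed to apriori() are easier to interpret with the measures written out by hand. The following is a minimal sketch in base R, not how arules computes them internally. The toy `baskets` list and the helper functions `item_support()`, `rule_confidence()`, and `rule_lift()` are hypothetical names introduced only for illustration.

```r
# Toy baskets (made-up data for illustration, not taken from groceries.csv)
baskets <- list(
  c("milk", "bread"),
  c("milk", "herbs", "root vegetables"),
  c("bread", "butter"),
  c("milk", "butter", "bread"),
  c("herbs", "root vegetables")
)

# Support: share of baskets that contain every item in `items`
item_support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Confidence of lhs => rhs: support of the combined itemset over support of lhs
rule_confidence <- function(lhs, rhs, baskets) {
  item_support(c(lhs, rhs), baskets) / item_support(lhs, baskets)
}

# Lift: confidence relative to how common rhs is overall
rule_lift <- function(lhs, rhs, baskets) {
  rule_confidence(lhs, rhs, baskets) / item_support(rhs, baskets)
}

s_milk_bread <- item_support(c("milk", "bread"), baskets)       # 2 of 5 baskets
c_milk_bread <- rule_confidence("milk", "bread", baskets)       # 2 of the 3 milk baskets
l_herbs_root <- rule_lift("herbs", "root vegetables", baskets)  # 1.0 / 0.4

# A support threshold of 0.006 on 9,835 transactions means a rule must
# appear in at least this many baskets
min_count <- ceiling(0.006 * 9835)

print(c(s_milk_bread, c_milk_bread, l_herbs_root, min_count))
```

With 9,835 transactions, the 0.006 support threshold works out to a minimum of 60 baskets, or about two purchases per day over the month.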
Analyze the model
## Step 4. Inspecting the model
inspect(groceryrules[1:3])
#     lhs             rhs               support     confidence lift     count
# [1] {pot plants} => {whole milk}      0.006914082 0.4000000  1.565460 68
# [2] {pasta}      => {whole milk}      0.006100661 0.4054054  1.586614 60
# [3] {herbs}      => {root vegetables} 0.007015760 0.4312500  3.956477 69
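The columns in this output are tied together by simple arithmetic, which gives a quick sanity check. The sketch below uses base R only and copies the numbers from the three rules printed above. The variable names are illustrative, and the total of 9,835 transactions is the figure stated at the start of this article.

```r
n_transactions <- 9835

# Values copied from the first three rules printed above
rules <- data.frame(
  rule       = c("{pot plants} => {whole milk}",
                 "{pasta} => {whole milk}",
                 "{herbs} => {root vegetables}"),
  support    = c(0.006914082, 0.006100661, 0.007015760),
  confidence = c(0.4000000, 0.4054054, 0.4312500),
  lift       = c(1.565460, 1.586614, 3.956477),
  count      = c(68, 60, 69)
)

# support is the rule's count divided by the number of transactions
rules$support_from_count <- rules$count / n_transactions

# lift = confidence / support(rhs), so the printed values imply support(rhs)
rules$implied_rhs_support <- rules$confidence / rules$lift

print(rules[, c("rule", "support", "support_from_count", "implied_rhs_support")])
```

Both whole-milk rules imply the same right-hand-side support (roughly 0.256), as expected, since that value depends only on how often whole milk is bought.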
inspect(sort(groceryrules, by = "lift")[1:3])
#     lhs                                             rhs                  support     confidence lift     count
# [1] {herbs}                                      => {root vegetables}    0.007015760 0.4312500  3.956477 69
# [2] {berries}                                    => {whipped/sour cream} 0.009049314 0.2721713  3.796886 89
# [3] {other vegetables,tropical fruit,whole milk} => {root vegetables}    0.007015760 0.4107143  3.768074 69
inspect(sort(groceryrules, by = "support")[1:3])
#     lhs                   rhs                support    confidence lift     count
# [1] {other vegetables} => {whole milk}       0.07483477 0.3867578  1.513634 736
# [2] {whole milk}       => {other vegetables} 0.07483477 0.2928770  1.513634 736
# [3] {rolls/buns}       => {whole milk}       0.05663447 0.3079049  1.205032 557
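Rules [1] and [2] above contain the same two items in opposite directions, which is why they share support and lift but differ in confidence. A minimal base R check using the printed numbers follows; the variable names are illustrative.

```r
# Values copied from rules [1] and [2] above
support_both     <- 0.07483477  # baskets with other vegetables AND whole milk
conf_veg_to_milk <- 0.3867578   # {other vegetables} => {whole milk}
conf_milk_to_veg <- 0.2928770   # {whole milk} => {other vegetables}

# confidence = support(lhs and rhs) / support(lhs), so each direction
# reveals the support of its own left-hand side
support_veg  <- support_both / conf_veg_to_milk
support_milk <- support_both / conf_milk_to_veg

# lift = support(lhs and rhs) / (support(lhs) * support(rhs)) is symmetric
# in lhs and rhs, so both directions give the same value
lift_either_way <- support_both / (support_veg * support_milk)

print(round(c(support_veg = support_veg,
              support_milk = support_milk,
              lift = lift_either_way), 4))
```

Whole milk is the more common item, so a basket with other vegetables is more likely to contain whole milk than the reverse, even though the two rules describe the same set of baskets.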
inspect(sort(groceryrules, by = "confidence")[1:3])
#     lhs                            rhs          support     confidence lift     count
# [1] {butter,whipped/sour cream} => {whole milk} 0.006710727 0.6600000  2.583008 66
# [2] {butter,yogurt}             => {whole milk} 0.009354347 0.6388889  2.500387 92
# [3] {butter,root vegetables}    => {whole milk} 0.008235892 0.6377953  2.496107 81
berryrules <- subset(groceryrules, items %in% "berries")
inspect(berryrules)
#     lhs          rhs                  support     confidence lift     count
# [1] {berries} => {whipped/sour cream} 0.009049314 0.2721713  3.796886 89
# [2] {berries} => {yogurt}             0.010574479 0.3180428  2.279848 104
# [3] {berries} => {other vegetables}   0.010269446 0.3088685  1.596280 101
# [4] {berries} => {whole milk}         0.011794611 0.3547401  1.388328 116
Coming soon: a detailed explanation of how the market basket algorithm works.
Bibliography: