Counting transaction items

In this post I will show you how to count the number of items a user has selected (or bought, or triggered) within a data feed.

Transaction data can be a treasure trove of user behavior if you can get it into the right format for counting. I have put together a simple example below on how you can use R to get counts of user click behaviors at various levels of aggregation. Note: the data are entered in a sorted format. You could easily accomplish this by using the arrange command within dplyr.

# Setup example data
customer_id <- c("aid123","aid123","aid123", "aid123",
                 "aid432","aid432",
                 "aid887","aid887","aid887","aid887","aid887",
                 "aid98","aid98", "aid98")
visit_num <- c("1", "1", "1", "2",
               "1", "1",
               "1", "2", "2", "3", "4",
               "1", "1", "1")
item <- c("socks", "socks", "shoes", "shoes",
          "tires", "brakes", 
          "shoes","shoes", "shoes", "shoes", "shoes",
          "trimmer", "trimmer", "trimmer")
transactions <- data.frame(customer_id,visit_num, item)

library(dplyr) #Load dplyr here for the tbl_df call.
print(tbl_df(transactions), n = 14)
## # A tibble: 14 x 3
##    customer_id visit_num    item
##         <fctr>    <fctr>  <fctr>
##  1      aid123         1   socks
##  2      aid123         1   socks
##  3      aid123         1   shoes
##  4      aid123         2   shoes
##  5      aid432         1   tires
##  6      aid432         1  brakes
##  7      aid887         1   shoes
##  8      aid887         2   shoes
##  9      aid887         2   shoes
## 10      aid887         3   shoes
## 11      aid887         4   shoes
## 12       aid98         1 trimmer
## 13       aid98         1 trimmer
## 14       aid98         1 trimmer

Now that the data is set up, we can count items by visit number by visitor id and then a higher level of aggregation at just visitor id. You can imagine that if you had a date value the aggregation could by visitor id by date of which the result would be daily user behavior counts.

library(tidyr)
# Count items by visitor id and visit number.
purchases_v1 <- transactions %>% group_by(customer_id, visit_num) %>% 
  count(item) %>%
  spread(item, n, fill=0)

print(purchases_v1)
## # A tibble: 8 x 7
## # Groups:   customer_id, visit_num [8]
##   customer_id visit_num brakes shoes socks tires trimmer
## *      <fctr>    <fctr>  <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1      aid123         1      0     1     2     0       0
## 2      aid123         2      0     1     0     0       0
## 3      aid432         1      1     0     0     1       0
## 4      aid887         1      0     1     0     0       0
## 5      aid887         2      0     2     0     0       0
## 6      aid887         3      0     1     0     0       0
## 7      aid887         4      0     1     0     0       0
## 8       aid98         1      0     0     0     0       3

The table above accurately reflects the transaction data. Visitor aid123 bought 1 pair of shoes and 2 pair of socks on the first visit.

The follwing code will create a summary of purchases at the vistor level regardless of visit number.

# Count items by visitor id and visit number.
purchases_v2 <- transactions %>% group_by(customer_id) %>% 
  count(item) %>%
  spread(item, n, fill=0)

print(purchases_v2)
## # A tibble: 4 x 6
## # Groups:   customer_id [4]
##   customer_id brakes shoes socks tires trimmer
## *      <fctr>  <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1      aid123      0     2     2     0       0
## 2      aid432      1     0     0     1       0
## 3      aid887      0     5     0     0       0
## 4       aid98      0     0     0     0       3

We can now see that visitor aid887 had 5 shoe purchases that were spread over 4 visits but are now aggregated to reflect overall purchase behavior.