In this post I will show you how to count the number of items a user has selected (or bought, or triggered) within a data feed.
Transaction data can be a treasure trove of user behavior if you can get it into the right format for counting. I have put together a simple example below on how you can use R to get counts of user click behaviors at various levels of aggregation. Note: the data are entered in a sorted format. You could easily accomplish this by using the arrange command within dplyr.
# Setup example data
customer_id <- c("aid123","aid123","aid123", "aid123",
"aid432","aid432",
"aid887","aid887","aid887","aid887","aid887",
"aid98","aid98", "aid98")
visit_num <- c("1", "1", "1", "2",
"1", "1",
"1", "2", "2", "3", "4",
"1", "1", "1")
item <- c("socks", "socks", "shoes", "shoes",
"tires", "brakes",
"shoes","shoes", "shoes", "shoes", "shoes",
"trimmer", "trimmer", "trimmer")
transactions <- data.frame(customer_id,visit_num, item)
library(dplyr) #Load dplyr here for the tbl_df call.
print(tbl_df(transactions), n = 14)
## # A tibble: 14 x 3
## customer_id visit_num item
## <fctr> <fctr> <fctr>
## 1 aid123 1 socks
## 2 aid123 1 socks
## 3 aid123 1 shoes
## 4 aid123 2 shoes
## 5 aid432 1 tires
## 6 aid432 1 brakes
## 7 aid887 1 shoes
## 8 aid887 2 shoes
## 9 aid887 2 shoes
## 10 aid887 3 shoes
## 11 aid887 4 shoes
## 12 aid98 1 trimmer
## 13 aid98 1 trimmer
## 14 aid98 1 trimmer
Now that the data is set up, we can count items by visit number by visitor id and then a higher level of aggregation at just visitor id. You can imagine that if you had a date value the aggregation could by visitor id by date of which the result would be daily user behavior counts.
library(tidyr)
# Count items by visitor id and visit number.
purchases_v1 <- transactions %>% group_by(customer_id, visit_num) %>%
count(item) %>%
spread(item, n, fill=0)
print(purchases_v1)
## # A tibble: 8 x 7
## # Groups: customer_id, visit_num [8]
## customer_id visit_num brakes shoes socks tires trimmer
## * <fctr> <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 aid123 1 0 1 2 0 0
## 2 aid123 2 0 1 0 0 0
## 3 aid432 1 1 0 0 1 0
## 4 aid887 1 0 1 0 0 0
## 5 aid887 2 0 2 0 0 0
## 6 aid887 3 0 1 0 0 0
## 7 aid887 4 0 1 0 0 0
## 8 aid98 1 0 0 0 0 3
The table above accurately reflects the transaction data. Visitor aid123 bought 1 pair of shoes and 2 pair of socks on the first visit.
The follwing code will create a summary of purchases at the vistor level regardless of visit number.
# Count items by visitor id and visit number.
purchases_v2 <- transactions %>% group_by(customer_id) %>%
count(item) %>%
spread(item, n, fill=0)
print(purchases_v2)
## # A tibble: 4 x 6
## # Groups: customer_id [4]
## customer_id brakes shoes socks tires trimmer
## * <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 aid123 0 2 2 0 0
## 2 aid432 1 0 0 1 0
## 3 aid887 0 5 0 0 0
## 4 aid98 0 0 0 0 3
We can now see that visitor aid887 had 5 shoe purchases that were spread over 4 visits but are now aggregated to reflect overall purchase behavior.