Retail Shopping Data.
completejourney
Overview
An R data package that provides access to data in the Complete Journey Study provided by 84.51°. The data represents grocery store shopping transactions over one year from a group of 2,469 households who are frequent shoppers at a retailer. It contains all of each household’s purchases, not just those from a limited number of categories. For certain households, demographic information as well as direct marketing contact history are included.
campaigns
: campaigns received by each householdcampaign_descriptions
: campaign metadata (length of time active)coupons
: coupon metadata (UPC code, campaign, etc.)coupon_redemptions
: coupon redemptions (household, day, UPC code, campaign)demographics
: household demographic data (age, income, family size, etc.)products
: product metadata (brand, description, etc.)promotions_sample
: a sampling of the product placement in mailers and in stores corresponding to advertising campaignstransactions_sample
: a sampling of the products purchased by households
Installation
install.packages("completejourney")
To get a bug fix, or use a feature from the development version, you can install completejourney
from GitHub with:
# install.packages("remotes")
remotes::install_github("bradleyboehmke/completejourney")
Downloading full data sets
Due to the size of the transactions and promotions data, the package provides a sampling of the data built-in with transactions_sample
and promotions_sample
. However, you can access the full promotions and transactions data sets from the source GitHub repository with the following:
library(completejourney)
# get the full transactions data set
transactions <- get_transactions()
transactions
## # A tibble: 1,469,307 x 11
## household_id store_id basket_id product_id quantity sales_value retail_disc
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 900 330 31198570… 1095275 1 0.5 0
## 2 900 330 31198570… 9878513 1 0.99 0.1
## 3 1228 406 31198655… 1041453 1 1.43 0.15
## 4 906 319 31198705… 1020156 1 1.5 0.290
## 5 906 319 31198705… 1053875 2 2.78 0.8
## 6 906 319 31198705… 1060312 1 5.49 0.5
## 7 906 319 31198705… 1075313 1 1.5 0.290
## 8 1058 381 31198676… 985893 1 1.88 0.21
## 9 1058 381 31198676… 988791 1 1.5 1.29
## 10 1058 381 31198676… 9297106 1 2.69 0
## # … with 1,469,297 more rows, and 4 more variables: coupon_disc <dbl>,
## # coupon_match_disc <dbl>, week <int>, transaction_timestamp <dttm>
# get the full promotions data set
promotions <- get_promotions()
promotions
## # A tibble: 20,940,529 x 5
## product_id store_id display_location mailer_location week
## <chr> <chr> <fct> <fct> <int>
## 1 1000050 316 9 0 1
## 2 1000050 337 3 0 1
## 3 1000050 441 5 0 1
## 4 1000092 292 0 A 1
## 5 1000092 293 0 A 1
## 6 1000092 295 0 A 1
## 7 1000092 298 0 A 1
## 8 1000092 299 0 A 1
## 9 1000092 304 0 A 1
## 10 1000092 306 0 A 1
## # … with 20,940,519 more rows
# a convenience function to get both
c(promotions, transactions) %<-% get_data(which = 'both', verbose = FALSE)
dim(promotions)
## [1] 20940529 5
dim(transactions)
## [1] 1469307 11
Learn more
Learn more about the completejourney data, and the type of insights you can look for, at http://bit.ly/completejourney.
Source
The Complete Journey data is available at: http://www.8451.com/area51/.