Description
Use Raw Vectors to Minimize Memory Consumption of Factors.
Uses raw vectors to minimize memory consumption of categorical variables with fewer than 256 unique values. Useful for analysis of large datasets involving variables such as age, years, states, countries, or education levels.
README.md
factor256
The goal of factor256 is to minimize the memory footprint of data analysis that uses categorical variables with fewer than 256 unique values.
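The idea behind the saving can be sketched in base R, without the package. The helpers `encode_raw()` and `decode_raw()` below are hypothetical names written only for illustration; they are not part of the factor256 API and do not claim to mirror its internals. The sketch assumes no missing values and at most 256 distinct values, so each element can be stored as a single byte.

```r
# Hypothetical helpers for illustration only; not the factor256 implementation.
encode_raw <- function(x) {
  lv <- sort(unique(x))
  stopifnot(!anyNA(x), length(lv) <= 256L)
  # match() returns 1-based positions; shift to 0..255 so each fits in one byte
  list(codes = as.raw(match(x, lv) - 1L), levels = lv)
}

decode_raw <- function(enc) {
  # Undo the shift and look the original values up again
  enc$levels[as.integer(enc$codes) + 1L]
}

states <- c("WA", "SA", "NSW", "NT", "TAS", "VIC", "QLD")
x <- rep_len(states, 1e6)
enc <- encode_raw(x)

typeof(enc$codes)
#> [1] "raw"
identical(decode_raw(enc), x)
#> [1] TRUE

# Compare footprints: character vs. integer-backed factor vs. raw codes
object.size(x)
object.size(factor(x))
object.size(enc$codes)
```

On a 64-bit build of R, a character vector costs about 8 bytes per element (a pointer into the string cache), a factor about 4 bytes (an integer code), and a raw vector 1 byte, which is where the reduction comes from.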
Installation
You can install the development version of factor256 from GitHub with:
# install.packages("devtools")
devtools::install_github("HughParsonage/factor256")
Example
This basic example encodes a character vector as a raw vector, recovers the original values, and then compresses every column of a large data.table:
library(factor256)
x <- factor256(LETTERS)
typeof(x)
#> [1] "raw"
identical(recompose256(x), LETTERS)
#> [1] TRUE
library(data.table)
DT <-
  CJ(Year = 2000:2020,
     State = rep_len(c("WA", "SA", "NSW", "NT", "TAS", "VIC", "QLD"), 1000),
     Age = rep_len(0:100, 10000))
# pryr::object_size(DT)
# 3.36 GB
# Replace each column, by reference, with its raw-vector encoding
for (j in seq_along(DT)) {
  set(DT, j = j, value = factor256(.subset2(DT, j)))
}
# pryr::object_size(DT)
# 630 MB
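
The two sizes quoted in the comments are consistent with a simple per-element count, shown here as a rough check. It assumes a 64-bit build of R (4 bytes per integer, 8 bytes per character pointer, 1 byte per raw element) and ignores vector headers and the handful of cached strings; the variable names are illustrative only.

```r
# Rows produced by CJ(): 21 years x 1000 State entries x 10000 Age entries
n_rows <- 21 * 1000 * 10000

# Before: integer Year (4) + character State pointer (8) + integer Age (4)
bytes_before <- n_rows * (4 + 8 + 4)

# After: three raw columns at 1 byte per element
bytes_after <- n_rows * (1 + 1 + 1)

bytes_before / 1e9  # size in GB
#> [1] 3.36
bytes_after / 1e6   # size in MB
#> [1] 630
```

Under these assumptions the table shrinks by a factor of 16/3, a little over five.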