Tidy Interface to 'data.table'.
tidytable
tidytable
is a data frame manipulation library for users who need data.table
speed but prefer tidyverse
-like syntax.
Installation
Install the released version from CRAN with:
install.packages("tidytable")
Or install the development version from GitHub with:
# install.packages("pak")
pak::pak("markfairbanks/tidytable")
General syntax
tidytable
replicates tidyverse
syntax but uses data.table
in the background. In general you can simply use library(tidytable)
to replace your existing dplyr
and tidyr
code with data.table
backed equivalents.
A full list of implemented functions can be found here.
library(tidytable)
df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))
df %>%
select(x, y, z) %>%
filter(x < 4, y > 1) %>%
arrange(x, y) %>%
mutate(double_x = x * 2,
x_plus_y = x + y)
#> # A tidytable: 3 × 5
#> x y z double_x x_plus_y
#> <int> <int> <chr> <dbl> <int>
#> 1 1 4 a 2 5
#> 2 2 5 a 4 7
#> 3 3 6 b 6 9
Applying functions by group
You can use the normal tidyverse
group_by()
/ungroup()
workflow, or you can use .by
syntax to reduce typing. Using .by
in a function is shorthand for df %>% group_by() %>% some_function() %>% ungroup()
.
- A single column can be passed with
.by = z
- Multiple columns can be passed with
.by = c(y, z)
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)
df %>%
summarize(avg_z = mean(z),
.by = c(x, y))
#> # A tidytable: 2 × 3
#> x y avg_z
#> <chr> <chr> <dbl>
#> 1 a a 1.5
#> 2 b b 3
All functions that can operate by group have a .by
argument built in. (mutate()
, filter()
, summarize()
, etc.)
The above syntax is equivalent to:
df %>%
group_by(x, y) %>%
summarize(avg_z = mean(z)) %>%
ungroup()
#> # A tidytable: 2 × 3
#> x y avg_z
#> <chr> <chr> <dbl>
#> 1 a a 1.5
#> 2 b b 3
Both options are available for users, so you can use the syntax that you prefer.
tidyselect support
tidytable
allows you to select/drop columns just like you would in the tidyverse by utilizing the tidyselect
package in the background.
Normal selection can be mixed with all tidyselect
helpers: everything()
, starts_with()
, ends_with()
, any_of()
, where()
, etc.
df <- data.table(
a = 1:3,
b1 = 4:6,
b2 = 7:9,
c = c("a", "a", "b")
)
df %>%
select(a, starts_with("b"))
#> # A tidytable: 3 × 3
#> a b1 b2
#> <int> <int> <int>
#> 1 1 4 7
#> 2 2 5 8
#> 3 3 6 9
A full overview of selection options can be found here.
Using tidyselect in .by
tidyselect
helpers also work when using .by
:
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)
df %>%
summarize(avg_z = mean(z),
.by = where(is.character))
#> # A tidytable: 2 × 3
#> x y avg_z
#> <chr> <chr> <dbl>
#> 1 a a 1.5
#> 2 b b 3
Tidy evaluation compatibility
Tidy evaluation can be used to write custom functions with tidytable
functions. The embracing shortcut {{ }}
works, or you can use enquo()
with !!
if you prefer:
df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))
add_one <- function(data, add_col) {
data %>%
mutate(new_col = {{ add_col }} + 1)
}
df %>%
add_one(x)
#> # A tidytable: 3 × 4
#> x y z new_col
#> <dbl> <int> <chr> <dbl>
#> 1 1 4 a 2
#> 2 1 5 a 2
#> 3 1 6 b 2
The .data
and .env
pronouns also work within tidytable
functions:
var <- 10
df %>%
mutate(new_col = .data$x + .env$var)
#> # A tidytable: 3 × 4
#> x y z new_col
#> <dbl> <int> <chr> <dbl>
#> 1 1 4 a 11
#> 2 1 5 a 11
#> 3 1 6 b 11
A full overview of tidy evaluation can be found here.
dt()
helper
The dt()
function makes regular data.table
syntax pipeable, so you can easily mix tidytable
syntax with data.table
syntax:
df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))
df %>%
dt(, .(x, y, z)) %>%
dt(x < 4 & y > 1) %>%
dt(order(x, y)) %>%
dt(, double_x := x * 2) %>%
dt(, .(avg_x = mean(x)), by = z)
#> # A tidytable: 2 × 2
#> z avg_x
#> <chr> <dbl>
#> 1 a 1.5
#> 2 b 3
Speed Comparisons
For those interested in performance, speed comparisons can be found here.
Acknowledgements
tidytable
is only possible because of the great contributions to R by the data.table
and tidyverse
teams. data.table
is used as the main data frame engine in the background, while tidyverse
packages like rlang
, vctrs
, and tidyselect
are heavily relied upon to give users an experience similar to dplyr
and tidyr
.