tidytable

The goal of tidytable is to be a tidy interface to data.table.

Why `tidytable`?

tidyverse-like syntax with data.table speed
rlang compatibility - See here
Includes functions that dtplyr is missing, including many tidyr functions

Note: tidytable functions do not use data.table’s modify-by-reference, and instead use the copy-on-modify principles followed by the tidyverse and base R.

Installation

Install the released version from CRAN with:

install.packages("tidytable")

Or install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("markfairbanks/tidytable")

tidytable functions

Enhanced selection support denoted by ES See examples here

tidytable helpers

dt(): Pipeable data.table syntax. See here
dt_get_dummies()
%notin%

dplyr

Core verbs

dt_arrange()
dt_filter()
dt_mutate(): _if()/_at()/_all()/_across() - ES
- The _across() helper is new and can replace both _if() and _at() See here
dt_select() - ES
dt_summarize(): Group by specifications called inside. See ?dt_summarize

Other dplyr functions

dt_bind_cols() & dt_bind_rows()
dt_case(): Similar to dplyr::case_when(). See ?dt_case() for syntax
dt_count() - ES
dt_distinct() - ES
Joins: dt_left_join(), dt_inner_join(), dt_right_join(), dt_full_join(), & dt_anti_join()
dt_pull()
dt_relocate()
dt_rename(): _if()/_at()/_all()/_across() - ES
Select helpers: dt_starts_with(), dt_ends_with(), dt_contains(), dt_everything()
dt_slice(): _head()/_tail()/_max()/_min()
- The slice_*() helpers are like dt_top_n(), but are slightly easier to use
dt_top_n()

tidyr

dt_drop_na() - ES
dt_fill(): Works on character/factor/logical types (data.table::nafill() does not) - ES
dt_group_split() - ES
Nesting: dt_group_nest() - ES & dt_unnest_legacy()
dt_pivot_longer() - ES & dt_pivot_wider() - ES
dt_replace_na()
dt_separate()

purrr

dt_map(), dt_map2(), dt_map_*() variants, & dt_map2_*() variants

General syntax

The code chunk below shows the tidytable syntax:

library(data.table)
library(tidytable)

example_dt <- data.table(x = c(1,2,3), y = c(4,5,6), z = c("a","a","b"))

example_dt %>%
  dt_select(x, y, z) %>%
  dt_filter(x < 4, y > 1) %>%
  dt_arrange(x, y) %>%
  dt_mutate(double_x = x * 2,
            double_y = y * 2)
#>        x     y     z double_x double_y
#>    <dbl> <dbl> <chr>    <dbl>    <dbl>
#> 1:     1     4     a        2        8
#> 2:     2     5     a        4       10
#> 3:     3     6     b        6       12

Using “group by”

Group by calls are done from inside any function that has group by functionality (e.g. dt_summarize() & dt_mutate())

A single column can be passed with by = z
Multiple columns can be passed with by = list(y, z)

example_dt %>%
  dt_summarize(avg_x = mean(x),
               count = .N,
               by = z)
#>        z avg_x count
#>    <chr> <dbl> <int>
#> 1:     a   1.5     2
#> 2:     b   3.0     1

Enhanced selection

Enhanced selection allows you to mix predicates like is.double with normal selection. Some examples:

example_dt <- data.table(a = c(1,2,3),
                         b = c(4,5,6),
                         c = c("a","a","b"),
                         d = c("a","b","c"))

example_dt %>%
  dt_select(is.numeric, d)
#>        a     b     d
#>    <dbl> <dbl> <chr>
#> 1:     1     4     a
#> 2:     2     5     b
#> 3:     3     6     c

You can also use this format to drop columns:

example_dt %>%
  dt_select(-is.numeric)
#>        c     d
#>    <chr> <chr>
#> 1:     a     a
#> 2:     a     b
#> 3:     b     c

Currently supported: is.numeric/is.integer/is.double/is.character/is.factor

New helper: `dt_mutate_across()`

Enhanced selection allows the user to replace dt_mutate_if() & dt_mutate_at() with one helper - dt_mutate_across().

Using _across() instead of _if():

example_dt <- data.table(a = c(1,1,1),
                         b = c(1,1,1),
                         c = c("a","a","b"),
                         d = c("a","b","c"))

example_dt %>%
  dt_mutate_across(is.numeric, as.character)
#>        a     b     c     d
#>    <chr> <chr> <chr> <chr>
#> 1:     1     1     a     a
#> 2:     1     1     a     b
#> 3:     1     1     b     c

Using _across() instead of _at():

example_dt %>%
  dt_mutate_across(c(a, b), ~ .x + 1)
#>        a     b     c     d
#>    <dbl> <dbl> <chr> <chr>
#> 1:     2     2     a     a
#> 2:     2     2     a     b
#> 3:     2     2     b     c

These two approaches can be combined in one call:

example_dt <- data.table(dbl_col1 = c(1.0,1.0,1.0),
                         dbl_col2 = c(1.0,1.0,1.0),
                         int_col1 = c(1L,1L,1L),
                         int_col2 = c(1L,1L,1L),
                         char_col1 = c("a","a","a"),
                         char_col2 = c("b","b","b"))

example_dt %>%
  dt_mutate_across(c(is.double, int_col1), ~ .x + 1)
#>    dbl_col1 dbl_col2 int_col1 int_col2 char_col1 char_col2
#>       <dbl>    <dbl>    <dbl>    <int>     <chr>     <chr>
#> 1:        2        2        2        1         a         b
#> 2:        2        2        2        1         a         b
#> 3:        2        2        2        1         a         b

`rlang` compatibility

rlang quoting/unquoting can be used to write custom functions with tidytable functions.

Note that quosures are not compatible with data.table, so enexpr() must be used instead of enquo().

Custom function with `dt_mutate()`

library(rlang)

example_dt <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))

add_one <- function(.data, new_name, add_col) {
  new_name <- enexpr(new_name)
  add_col <- enexpr(add_col)
  
  .data %>%
    dt_mutate(!!new_name := !!add_col + 1)
}

example_dt %>%
  add_one(x_plus_one, x)
#>        x     y     z x_plus_one
#>    <dbl> <dbl> <chr>      <dbl>
#> 1:     1     1     a          2
#> 2:     1     1     a          2
#> 3:     1     1     b          2

Custom function with `dt_summarize()`

example_df <- data.table(x = 1:10, y = c(rep("a", 6), rep("b", 4)), z = c(rep("a", 6), rep("b", 4)))

find_mean <- function(.data, grouping_cols, col) {
  grouping_cols <- enexpr(grouping_cols)
  col <- enexpr(col)
  
  .data %>%
    dt_summarize(avg = mean(!!col),
                 by = !!grouping_cols)
}

example_df %>%
  find_mean(grouping_cols = list(y, z), col = x)
#>        y     z   avg
#>    <chr> <chr> <dbl>
#> 1:     a     a   3.5
#> 2:     b     b   8.5

`dt()` helper

The dt() function makes regular data.table syntax pipeable, so you can easily mix tidytable syntax with data.table syntax:

example_dt <- data.table(x = c(1,2,3), y = c(4,5,6), z = c("a", "a", "b"))

example_dt %>%
  dt(, list(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, ':='(double_x = x * 2,
            double_y = y * 2)) %>%
  dt(, list(avg_x = mean(x)), by = z)
#>        z avg_x
#>    <chr> <dbl>
#> 1:     a   1.5
#> 2:     b   3.0

Speed Comparisons

Below are some speed comparisons of various functions. More functions will get added to the speed comps over time.

A few notes:

Comparing times from separate functions won’t be very useful. For example - the summarize() tests were performed on a different dataset from case_when().
setDTthreads(4) was used for data.table & tidytable timings.
Modify-by-reference was used in data.table when being compared to dt_mutate() & dplyr::mutate()
dt_fill() & tidyr::fill() both work with character/factor/logical columns, whereas data.table::nafill() does not. Testing only included numeric columns due to this constraint.
Currently data.table doesn’t have its own case_when() translation, so a multiple nested fifelse() was used.
All tests can be found in the source code of the README.
pandas comparisons are in the process of being added - more will be added soon.
Lastly I’d like to mention that these tests were not rigorously created to cover all angles equally. They are just meant to be used as general insight into the performance of these packages.

all_marks
#> # A tibble: 13 x 6
#>    function_tested tidyverse tidytable data.table pandas tidytable_vs_tidyverse
#>    <chr>           <chr>     <chr>     <chr>      <chr>  <chr>                 
#>  1 arrange         393.5ms   34.2ms    38.7ms     297ms  8.7%                  
#>  2 case_when       517ms     179ms     163ms      307ms  34.6%                 
#>  3 distinct        86.4ms    18.4ms    16.3ms     287ms  21.3%                 
#>  4 fill            127.9ms   36.3ms    32.5ms     146ms  28.4%                 
#>  5 filter          283ms     222ms     222ms      656ms  78.4%                 
#>  6 inner_join      70.9ms    60.2ms    59.7ms     <NA>   84.9%                 
#>  7 left_join       66.8ms    46.2ms    48.8ms     <NA>   69.2%                 
#>  8 mutate          69.3ms    53.9ms    74.4ms     85.2ms 77.8%                 
#>  9 nest            59ms      14.7ms    11.3ms     <NA>   24.9%                 
#> 10 pivot_longer    186.6ms   37ms      30.7ms     <NA>   19.8%                 
#> 11 pivot_wider     904ms     219ms     219ms      <NA>   24.2%                 
#> 12 summarize       483ms     176ms     160ms      780ms  36.4%                 
#> 13 unnest          181.01ms  8.18ms    7.88ms     <NA>   4.5%

Name		Name	Last commit message	Last commit date
Latest commit History 613 Commits
R		R
docs		docs
man		man
pkgdown/favicon		pkgdown/favicon
tests		tests
.DS_Store		.DS_Store
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
NEWS.md		NEWS.md
README.Rmd		README.Rmd
README.md		README.md
cran-comments.md		cran-comments.md
tidytable.Rproj		tidytable.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Uh oh!

Repository files navigation

tidytable

Why `tidytable`?

Installation

tidytable functions

tidytable helpers

dplyr

Core verbs

Other dplyr functions

tidyr

purrr

General syntax

Using “group by”

Enhanced selection

New helper: `dt_mutate_across()`

`rlang` compatibility

Custom function with `dt_mutate()`

Custom function with `dt_summarize()`

`dt()` helper

Speed Comparisons

About

Licenses found

Uh oh!

Releases

Packages

Languages

License

Licenses found

TylerGrantSmith/tidytable

Folders and files

Latest commit

History

Repository files navigation

tidytable

Why tidytable?

Installation

tidytable functions

tidytable helpers

dplyr

Core verbs

Other dplyr functions

tidyr

purrr

General syntax

Using “group by”

Enhanced selection

New helper: dt_mutate_across()

rlang compatibility

Custom function with dt_mutate()

Custom function with dt_summarize()

dt() helper

Speed Comparisons

About

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Why `tidytable`?

New helper: `dt_mutate_across()`

`rlang` compatibility

Custom function with `dt_mutate()`

Custom function with `dt_summarize()`

`dt()` helper

Packages