The issue is of human time rather than silicon chip time. Human time can be wasted by taking longer to write the code, and (often much more importantly) by taking more time to understand subsequently what it does.
– The R Inferno (Patrick Burns, 2011)
The goal of mutagen is to provide extensions to dplyr’s
mutate().
mutagen provides simple-to-use functions as alternatives to complex R
idioms for variable generation. Some mutagen functions are specific to
problems encountered in R (e.g., working with list-columns in a data
frame), while others solve more generic data science operations that are
inspired by the excellent set of
egen (‘extensions to
generate’) and egenmore functions in Stata.
You can install the development version of mutagen from GitHub with:
# install.packages("devtools")
devtools::install_github("gvelasq/mutagen")

mutagen functions begin with the prefix `gen_*` and are designed to be
used inside dplyr’s `mutate()`. A mnemonic for this is that to use
mutagen, first mutate then generate.
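
To make the "complex R idioms" concrete, here is a minimal sketch of a row-wise mean computed inside `mutate()` using only dplyr and purrr, which is the kind of code a `gen_*` function is meant to replace. The `scores` data frame and its column names are invented for illustration. The mutagen call itself is not shown, because the function arguments are not documented in this section. The sketch assumes R 4.1 or later and dplyr 1.1.0 or later (for `pick()`).

```r
library(dplyr)
library(purrr)

# Hypothetical example data: three item scores per respondent, with gaps.
scores <- tibble(
  id = 1:3,
  q1 = c(1, NA, 3),
  q2 = c(4, 5, NA),
  q3 = c(7, 8, 9)
)

# pick() selects the columns as a data frame; pmap_dbl() then walks it
# row by row and passes each row's values to the anonymous function.
result <- scores |>
  mutate(
    row_mean = pmap_dbl(
      pick(q1, q2, q3),
      \(...) mean(c(...), na.rm = TRUE)
    )
  )

result$row_mean
#> [1] 4.0 6.5 6.0
```

Even for this simple case the reader has to know about `pick()`, `pmap_dbl()`, and how to collect the row values with `c(...)`, which is the readability cost the quote at the top refers to.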
| mutagen function | R idiom<sup>1</sup> | Stata function |
|---|---|---|
| `gen_coldiff()` | `as.integer(any(imap_lgl(data[-1], \(x, idx) !identical(data[[1]], data[[idx]]))))` | egen diff() |
| `gen_colpercent()` | `mutate(data, pct = col / sum(col) * 100, .by = group_cols)` | egen pc() |
| `gen_listcol_na()`<sup>2</sup> | `modify_tree(leaf = \(x) replace(x, is.null(x), NA))` | N/A |
| `gen_rowall()`<sup>2</sup> | `pmap_int(data, \(cols) all(list(cols) %in% values))` | egenmore rall() |
| `gen_rowany()`<sup>2</sup> | `pmap_int(data, \(cols) any(list(cols) %in% values))` | egen anymatch(), egenmore rany() |
| `gen_rowcount()`<sup>2</sup> | `pmap_int(data, \(cols) sum(list(cols) %in% values))` | egen anycount(), egenmore rcount() |
| `gen_rowfirst()`<sup>2</sup> | `pmap_vec(data, \(cols) first(c(cols), na_rm = TRUE))` | egen rowfirst() |
| `gen_rowlast()`<sup>2</sup> | `pmap_vec(data, \(cols) last(c(cols), na_rm = TRUE))` | egen rowlast() |
| `gen_rowmax()` | `inject(pmax(!!!data, na.rm = TRUE))` | egen rowmax() |
| `gen_rowmean()`<sup>2</sup> | `pmap_dbl(data, \(cols) mean(c(cols), na.rm = TRUE))` | egen rowmean() |
| `gen_rowmedian()`<sup>2</sup> | `pmap_dbl(data, \(cols) median(c(cols), na.rm = TRUE))` | egen rowmedian() |
| `gen_rowmin()` | `inject(pmin(!!!data, na.rm = TRUE))` | egen rowmin() |
| `gen_rowmiss()`<sup>2</sup> | `pmap_int(data, \(cols) sum(!complete.cases(c(cols))))` | egen rowmiss() |
| `gen_rownonmiss()`<sup>2</sup> | `pmap_int(data, \(cols) sum(complete.cases(c(cols))))` | egen rownonmiss() |
| `gen_rownth()`<sup>2</sup> | `pmap_vec(data, \(cols) nth(c(cols), n, na_rm = TRUE))` | N/A |
| `gen_rowsum()`<sup>2</sup> | `pmap_vec(data, \(cols) sum(cols, na.rm = TRUE))` | egen rowtotal(), egenmore rsum2() |
| `gen_rowsd()`<sup>2</sup> | `pmap_dbl(data, \(cols) sd(c(cols), na.rm = TRUE))` | egen rowsd() |
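
As a rough illustration of three idioms from the table (group percentages, row-wise maximum, and row-wise match counts), the sketch below runs them on a small invented data frame. The `sales` data, its column names, and the `values` vector are assumptions made for this example. The row-wise function is written as `\(...)` so that it accepts one argument per column, a slightly expanded form of the simplified idioms in the table. It assumes dplyr 1.1.0 or later (for `.by`), purrr, and rlang.

```r
library(dplyr)
library(purrr)

# Hypothetical example data, invented for this sketch.
sales <- tibble(
  region = c("east", "east", "west", "west"),
  q1 = c(10, 30, NA, 5),
  q2 = c(20, 60, 40, 10)
)

# Percent of the group total: the idiom listed next to gen_colpercent().
sales <- mutate(sales, pct = q2 / sum(q2) * 100, .by = region)

# The columns to operate on row-wise.
data <- select(sales, q1, q2)

# Row-wise maximum: splice the columns into pmax() as separate arguments.
row_max <- rlang::inject(pmax(!!!data, na.rm = TRUE))

# Row-wise count of cells that match any of the target values.
values <- c(10, 40)
n_match <- pmap_int(data, \(...) sum(c(...) %in% values))

sales$pct
#> [1] 25 75 80 20
row_max
#> [1] 20 60 40 10
n_match
#> [1] 1 0 1 1
```

Each idiom relies on a different mechanism (grouped `mutate()`, argument splicing, and row-wise mapping), which is why a single family of `gen_*` functions with a shared prefix can be easier to read and remember.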
Please note that the mutagen project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
All contributions to this project are gratefully acknowledged using the
allcontributors package
following the all-contributors
specification. Contributions of any kind are welcome!
gvelasq, ivelasq, delabj, thomascwells