tidyhte provides tidy semantics for estimation of heterogeneous
treatment effects through the use of Kennedy’s (n.d.) doubly-robust
learner.
The goal of tidyhte is to use a sort of “recipe” design. This should
make it easy to scale an analysis of HTE from the common
single-outcome / single-moderator case to many outcomes and many
moderators: one configuration can be reused to perform the same
analysis across many outcomes and for a wide array of moderators. It’s
written to be fairly easy to extend to different models and to add
additional diagnostics and ways to output information from a set of HTE
estimates.
The best place to start learning how to use tidyhte is the vignettes,
which run through example analyses from start to finish:
vignette("experimental_analysis") and
vignette("observational_analysis"). There is also a writeup summarizing
the method and implementation in vignette("methodological-details").
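For rough intuition about what a doubly-robust learner computes, the following is a minimal sketch in base R and does not use tidyhte. The simulated data, the variable names, and the simple `glm` / `lm` nuisance models are all assumptions made for illustration; it skips the sample splitting and SuperLearner ensembles that the package workflow relies on, so treat it as a toy rather than the package's implementation.

```r
# Minimal sketch of a doubly-robust pseudo-outcome in base R.
# Simulated data and variable names are hypothetical; no cross-fitting is done.
set.seed(1)
n <- 2000
x <- rnorm(n)                               # a single covariate / moderator
a <- rbinom(n, 1, plogis(0.5 * x))          # treatment, confounded by x
y <- 1 + x + a * (1 + 0.5 * x) + rnorm(n)   # simulated CATE is 1 + 0.5 * x
df <- data.frame(x = x, a = a, y = y)

# Nuisance models: propensity score and one outcome regression per arm
ps_fit <- glm(a ~ x, family = binomial(), data = df)
pi_hat <- predict(ps_fit, type = "response")
mu1_hat <- predict(lm(y ~ x, data = df[df$a == 1, ]), newdata = df)
mu0_hat <- predict(lm(y ~ x, data = df[df$a == 0, ]), newdata = df)
mu_a_hat <- ifelse(df$a == 1, mu1_hat, mu0_hat)

# Pseudo-outcome: plug-in contrast plus an inverse-propensity-weighted residual
pseudo <- mu1_hat - mu0_hat +
  (df$a - pi_hat) / (pi_hat * (1 - pi_hat)) * (df$y - mu_a_hat)

# Second stage: regress the pseudo-outcome on the moderator
cate_fit <- lm(pseudo ~ x, data = df)
coef(cate_fit)
```

Because the simulated effect is linear in `x`, the second-stage coefficients should land close to the intercept and slope used to generate the data.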
You will be able to install the released version of tidyhte from CRAN with:
install.packages("tidyhte")
The CRAN release does not exist yet. In the meantime, install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("ddimmery/tidyhte")
To set up a simple configuration, it’s straightforward to use the Recipe API:
library(tidyhte)
library(dplyr)
basic_config() %>%
    add_propensity_score_model("SL.glmnet") %>%
    add_outcome_model("SL.glmnet") %>%
    add_moderator("Stratified", x1, x2) %>%
    add_moderator("KernelSmooth", x3) %>%
    add_vimp(sample_splitting = FALSE) -> hte_cfg
The basic_config includes a number of defaults: it starts off the
SuperLearner ensembles for both treatment and outcome with linear models
("SL.glm"). Once the configuration is attached to a dataset, the
analysis runs as a pipeline:
data %>%
    attach_config(hte_cfg) %>%
    make_splits(userid, .num_splits = 12) %>%
    produce_plugin_estimates(
        outcome_variable,
        treatment_variable,
        covariate1, covariate2, covariate3, covariate4, covariate5, covariate6
    ) %>%
    construct_pseudo_outcomes(outcome_variable, treatment_variable) -> data
data %>%
    estimate_QoI(covariate1, covariate2) -> results
Estimating CATEs for a moderator not included previously only requires rerunning the final line:
data %>%
    estimate_QoI(covariate3) -> results
Replicating this on a new outcome would be as simple as running the following, with no reconfiguration necessary.
data %>%
    attach_config(hte_cfg) %>%
    produce_plugin_estimates(
        second_outcome_variable,
        treatment_variable,
        covariate1, covariate2, covariate3, covariate4, covariate5, covariate6
    ) %>%
    construct_pseudo_outcomes(second_outcome_variable, treatment_variable) %>%
    estimate_QoI(covariate1, covariate2) -> results
This makes it easy to chain together analyses across many outcomes:
library("foreach")
data %>%
    attach_config(hte_cfg) %>%
    make_splits(userid, .num_splits = 12) -> data
foreach(outcome = list_of_outcomes, .combine = "bind_rows") %do% {
    data %>%
        produce_plugin_estimates(
            outcome,
            treatment_variable,
            covariate1, covariate2, covariate3, covariate4, covariate5, covariate6
        ) %>%
        construct_pseudo_outcomes(outcome, treatment_variable) %>%
        estimate_QoI(covariate1, covariate2) %>%
        mutate(outcome = rlang::as_string(outcome))
}
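The loop above iterates over `list_of_outcomes`, which is not defined in this README. Since the final step calls `rlang::as_string(outcome)`, one plausible reading is that each element is a symbol naming an outcome column. The sketch below builds such a list under that assumption; the column names are hypothetical, and it only illustrates the container, not how the pipeline functions consume each element.

```r
library(rlang)

# Hypothetical outcome column names, stored as symbols rather than strings
list_of_outcomes <- syms(c("outcome_variable", "second_outcome_variable"))

# as_string() recovers the column name from each symbol,
# mirroring the mutate() step at the end of the loop
outcome_names <- vapply(list_of_outcomes, as_string, character(1))
outcome_names
```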
The function estimate_QoI returns its results as a tibble, which makes
them easy to manipulate or plot.
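As an illustration of that kind of manipulation, the sketch below post-processes a small hand-written tibble with dplyr. The tibble is a hypothetical stand-in for real output, and the column names (`estimand`, `term`, `level`, `estimate`, `std_error`) and values are assumptions made for the example; check `names(results)` on actual output before adapting it.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the output of estimate_QoI(); the column names
# and values are assumptions, not real estimates.
results <- tribble(
  ~estimand, ~term,        ~level, ~estimate, ~std_error,
  "MCATE",   "covariate1", "a",    0.10,      0.04,
  "MCATE",   "covariate1", "b",    0.25,      0.05,
  "MCATE",   "covariate2", "low",  0.05,      0.03,
  "MCATE",   "covariate2", "high", 0.30,      0.06
)

# Add normal-approximation 95% intervals and flag those that exclude zero
summary_tbl <- results %>%
  filter(estimand == "MCATE") %>%
  mutate(
    ci_low = estimate - qnorm(0.975) * std_error,
    ci_high = estimate + qnorm(0.975) * std_error,
    excludes_zero = ci_low > 0 | ci_high < 0
  ) %>%
  arrange(term, desc(estimate))

summary_tbl
```

The same long format feeds directly into a plotting layer, with one row per moderator level and interval bounds already computed.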