Thanks to visit codestin.com
Credit goes to github.com

Skip to content

mattsheng/BartVC

Repository files navigation

BartVC

BartVC implements the VC-measure based nonparametric variable selection method introduce in arxiv. The current package utilizes Dirichlet Additive Regression Trees (DART) as the nonparametric model, though other Bayesian tree ensemble models can also be used. We refer to VC-measure with DART backend as DART VC-measure.

Backends

The BartVC R package currently supports two flavors of DART implementations:

Installation

You can install the development version of BartVC from GitHub with:

# install.packages("devtools")
devtools::install_github("mattsheng/BartVC")

Example

We demonstrate DART VC-measure using the Friedman equation:

$$y = 10\sin(\pi x_1x_2) + 20(x_3-0.5)^2 + 10x_4 + 5x_5 + \varepsilon,$$

where $\varepsilon \sim N(0, 2^2)$. Predictors $x_1,\ldots,x_{100}$ are sampled iid from an Uniform(0, 1) distribution. The goal is to identify the truly relevant predictors, namely $(x_1,x_2,x_3,x_4,x_5)$, from all $p=100$ predictors.

To use the dartMachine backend, we must allocate memory for Java before loading the BartVC package:

# Must allocate memory before loading `BartVC` package when using `dartMachine` backend
# Here I allocated 5GB of memory for Java
options(java.parameters = c("-Xmx5g"))
library(BartVC)

set.seed(123)
n <- 1000
p <- 100
Lrep <- 10

# Generate data (y, X)
X <- matrix(runif(n * p), n, p)
y_mu <- 10 * sin(pi * X[, 1] * X[, 2]) + 20 * (X[, 3] - 0.5)^2 + 10 * X[, 4] + 5 * X[, 5]
eps <- rnorm(n, mean = 0, sd = 2)
y <- y_mu + eps

# Perform DART VC-measure using "dartMachine" backend
VC_result <- DartVC(y = y, 
                    X = X, 
                    seed = 123, 
                    Lrep = Lrep, 
                    backend = "dartMachine")

# Examine the selected predictors
VC_result$pos_idx
#> [1] 1 2 3 4 5

If we want to use the BART backend instead, simply replace backend = "dartMachine" with backend = "BART":

VC_result <- DartVC(y = y, 
                    X = X, 
                    seed = 123, 
                    Lrep = Lrep, 
                    backend = "BART")
VC_result$pos_idx
#> [1] 1 2 3 4 5

Reproduce paper results

Submit simulations

Experiments for each method can be launched by running the bash script files from the experiment/ folder. For instance, running

bash submit_DART_VC-measure.sh

within the experiment/ folder will submit all simulations for the DART VC-measure method. Raw results, e.g., VC and VC Rank matrices for DART VC-measure, will be stored as JSON files under the results/ folder.

Collect raw results

When all raw simulation results are available, simply run the Jupyter notebook postprocessing/json_collection.ipynb to collect all JSON files and prepare them for the variable selection step. Collected results will be stored in results/no_idx/ as .feather files.

Perform variable selection

The Jupyter notebook postprocessing/variable_selection.ipynb performs variable selection for some of the methods, e.g., the original DART, DART VC-measure, etc., and calculates the selection accuracy for all methods. Selection accuracy results and the selected variables for each simulation and method are stored as .feather files under results/with_idx/

Reproduce paper figures

R functions in the postprocessing/ folders reproduce all figures in the paper. For instance, postprocessing/fig3_12_13.R reproduces Figures 3, 12, and 13 in the paper.

About

No description, website, or topics provided.

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published