R implementation of Bayesian Additive Semi-Multivariate Decision Trees (BAMDT).
Reference:
Luo, Z. T., Sang, H., & Mallick, B. (2022) BAMDT: Bayesian Additive Semi-Multivariate Decision Trees for Nonparametric Regression. Proceedings of the 39th International Conference on Machine Learning (ICML 2022) [link]

- `Demo.R`: Demo code for fitting BAMDT on a U-shape domain
- `Tree.R`: Class of semi-structured decision trees
- `Model.R`: Class of BAMDT models
- `ComplexDomainFun.R`: Utility functions for the U-shape domain
- `SimData.R`: Code to simulate data on a U-shape domain (no need to run unless you would like to regenerate the data)
- `input_U.RData`: Data sets generated by running `SimData.R`

The code depends on the following R packages: R6, collections, igraph, fdaPDE, BART, sf, ggplot2.
Please make sure they are installed before running the demo code.
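
As a convenience, the check below lists which of these packages are not yet installed. It is a minimal sketch using only base R; the variable names `pkgs` and `missing_pkgs` are illustrative, and it assumes the packages are available from whatever repository your R session is configured to use.

```r
# Packages listed in this README as dependencies
pkgs <- c("R6", "collections", "igraph", "fdaPDE", "BART", "sf", "ggplot2")

# requireNamespace() returns FALSE (without raising an error) if a package is absent
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  message("Install them with install.packages(missing_pkgs)")
} else {
  message("All required packages are installed.")
}
```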
`model = Model$new(Y, X, graphs, projections, hyperpar, X_new, projections_new)` creates a BAMDT model object named `model`.

Parameters:
- `Y`: Numeric response vector of length `n`.
- `X`: Numeric matrix of unstructured training features of size `n * p`.
- `graphs`: List of `M` spatial graphs, where `M` is the number of trees. Each graph should be an `igraph` object.
- `projections`: Integer matrix of size `n * M`, where `projections[i, j]` is the index of the knot nearest to training observation `i` for tree `j`.
- `hyperpar`: Named vector of hyperparameters with the following elements:
  - `hyperpar['M']`: Number of trees `M`.
  - `hyperpar['sigmasq_mu']`: Variance of the prior for $\mu$, i.e., $\sigma^2_\mu$.
  - `hyperpar['q']`: Quantile used to calibrate the prior for the noise variance $\sigma^2$.
  - `hyperpar['nu']`: Degrees of freedom of the inverse-$\chi^2$ prior for the noise variance $\sigma^2$.
  - `hyperpar['alpha']`: Hyperparameter $\alpha$ in the tree generating process.
  - `hyperpar['beta']`: Hyperparameter $\beta$ in the tree generating process.
  - `hyperpar['numcut']`: Number of candidate split points for unstructured features.
  - `hyperpar['prob_split_by_x']`: Probability of performing an unstructured split.
- `X_new`: Numeric matrix of unstructured test features of size `n_new * p`.
- `projections_new`: Integer matrix of size `n_new * M`, where `projections_new[i, j]` is the index of the knot nearest to test observation `i` for tree `j`.
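
The argument shapes described above can be checked before constructing the model. The sketch below builds a `hyperpar` vector with the element names listed in this README and defines a helper, `check_inputs`, that is hypothetical and not part of the repository. The numeric values are placeholders chosen for illustration and are not necessarily the settings used in `Demo.R`.

```r
# Named numeric vector with the element names documented above.
# All values are illustrative placeholders, not recommended settings.
hyperpar <- c(
  M = 5,                 # number of trees
  sigmasq_mu = 0.1,      # prior variance of mu
  q = 0.9,               # quantile for calibrating the noise variance prior
  nu = 3,                # degrees of freedom of the inverse-chi^2 prior
  alpha = 0.95,          # tree generating process hyperparameter
  beta = 2,              # tree generating process hyperparameter
  numcut = 100,          # candidate split points per unstructured feature
  prob_split_by_x = 0.5  # probability of an unstructured split
)

# Hypothetical helper: stops with an error if any documented shape is violated.
check_inputs <- function(Y, X, graphs, projections, hyperpar) {
  M <- hyperpar[["M"]]
  n <- length(Y)
  stopifnot(
    is.numeric(Y),
    is.matrix(X), nrow(X) == n,
    is.list(graphs), length(graphs) == M,
    all(vapply(graphs, inherits, logical(1), what = "igraph")),
    is.matrix(projections),
    nrow(projections) == n, ncol(projections) == M
  )
  invisible(TRUE)
}
```

Calling `check_inputs(Y, X, graphs, projections, hyperpar)` just before `Model$new(...)` turns a shape mismatch into an immediate, readable error rather than a failure partway through sampling.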
To fit a BAMDT model and predict for test data, use

`model$Fit(init_val, MCMC, BURNIN, THIN, seed = 1234, save_partitions = FALSE)`

Parameters:
- `init_val`: Named list of initial values with the following element:
  - `init_val[['sigmasq_y']]`: Initial value for the noise variance $\sigma^2$.
- `MCMC`: Total number of MCMC iterations, including burn-in.
- `BURNIN`: Number of burn-in iterations.
- `THIN`: Thinning interval. One MCMC sample is retained every `THIN` iterations, so the number of posterior samples is `npost = (MCMC - BURNIN) / THIN`.
- `seed`: Random seed.
- `save_partitions`: Logical value indicating whether posterior samples of partitions are saved. Default is `FALSE` (recommended). Setting `save_partitions = TRUE` uses a large amount of memory.
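
To make the relationship between `MCMC`, `BURNIN`, and `THIN` concrete, the sketch below works through one set of illustrative values (not the settings from `Demo.R`) and prepares the `init_val` list. The call to `model$Fit` is left as a comment because it requires a model object built with the repository code.

```r
# Illustrative sampler settings (placeholders)
MCMC   <- 30000  # total iterations
BURNIN <- 10000  # iterations discarded as burn-in
THIN   <- 10     # keep one sample every THIN iterations

# Number of retained posterior samples, per the formula above
npost <- (MCMC - BURNIN) / THIN  # 2000

# Choosing values so that npost is a whole number avoids ambiguity
stopifnot((MCMC - BURNIN) %% THIN == 0)

# Named list of initial values; 1 is an arbitrary starting point
init_val <- list(sigmasq_y = 1)

# With a model object in hand, the fit would then be run as:
# model$Fit(init_val, MCMC, BURNIN, THIN, seed = 1234)
```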
The model object has the following public members:
- `model$sigmasq_y_out`: Posterior samples of the noise variance $\sigma^2$.
- `model$g_out`: `npost * n * M` array of posterior samples of (in-sample) fitted values from each tree.
- `model$Y_new_out`: `npost * n_new` matrix of posterior samples of (out-of-sample) predicted values.
- `model$importance_out`: `npost * (p + 1)` matrix of posterior samples of feature importance metrics.
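
The outputs above can be summarized with base R. The sketch below uses randomly generated stand-ins with the documented shapes in place of a fitted model, so the numbers themselves are meaningless. Summing `g_out` over trees to obtain the overall in-sample fit is an assumption based on the additive structure of the model; the scale of the result relative to `Y` is not stated here and should be checked against `Demo.R`.

```r
set.seed(1)
npost <- 100; n <- 8; n_new <- 5; p <- 3; M <- 4

# Stand-ins for model$Y_new_out, model$g_out, and model$importance_out
Y_new_out      <- matrix(rnorm(npost * n_new), npost, n_new)
g_out          <- array(rnorm(npost * n * M), dim = c(npost, n, M))
importance_out <- matrix(runif(npost * (p + 1)), npost, p + 1)

# Posterior mean prediction for each test observation
Y_new_mean <- colMeans(Y_new_out)

# 95% pointwise credible intervals: a 2 x n_new matrix (lower, upper)
Y_new_ci <- apply(Y_new_out, 2, quantile, probs = c(0.025, 0.975))

# Assumed: the overall fit is the sum of per-tree fitted values
fit_draws <- apply(g_out, c(1, 2), sum)  # npost x n
fit_mean  <- colMeans(fit_draws)

# Posterior mean of each feature importance metric
importance_mean <- colMeans(importance_out)
```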