Iterative Bayesian Additive Regression Trees Descriptor Selection Method.
iBART
This is an R-Java implementation of iBART, introduced in Ye, Senftle, & Li, "Operator-induced structural variable selection for identifying materials genes". The package depends heavily on the R package bartMachine for its BART-G.SE variable selection implementation.
Installation
Before installing the iBART package in R, you first need to install the Java JDK and the rJava R package.
Install Java JDK (not JRE)
Download and install the Java JDK, version 17 or above. Then run R CMD javareconf from the command line to configure Java in R. iBART requires bartMachine and rJava, both of which need the Java JDK; the Java JRE alone won't work!
Install rJava
Run install.packages("rJava", INSTALL_opts = "--no-multiarch") within R. To reproduce the results in the paper, please install rJava 1.0-4.
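Once rJava is installed, it can be worth confirming which Java version R actually sees, since the step above calls for Java 17 or above. The sketch below is one way to do that. `java_major` is a hypothetical helper (not part of rJava or iBART) that extracts the major version from a `java.version` string, and the rJava query at the end only runs if rJava loads. It does not tell a JDK apart from a JRE.

```r
# Hypothetical helper: extract the major version from a java.version string.
# Handles both the modern scheme ("17.0.2") and the legacy one ("1.8.0_292").
java_major <- function(version_string) {
  parts <- strsplit(version_string, "[._-]")[[1]]
  major <- as.integer(parts[1])
  if (isTRUE(major == 1L) && length(parts) > 1L) {
    major <- as.integer(parts[2])  # legacy "1.x" numbering: x is the major version
  }
  major
}

# Ask the JVM started by rJava for its version (skipped if rJava cannot load).
if (requireNamespace("rJava", quietly = TRUE)) {
  rJava::.jinit()
  v <- rJava::.jcall("java/lang/System", "S", "getProperty", "java.version")
  message("Java version seen by R: ", v)
  if (isTRUE(java_major(v) < 17L)) {
    warning("This Java is older than the Java 17 asked for above.")
  }
}
```

Because this starts the JVM, it is best run in a separate R session from the example further down: Java memory settings passed through `options(java.parameters = ...)` only take effect if they are set before the JVM starts.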
Install bartMachine
Run install.packages("bartMachine", INSTALL_opts = "--no-multiarch") within R. To reproduce the results in the paper, please install bartMachineJARs 1.1 and bartMachine 1.2.6. If you encounter an error, please see the bartMachine repo for detailed instructions.
Install glmnet
Run install.packages("glmnet") within R. To reproduce the results in the paper, please install glmnet 4.1-1.
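A plain install.packages() call fetches the current CRAN release, not the specific versions named in the steps above. Below is a minimal sketch of pinning them, assuming devtools is installed (it is used in the next step anyway) and relying on its install_version() function. The names `pinned`, `version_matches`, and `install_pinned` are illustrative only, and whether these older versions still build on a recent R is not guaranteed.

```r
# Versions named in the installation steps above.
pinned <- c(rJava = "1.0-4", bartMachineJARs = "1.1",
            bartMachine = "1.2.6", glmnet = "4.1-1")

# TRUE if `pkg` is installed at exactly the wanted version.
# packageVersion() reads the DESCRIPTION file, so nothing is loaded.
version_matches <- function(pkg, wanted) {
  have <- tryCatch(utils::packageVersion(pkg), error = function(e) NULL)
  !is.null(have) && have == package_version(wanted)
}

# Install any package whose installed version differs from the pinned one.
# Not called automatically, since it needs network access and a configured JDK.
install_pinned <- function(pinned) {
  for (pkg in names(pinned)) {
    if (!version_matches(pkg, pinned[[pkg]])) {
      # Same install flag the steps above use for rJava and bartMachine.
      devtools::install_version(pkg, version = pinned[[pkg]],
                                INSTALL_opts = "--no-multiarch")
    }
  }
}

# Report which pinned versions are already in place.
status <- vapply(names(pinned),
                 function(pkg) version_matches(pkg, pinned[[pkg]]),
                 logical(1))
print(status)
```

Calling `install_pinned(pinned)` then installs only the packages reported as `FALSE`, in the order listed (rJava first, bartMachineJARs before bartMachine).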
Install iBART via devtools
Run devtools::install_github("mattsheng/iBART", INSTALL_opts = "--no-multiarch") within R, or run devtools::install_github("mattsheng/iBART", INSTALL_opts = "--no-multiarch", build_vignettes = TRUE) if you want to build the vignettes; building them takes a while.
Example
We use the simulation model in Section 3.4 of our paper to demonstrate the usage of iBART. The real data application and the simulation are also available as package vignettes.
set.seed(123)
options(java.parameters = "-Xmx10g") # Allocate 10 GB of memory for Java; set this before loading iBART
library(iBART)
n <- 250
p <- 10
X <- matrix(runif(n * p, min = -1, max = 1), nrow = n, ncol = p)
colnames(X) <- paste("x.", seq(from = 1, to = p, by = 1), sep = "")
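# Keep the trailing "+" on the first line; otherwise R ends the assignment there and the noise term is never added
y <- 15*(exp(X[,1])-exp(X[,2]))^2 + 20*sin(pi*X[,3]*X[,4]) +
  rnorm(n, mean = 0, sd = 0.5)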
iBART_results <- iBART(X = X, y = y,
                       head = colnames(X),
                       unit = NULL,                         # no unit information for simulation
                       opt = c("unary", "binary", "unary"), # unary operator first
                       sin_cos = TRUE,                      # add sin and cos to operator set
                       apply_pos_opt_on_neg_x = FALSE,      # e.g. do not apply log() on negative x
                       Lzero = TRUE,                        # best subset selection
                       K = 4,                               # at most 4 predictors in best subset model
                       standardize = FALSE,                 # don't standardize input matrix X
                       seed = 99)
# > Start iBART descriptor generation and selection...
# > Iteration 1
# > iBART descriptor selection...
# > avg..........null..................................................
# > Constructing descriptors using unary operators...
# > Iteration 2
# > iBART descriptor selection...
# > avg..........null..................................................
# > Constructing descriptors using binary operators...
# > Iteration 3
# > iBART descriptor selection...
# > avg..........null..................................................
# > Constructing descriptors using unary operators...
# > BART iteration done!
# > LASSO descriptor selection...
# > L-zero regression...
# > Total time: 261.336249113083 secs
# Correct descriptor names are (exp(x.1)-exp(x.2))^2 and sin(pi*x.3*x.4)
iBART_results$descriptor_names
# > [1] "(exp(x.1)-exp(x.2))^2" "sin(pi*(x.3*x.4))"
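To make the two target descriptors concrete, the sketch below rebuilds them by hand in base R, without calling iBART. Each is composed step by step from unary and binary operations, in the spirit of the unary, binary, unary order passed through `opt`. How iBART stores intermediate descriptors internally is not shown here, and the intermediate names (`u1`, `b12`, and so on) are illustrative only. Regressing `y` on the two hand-built descriptors should then give coefficients close to the 15 and 20 used in the simulation model.

```r
# Regenerate the simulated data from the example above.
set.seed(123)
n <- 250
p <- 10
X <- matrix(runif(n * p, min = -1, max = 1), nrow = n, ncol = p)
colnames(X) <- paste0("x.", seq_len(p))
y <- 15 * (exp(X[, 1]) - exp(X[, 2]))^2 + 20 * sin(pi * X[, 3] * X[, 4]) +
  rnorm(n, mean = 0, sd = 0.5)

# First descriptor: unary (exp), then binary (-), then unary (^2).
u1  <- exp(X[, "x.1"])
u2  <- exp(X[, "x.2"])
b12 <- u1 - u2
d1  <- b12^2                 # (exp(x.1)-exp(x.2))^2

# Second descriptor: binary (*), then unary (sin(pi * .)).
b34 <- X[, "x.3"] * X[, "x.4"]
d2  <- sin(pi * b34)         # sin(pi*(x.3*x.4))

# Ordinary least squares on the two hand-built descriptors.
fit <- lm(y ~ d1 + d2)
print(round(coef(fit), 2))
```

The intercept should come out near zero, since the simulation model has none.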
R Session Info
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] glmnet_4.1-1 Matrix_1.3-4 bartMachine_1.2.6
[4] missForest_1.4 itertools_0.1-3 iterators_1.0.13
[7] foreach_1.5.1 randomForest_4.6-14 bartMachineJARs_1.1
[10] rJava_1.0-4
loaded via a namespace (and not attached):
[1] lattice_0.20-44 codetools_0.2-18 grid_4.0.5 splines_4.0.5
[5] tools_4.0.5 survival_3.2-11 parallel_4.0.5 compiler_4.0.5
[9] shape_1.4.6