Bayesian Framework for Computational Modeling.
bfw: Bayesian Framework for Computational Modeling
What is bfw?
The purpose of bfw
is to establish a framework for conducting Bayesian analysis in R, using MCMC and JAGS (Plummer, 2003). The framework provides several modules to conduct linear and non-linear (hierarchical) analyses, and allows the use of custom functions and complex JAGS models.
Derived from the excellent work of Kruschke (2015), the goal of the framework is to easily estimate parameter values and the stability of estimates from the highest density interval (HDI), make null value assessment through region of practical equivalence testing (ROPE) and conduct convergence diagnostics (e.g., Gelman & Rubin, 1992).
Users are encouraged to apply justified priors by modifying existing JAGS models found in extdata/models
or by adding custom models. Similarly, one might modify models to conduct posterior predictive checks (see Kruschke, 2013). The purpose of the framework is not to provide generic modules suitable for all circumstances, but rather act as a platform for modifying or developing models for a given project.
List of current modules
- Bernoulli trials
- Covariate estimations (including correlation and Cronbach’s alpha)
- Fit observed and latent data (e.g., SEM, CFA, mediation models)
- Bayesian equivalent of Cohen’s kappa
- Mean and standard deviation estimations
- Predict metric values (cf., ANOVA)
- Predict nominal values (cf., chi-square test)
- Simple, multiple and hierarchical regression
- Softmax regression (i.e., multinomial logistic regression)
List of current visualizations
- Plot density of parameter values (including ROPE)
- Plot mean data (including repeated measures)
- Plot nominal data (e.g., expected and observed values)
- Plot circlize data (e.g., multiple response categories)
Prerequisites
- JAGS (>=4.3.0): https://mcmc-jags.sourceforge.io
- Java JDK (>=1.4): https://www.java.com/en/download/manual.jsp
Dependencies
Dependencies are automatically installed from CRAN. By default, outdated dependencies are automatically upgraded.
Installing
You can install bfw
from GitHub. If you already have a previous version of bfw
installed, using the command below will update to the latest development version.
Development version (GitHub)
devtools::install_github("oeysan/bfw")
Please note that stable versions are hosted at CRAN, whereas GitHub versions are in active development.
Stable version (CRAN)
utils::install.packages("bfw")
Issues
Please report any bugs/issues here
Example 1: Normal distributed data
Compute mean and standard deviation estimates.
Please see manual for more examples.
# Apply MASS to create normal distributed data
# mean = 0 and standard deviation = 1
set.seed(99)
data <- data.frame(y =
MASS::mvrnorm(n=100,
mu=0,
Sigma=matrix(1, 1),
empirical=TRUE) )
# Run normal distribution analysis on data
## Use 50 000 iterations, set seed for replication.
### Null value assessment is set at ROPE = -0.5 - 0.5 for mean
mcmc <- bfw(project.data = data,
y = "y",
saved.steps = 50000,
jags.model = "mean",
jags.seed = 100,
ROPE = c(-0.5,0.5),
silent = TRUE)
# Run t-distribution analysis on data
mcmc.robust <- bfw(project.data = data,
y = "y",
saved.steps = 50000,
jags.model = "mean",
jags.seed = 101,
ROPE = c(-0.5,0.5),
run.robust = TRUE,
silent = TRUE)
# Use psych to describe the normally distributed data
psych::describe(data)[,c(2:12)]
#> n mean sd median trimmed mad min max range skew kurtosis
#> X1 100 0 1 0.17 0.04 0.81 -3.26 2.19 5.45 -0.57 0.53
# Print summary of normal distribution analysis
## Only the most relevant information is shown here
round(mcmc$summary.MCMC[,c(3:6,9:12)],3)
#> Mode ESS HDIlo HDIhi ROPElo ROPEhi ROPEin n
#> mu[1]: Y -0.003 49830 -0.201 0.196 0 0 100 100
#> sigma[1]: Y 0.995 48890 0.869 1.150 0 100 0 100
# Print summary of t-distribution analysis
round(mcmc.robust$summary.MCMC[,c(3:6,9:12)],3)
#> Mode ESS HDIlo HDIhi ROPElo ROPEhi ROPEin n
#> mu[1]: Y 0.027 25887 -0.167 0.229 0 0 100 100
#> sigma[1]: Y 0.933 10275 0.749 1.115 0 100 0 100
Example 2: Same data but with outliers
# Add 10 outliers, each with a value of 10.
biased.data <- rbind(data,data.frame(y = rep(10,10)))
# Run normal distribution analyis on biased data
biased.mcmc <- bfw(project.data = biased.data,
y = "y",
saved.steps = 50000,
jags.model = "mean",
jags.seed = 102,
ROPE = c(-0.5,0.5),
silent = TRUE)
# Run t-distribution analysis on biased data
biased.mcmc.robust <- bfw(project.data = biased.data,
y = "y",
saved.steps = 50000,
jags.model = "mean",
jags.seed = 103,
ROPE = c(-0.5,0.5),
run.robust =TRUE,
silent = TRUE)
# Use psych to describe the biased data
psych::describe(biased.data)[,c(2:12)]
#> n mean sd median trimmed mad min max range skew kurtosis
#> X1 110 0.91 3.04 0.25 0.21 0.92 -3.26 10 13.3 2.3 4.38
# Print summary of normal distribution analysis on biased data
## Only the most relevant information is shown here
round(biased.mcmc$summary.MCMC[,c(3:6,9:12)],3)
#> Mode ESS HDIlo HDIhi ROPElo ROPEhi ROPEin n
#> mu[1]: Y 0.922 50000 0.346 1.50 0 92 8.05 110
#> sigma[1]: Y 2.995 49069 2.662 3.48 0 100 0.00 110
# # Print summary of t-distribution analysis on biased data
round(biased.mcmc.robust$summary.MCMC[,c(3:6,9:12)],3)
#> Mode ESS HDIlo HDIhi ROPElo ROPEhi ROPEin n
#> mu[1]: Y 0.168 29405 -0.015 0.355 0 0.008 99.992 110
#> sigma[1]: Y 0.679 17597 0.512 0.903 0 99.128 0.872 110
Example 3: Custom function and model
Shamelessly adapted from here (credits to James Curran).
# Create a function for left-censored data
custom.function <- function(DF, ...) {
x <- as.vector(unlist(DF))
x[x < log(29)] = NA
n <- length(x)
LOD <- rep(log(29), n)
is.above.LOD <- ifelse(!is.na(x), 1, 0)
lambda <- 0.5
initial.x = rep(NA, length(x))
n.missing <- sum(!is.above.LOD)
initial.x[!is.above.LOD] <- runif(n.missing, 0, log(29))
initial.list <- list(lambda = lambda,
x = initial.x
)
data.list <- list(n = n,
x = x,
LOD = LOD,
is.above.LOD = is.above.LOD
)
# Return data list
return (
list (
params = "lambda",
initial.list = initial.list,
data.list = data.list,
n.data = as.matrix(n)
)
)
}
# Create a model
custom.model = "
model{
for(i in 1:n){
is.above.LOD[i] ~ dinterval(x[i], LOD[i])
x[i] ~ dexp(lambda)
}
lambda ~ dgamma(0.001, 0.001)
}
"
# Simulate some data
set.seed(35202)
project.data <- as.matrix(rexp(10000, rate = 1.05))
# Run analysis
custom.mcmc <- bfw(project.data = project.data,
custom.function = custom.function,
custom.model = custom.model,
saved.steps = 50000,
jags.seed = 100,
ROPE = c(1.01,1.05),
silent = TRUE)
# Print analysis
round(custom.mcmc$summary.MCMC[,c(3,5,6,9:12)],3)
#> Mode HDIlo HDIhi ROPElo ROPEhi ROPEin n
#> 1.03 1.00 1.06 7.64 14.33 78.03 10000.00
The cost of conducting robust estimates
# Running time for normal distribution analyis
biased.mcmc$run.time[2] - biased.mcmc$run.time[1]
#> Time difference of 7.92 secs
# Running time for t-distribution analysis
biased.mcmc.robust$run.time[2] - biased.mcmc.robust$run.time[1]
#> Time difference of 35.8 secs
License
This project is licensed under the MIT License - see LICENSE for details
Acknowledgments
- John Kruschke for his overall amazing work, and especially for his workshop at ICPSR 2016. It opened my eyes to Bayesian statistics.
- Martyn Plummer for his work on JAGS.
- David Gohel for his work on officer
Code of Conduct
Don’t be evil. Please read the Code of Conduct
References
- Gelman, A., & Rubin, D. B. (1992). Inference from Iterative Simulation Using Multiple Sequences. Statistical Science, 7(4), 457-472. https://doi.org/10.1214/ss/1177011136
- Kruschke, J. K. (2013). Posterior predictive checks can and should be Bayesian: Comment on Gelman and Shalizi, ‘Philosophy and the practice of Bayesian statistics’. British Journal of Mathematical and Statistical Psychology, 66(1), 4556. https://doi.org/10.1111/j.2044-8317.2012.02063.x
- Kruschke, J. K. (2015). Doing Bayesian data analysis: a tutorial with R, JAGS, and Stan. Academic Press
- Plummer, M. (2003). JAGS A program for analysis of Bayesian graphical models using Gibbs sampling (Version 4.3.0). https://mcmc-jags.sourceforge.io.