Active Learning for Process Monitoring.
Description
Implements the methodology introduced in Capezza, Lepore, and Paynabar (2025) <doi:10.1080/00401706.2025.2561744> for process monitoring with limited labeling resources. The package provides functions to (i) simulate data streams with true latent states and multivariate Gaussian observations as done in the paper, (ii) fit partially hidden Markov models (pHMMs) using a constrained Baum-Welch algorithm with partial labels, and (iii) perform stream-based active learning that balances exploration and exploitation to decide whether to request labels in real time. The methodology is particularly suited for statistical process monitoring in industrial applications where labeling is costly.
README.md
ActiveLearning4SPM
The ActiveLearning4SPM package implements the methodology of Capezza, Lepore, and Paynabar (2025) for
stream-based active learning for process monitoring:
- Capezza, C., Lepore, A., & Paynabar, K. (2025). Stream-Based Active Learning for Process Monitoring. Technometrics. doi:10.1080/00401706.2025.2561744.
The package provides tools to:
- Simulate multivariate data streams with true hidden states.
- Fit partially hidden Markov models (pHMMs) with user-specified or automatically initialized parameters.
- Perform stream-based active learning, balancing exploration and exploitation when deciding whether to acquire new labels under a budget constraint.
The methodology is motivated by process monitoring in industrial applications where obtaining labels is costly.
Installation
You can install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("unina-sfere/ActiveLearning4SPM")
Once on CRAN, you’ll be able to install with:
install.packages("ActiveLearning4SPM")
Example
Simulate a data stream
library(ActiveLearning4SPM)
set.seed(123)
dat <- simulate_stream(T0 = 100, TT = 500, d = 10)
str(dat)
## List of 2
## $ x: num [1:600] 1 1 1 1 1 1 1 1 1 1 ...
## $ y: num [1:600, 1:10] 0.756 -0.872 -0.279 1.43 0.618 ...
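To give a rough idea of what a stream with latent states and multivariate Gaussian observations looks like, the following base R sketch draws the states from a Markov chain and adds standard normal noise around a state-specific mean. It is not the code behind `simulate_stream()`: the helper `simulate_hmm_stream()`, the transition matrix, the means, and the identity covariance are all assumptions chosen for illustration.

```r
# Hypothetical helper, not part of ActiveLearning4SPM.
# n: stream length, P: K x K transition matrix, means: list of K mean vectors.
simulate_hmm_stream <- function(n, P, means, init_state = 1L) {
  K <- nrow(P)
  d <- length(means[[1]])
  x <- integer(n)
  x[1] <- init_state
  # Draw each state from the row of P selected by the previous state
  for (t in seq_len(n)[-1]) {
    x[t] <- sample.int(K, size = 1, prob = P[x[t - 1], ])
  }
  # Identity covariance (assumption): state mean plus standard normal noise
  mu <- do.call(rbind, means)[x, , drop = FALSE]
  y_sim <- mu + matrix(rnorm(n * d), nrow = n, ncol = d)
  list(x = x, y = y_sim)
}

set.seed(1)
P_sim <- matrix(c(0.98, 0.02,
                  0.10, 0.90), nrow = 2, byrow = TRUE)
sim <- simulate_hmm_stream(n = 300, P = P_sim,
                           means = list(rep(0, 5), rep(1, 5)))
dim(sim$y)    # 300 rows, 5 columns
table(sim$x)  # how often each state was visited
```

The returned list mirrors the `x` / `y` layout shown in the `str(dat)` output above, so the two can be inspected in the same way.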
Fit a pHMM with user-defined initialization
y <- dat$y
d <- ncol(y)
xlabeled <- dat$x
xlabeled[sample(1:600, 400)] <- NA # partially labeled
fit <- fit_pHMM(
y = y,
xlabeled = xlabeled,
nstates = 3,
mean_start = list(rep(0, d), rep(1, d), rep(-1, d)),
equal_covariance = TRUE
)
fit$AIC
## [1] 12711.52
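`fit_pHMM()` receives both the observations and the partial labels. As a rough illustration of how known labels can constrain an HMM recursion, the sketch below computes a log-likelihood with a scaled forward pass in which, at each labeled time point, every state other than the labeled one is given zero emission probability. This is a minimal sketch and not the package's implementation: the function `constrained_forward_loglik()`, its arguments, and the spherical covariance (one standard deviation per state) are assumptions made for illustration.

```r
# Hypothetical helper, not part of ActiveLearning4SPM.
# y: n x d matrix, xlabeled: integer labels with NA where unknown,
# P: K x K transition matrix, pi0: initial state probabilities,
# means: list of K mean vectors, sds: one standard deviation per state.
constrained_forward_loglik <- function(y, xlabeled, P, pi0, means, sds) {
  n <- nrow(y)
  d <- ncol(y)
  K <- nrow(P)
  # Log Gaussian density of every observation under every state (n x K)
  logdens <- matrix(
    vapply(seq_len(K), function(k) {
      z <- sweep(y, 2, means[[k]]) / sds[k]
      -0.5 * rowSums(z^2) - d * log(sds[k]) - 0.5 * d * log(2 * pi)
    }, numeric(n)),
    nrow = n, ncol = K
  )
  # Hard constraint: at labeled times only the labeled state keeps its density
  for (t in which(!is.na(xlabeled))) {
    keep <- logdens[t, xlabeled[t]]
    logdens[t, ] <- -Inf
    logdens[t, xlabeled[t]] <- keep
  }
  # Scaled forward recursion; the row maximum is factored out for stability
  loglik <- 0
  alpha <- pi0
  for (t in seq_len(n)) {
    if (t > 1) alpha <- as.vector(alpha %*% P)
    m <- max(logdens[t, ])
    alpha <- alpha * exp(logdens[t, ] - m)
    s <- sum(alpha)
    loglik <- loglik + log(s) + m
    alpha <- alpha / s
  }
  loglik
}

set.seed(2)
y_demo <- matrix(rnorm(40), nrow = 20, ncol = 2)
P_demo <- matrix(c(0.9, 0.1,
                   0.2, 0.8), nrow = 2, byrow = TRUE)
lab_none <- rep(NA_integer_, 20)
lab_some <- lab_none
lab_some[1:5] <- 1L  # first five observations labeled as state 1

ll_unlabeled <- constrained_forward_loglik(
  y_demo, lab_none, P_demo, c(0.5, 0.5), list(c(0, 0), c(1, 1)), c(1, 1))
ll_partial <- constrained_forward_loglik(
  y_demo, lab_some, P_demo, c(0.5, 0.5), list(c(0, 0), c(1, 1)), c(1, 1))
c(unlabeled = ll_unlabeled, partial = ll_partial)
```

The partially labeled value can never exceed the unlabeled one, because fixing labels removes state paths from the sum that the forward pass accumulates.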
Fit a pHMM with automatic initialization
fit_auto <- fit_pHMM_auto(y = y, xlabeled = xlabeled, max_nstates = 3)
fit_auto$AIC
## [1] 12711.52
Perform stream-based active learning
y <- dat$y[1:200, ]
true_x <- dat$x[1:200]
out <- active_learning_pHMM(y = y,
true_x = true_x,
T0 = 100,
B = 0.1,
verbose = TRUE)
## t=101 | Available labels: 10 | Explor. p-value = 0.212 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=102 | Available labels: 10 | Explor. p-value = 0.314 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=103 | Available labels: 10 | Explor. p-value = 0.696 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=104 | Available labels: 10 | Explor. p-value = 0.744 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=105 | Available labels: 10 | Explor. p-value = 0.254 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=106 | Available labels: 10 | Explor. p-value = 0.174 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=107 | Available labels: 10 | Explor. p-value = 0.064 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=108 | Available labels: 10 | Explor. p-value = 0.571 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=109 | Available labels: 10 | Explor. p-value = 0.046 | Exploit. p-value = 1.000 | True state = 1 | Decision = label_exploration
## t=110 | Available labels: 9 | Explor. p-value = 0.062 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=111 | Available labels: 9 | Explor. p-value = 0.221 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=112 | Available labels: 9 | Explor. p-value = 0.167 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=113 | Available labels: 9 | Explor. p-value = 0.872 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=114 | Available labels: 9 | Explor. p-value = 0.799 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=115 | Available labels: 9 | Explor. p-value = 0.489 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=116 | Available labels: 9 | Explor. p-value = 0.234 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=117 | Available labels: 9 | Explor. p-value = 0.567 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=118 | Available labels: 9 | Explor. p-value = 0.392 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=119 | Available labels: 9 | Explor. p-value = 0.648 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=120 | Available labels: 9 | Explor. p-value = 0.959 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=121 | Available labels: 9 | Explor. p-value = 0.902 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=122 | Available labels: 9 | Explor. p-value = 0.519 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=123 | Available labels: 9 | Explor. p-value = 0.524 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=124 | Available labels: 9 | Explor. p-value = 0.506 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=125 | Available labels: 9 | Explor. p-value = 0.695 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=126 | Available labels: 9 | Explor. p-value = 0.891 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=127 | Available labels: 9 | Explor. p-value = 0.928 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=128 | Available labels: 9 | Explor. p-value = 0.841 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=129 | Available labels: 9 | Explor. p-value = 0.736 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=130 | Available labels: 9 | Explor. p-value = 0.899 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=131 | Available labels: 9 | Explor. p-value = 0.828 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=132 | Available labels: 9 | Explor. p-value = 0.178 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=133 | Available labels: 9 | Explor. p-value = 0.086 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=134 | Available labels: 9 | Explor. p-value = 0.275 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=135 | Available labels: 9 | Explor. p-value = 0.483 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=136 | Available labels: 9 | Explor. p-value = 0.851 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=137 | Available labels: 9 | Explor. p-value = 0.879 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=138 | Available labels: 9 | Explor. p-value = 0.090 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=139 | Available labels: 9 | Explor. p-value = 0.008 | Exploit. p-value = 1.000 | True state = 1 | Decision = label_exploration
## t=140 | Available labels: 8 | Explor. p-value = 0.271 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=141 | Available labels: 8 | Explor. p-value = 0.578 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=142 | Available labels: 8 | Explor. p-value = 0.295 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=143 | Available labels: 8 | Explor. p-value = 0.400 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=144 | Available labels: 8 | Explor. p-value = 0.477 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=145 | Available labels: 8 | Explor. p-value = 0.491 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=146 | Available labels: 8 | Explor. p-value = 0.416 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=147 | Available labels: 8 | Explor. p-value = 0.783 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=148 | Available labels: 8 | Explor. p-value = 0.503 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=149 | Available labels: 8 | Explor. p-value = 0.160 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=150 | Available labels: 8 | Explor. p-value = 0.309 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=151 | Available labels: 8 | Explor. p-value = 0.163 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=152 | Available labels: 8 | Explor. p-value = 0.180 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=153 | Available labels: 8 | Explor. p-value = 0.243 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=154 | Available labels: 8 | Explor. p-value = 0.700 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=155 | Available labels: 8 | Explor. p-value = 0.960 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=156 | Available labels: 8 | Explor. p-value = 0.503 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=157 | Available labels: 8 | Explor. p-value = 0.518 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=158 | Available labels: 8 | Explor. p-value = 0.217 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=159 | Available labels: 8 | Explor. p-value = 0.934 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=160 | Available labels: 8 | Explor. p-value = 0.883 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=161 | Available labels: 8 | Explor. p-value = 0.791 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=162 | Available labels: 8 | Explor. p-value = 0.570 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=163 | Available labels: 8 | Explor. p-value = 0.237 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=164 | Available labels: 8 | Explor. p-value = 0.035 | Exploit. p-value = 1.000 | True state = 1 | Decision = label_exploration
## t=165 | Available labels: 7 | Explor. p-value = 0.240 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=166 | Available labels: 7 | Explor. p-value = 0.378 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=167 | Available labels: 7 | Explor. p-value = 0.374 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=168 | Available labels: 7 | Explor. p-value = 0.234 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=169 | Available labels: 7 | Explor. p-value = 0.251 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=170 | Available labels: 7 | Explor. p-value = 0.344 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=171 | Available labels: 7 | Explor. p-value = 0.789 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=172 | Available labels: 7 | Explor. p-value = 0.608 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=173 | Available labels: 7 | Explor. p-value = 0.482 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=174 | Available labels: 7 | Explor. p-value = 0.497 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=175 | Available labels: 7 | Explor. p-value = 0.706 | Exploit. p-value = 1.000 | True state = 1 | Decision = 1
## t=176 | Available labels: 7 | Explor. p-value = 0.093 | Exploit. p-value = 1.000 | True state = 2 | Decision = label_exploration
## t=177 | Available labels: 6 | Explor. p-value = 0.008 | Exploit. p-value = 0.500 | True state = 2 | Decision = label_exploration
## t=178 | Available labels: 5 | Explor. p-value = 0.353 | Exploit. p-value = 0.300 | True state = 2 | Decision = 2
## t=179 | Available labels: 5 | Explor. p-value = 0.722 | Exploit. p-value = 0.200 | True state = 2 | Decision = 2
## t=180 | Available labels: 5 | Explor. p-value = 0.949 | Exploit. p-value = 0.650 | True state = 2 | Decision = 2
## t=181 | Available labels: 5 | Explor. p-value = 0.947 | Exploit. p-value = 0.250 | True state = 1 | Decision = 2
## t=182 | Available labels: 5 | Explor. p-value = 0.029 | Exploit. p-value = 0.650 | True state = 1 | Decision = label_exploration
## t=183 | Available labels: 4 | Explor. p-value = 0.018 | Exploit. p-value = 0.100 | True state = 1 | Decision = label_exploitation
## t=184 | Available labels: 3 | Explor. p-value = 0.105 | Exploit. p-value = 0.455 | True state = 1 | Decision = 1
## t=185 | Available labels: 3 | Explor. p-value = 0.060 | Exploit. p-value = 0.381 | True state = 1 | Decision = label_exploration
## t=186 | Available labels: 2 | Explor. p-value = 0.401 | Exploit. p-value = 0.033 | True state = 1 | Decision = label_exploitation
## t=187 | Available labels: 1 | Explor. p-value = 0.861 | Exploit. p-value = 0.804 | True state = 1 | Decision = 1
## t=188 | Available labels: 1 | Explor. p-value = 0.618 | Exploit. p-value = 0.058 | True state = 1 | Decision = 1
## t=189 | Available labels: 1 | Explor. p-value = 0.513 | Exploit. p-value = 0.146 | True state = 1 | Decision = 1
## t=190 | Available labels: 1 | Explor. p-value = 0.154 | Exploit. p-value = 0.182 | True state = 1 | Decision = 1
## t=191 | Available labels: 1 | Explor. p-value = 0.602 | Exploit. p-value = 0.525 | True state = 1 | Decision = 1
## t=192 | Available labels: 1 | Explor. p-value = 0.441 | Exploit. p-value = 0.056 | True state = 1 | Decision = 1
## t=193 | Available labels: 1 | Explor. p-value = 0.582 | Exploit. p-value = 0.469 | True state = 1 | Decision = 1
## t=194 | Available labels: 1 | Explor. p-value = 0.502 | Exploit. p-value = 0.857 | True state = 1 | Decision = 1
## t=195 | Available labels: 1 | Explor. p-value = 0.803 | Exploit. p-value = 0.792 | True state = 1 | Decision = 1
## t=196 | Available labels: 1 | Explor. p-value = 0.371 | Exploit. p-value = 0.200 | True state = 1 | Decision = 1
## t=197 | Available labels: 1 | Explor. p-value = 0.703 | Exploit. p-value = 0.700 | True state = 1 | Decision = 1
## t=198 | Available labels: 1 | Explor. p-value = 0.676 | Exploit. p-value = 0.500 | True state = 1 | Decision = 1
## t=199 | Available labels: 1 | Explor. p-value = 0.611 | Exploit. p-value = 0.400 | True state = 1 | Decision = 1
## t=200 | Available labels: 1 | Explor. p-value = 0.182 | Exploit. p-value = 0.150 | True state = 1 | Decision = label_exploitation
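The log starts with 10 available labels, which is consistent with reading `B` as the fraction of monitoring observations (those after the first `T0`) that may be labeled. That reading is inferred from the output above and is not taken from the package documentation. The arithmetic, together with the time points at which the log reports a label request, can be checked as follows (all variable names here are illustrative):

```r
# Values copied from the call to active_learning_pHMM() above
n_warmup <- 100         # T0: observations before monitoring starts
n_total <- 200          # rows of y passed to the function
budget_fraction <- 0.1  # B

# Assumed interpretation: budget = B * number of monitoring observations
n_labels <- round(budget_fraction * (n_total - n_warmup))
n_labels

# Time points where the log shows label_exploration or label_exploitation
requested <- c(109, 139, 164, 176, 177, 182, 183, 185, 186, 200)
length(requested) == n_labels
```

Under this reading the whole budget is spent in this run, with the last label requested at t = 200.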
table(out$xhat, true_x)
##    true_x
##       1   2
##   1 194   0
##   2   1   5
out$scores
## $accuracy
## [1] 0.99
##
## $precision
## [1] 0.8333333
##
## $recall
## [1] 1
##
## $f1
## [1] 0.9090909
##
## $auc
## [1] 0.9947368
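The first four scores can be reproduced by hand from confusion counts. The sketch below assumes that state 2 is the positive class and that the scores cover only the 100 monitoring observations (t = 101 to 200), so the 100 warm-up observations, all in state 1, are excluded from the count of true negatives. Both assumptions are inferred from the output: they give an accuracy of 0.99, whereas the full 200-observation table would give 0.995.

```r
# Monitoring-phase counts read off the log and table above (assumed split)
tp <- 5   # predicted 2, true 2
fp <- 1   # predicted 2, true 1
fn <- 0   # predicted 1, true 2
tn <- 94  # predicted 1, true 1, excluding the 100 warm-up observations

accuracy <- (tp + tn) / (tp + fp + fn + tn)          # 0.99
precision <- tp / (tp + fp)                          # 5/6
recall <- tp / (tp + fn)                             # 1
f1 <- 2 * precision * recall / (precision + recall)  # 10/11

round(c(accuracy = accuracy, precision = precision,
        recall = recall, f1 = f1), 4)
```

The AUC is not reproduced here because it depends on per-observation scores that are not shown in the output.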