Description

Flexible Modeling of Count Data.

For Bayesian and classical inference and prediction with count-valued data, Simultaneous Transformation and Rounding (STAR) Models provide a flexible, interpretable, and easy-to-use approach. STAR models the observed count data using a rounded continuous data model and incorporates a transformation for greater flexibility. Implicitly, STAR formalizes the commonly-applied yet incoherent procedure of (i) transforming count-valued data and subsequently (ii) modeling the transformed data using Gaussian models. STAR is well-defined for count-valued data, which is reflected in predictive accuracy, and is designed to account for zero-inflation, bounded or censored data, and over- or underdispersion. Importantly, STAR is easy to combine with existing MCMC or point estimation methods for continuous data, which allows seamless adaptation of continuous data models (such as linear regressions, additive models, BART, random forests, and gradient boosting machines) for count-valued data. The package also includes several methods for modeling count time series data, namely via warped Dynamic Linear Models. For more details and background on these methodologies, see the works of Kowal and Canale (2020) <doi:10.1214/20-EJS1707>, Kowal and Wu (2022) <doi:10.1111/biom.13617>, King and Kowal (2022) <arXiv:2110.14790>, and Kowal and Wu (2023) <arXiv:2110.12316>.

countSTAR: Flexible Modeling for Count Data

Overview

Count-valued data are common in many fields. Frequently, count data are observed jointly with predictors, over time intervals, or across spatial locations. Furthermore, they often exhibit a variety of complex distributional features, including zero-inflation, skewness, over- and underdispersion, and in some cases may be bounded or censored. Flexible and interpretable models for count-valued processes are therefore highly useful in practice.

countSTAR implements a variety of methods for modeling such processes, based on the idea of Simultaneous Transformation and Rounding (STAR). Estimation, inference, and prediction for STAR are available for both Bayesian and frequentist models. Most of the methods target static regression problems, but the package also supports time series analysis via the warped Dynamic Linear Model (DLM) framework.

Broadly, STAR defines a count-valued probability model by (1) specifying a (conditionally) Gaussian model for continuous latent data and (2) connecting the latent data to the observed data via a transformation and rounding operation.
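The two steps can be written out as a short worked formulation. The notation below is an assumption chosen for illustration, not taken from this page: y is the observed count, y* a continuous latent variable, g a strictly increasing transformation, h the rounding operator, and mu(x) the conditional mean of the latent Gaussian model.

```latex
\begin{aligned}
y &= h(y^*), \qquad z = g(y^*), \\
z &= \mu(x) + \epsilon, \qquad \epsilon \sim N(0, \sigma^2), \\
h(t) &= j \quad \text{if } t \in [a_j, a_{j+1}), \qquad a_0 = -\infty,\; a_j = j \ \text{for } j \ge 1, \\
P(y = j \mid x) &= \Phi\!\left(\frac{g(a_{j+1}) - \mu(x)}{\sigma}\right)
                 - \Phi\!\left(\frac{g(a_j) - \mu(x)}{\sigma}\right),
\end{aligned}
```

Here \Phi is the standard normal distribution function and g(a_0) is taken to be -\infty. Under this sketch, y = 0 whenever the latent y* falls below 1, which is how a rounded continuous model can place substantial probability mass at zero.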

Importantly, STAR models are highly flexible count-valued processes that can capture (i) discrete data, (ii) zero-inflation, (iii) over- or underdispersion, (iv) heaping, and (v) bounded or censored data. Because the STAR framework is modular, it accommodates a wide variety of latent data models, ranging from simple forms like linear regression to more advanced machine learning methods such as random forests or gradient boosting machines.
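To make the transformation-and-rounding idea concrete, the following is a minimal base R simulation of count data from a rounded, log-transformed Gaussian regression. It is only a sketch and does not call any countSTAR functions: the log transformation, the coefficients, the sample size, and all variable names are illustrative assumptions.

```r
# Simulate counts by rounding a transformed latent Gaussian regression.
# All values below are hypothetical and chosen for illustration only.
set.seed(1)
n <- 1000
x <- rnorm(n)                          # a single predictor

# (1) Gaussian model for the latent data on the transformed scale
z <- 0.5 + 0.8 * x + rnorm(n, sd = 1)

# (2) Invert the (assumed) log transformation, then round down.
#     Latent values in (0, 1) map to a count of zero.
y_star <- exp(z)
y <- floor(y_star)

# Basic summaries of the simulated counts
prop_zero <- mean(y == 0)              # share of zeros, i.e. share of z < 0
dispersion <- var(y) / mean(y)         # variance-to-mean ratio
print(c(prop_zero = prop_zero, dispersion = dispersion))
```

Swapping the linear predictor for another latent model, or the log for another monotone transformation, changes the resulting count distribution without changing the rounding step.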

countSTAR can be installed and loaded as follows:

# CRAN version
install.packages("countSTAR")

# Development version (requires the remotes package)
remotes::install_github("bking124/countSTAR")

library("countSTAR")

Detailed information on the different options for STAR models and how they are implemented in countSTAR can be found in the vignette, accessible on the website or by running the command vignette("countSTAR"). A basic breakdown of the available modeling functions is shown below:

| Analysis Type | Method (function) | Dependent Package |
|---|---|---|
| Static Classical Regression | Linear regression (lm_star()) | - |
| Static Classical Regression | Generalized boosted modeling (gbm_star()) | gbm |
| Static Classical Regression | Random Forests (randomForest_star()) | randomForest |
| Static Bayesian Regression | Linear regression (blm_star()) | - |
| Static Bayesian Regression | Additive modeling (bam_star()) | spikeSlabGAM |
| Static Bayesian Regression | Spline regression (spline_star()) | spikeSlabGAM |
| Static Bayesian Regression | Bayesian additive regression trees (bart_star()) | dbarts |
| Time Series Modeling | Warped Dynamic Linear Models (warpDLM()) | KFAS |

In addition to these ready-to-use functions, users can also implement STAR methods with custom latent regression models using the genEM_star() and genMCMC_star() functions.

Please submit any issues or feature requests to https://github.com/bking124/countSTAR/issues.

Metadata

Version

1.0.2

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows