MyNixOS website logo
Description

Estimation of a Variety of Count Regression Models.

An implementation of multiple regression models for count data. These include various forms of the negative binomial (NB-1, NB-2, NB-P, generalized negative binomial, etc.), Poisson-Lognormal, other compound Poisson distributions, the Generalized Waring model, etc. Information on the different forms of the negative binomial are described by Greene (2008) <doi:10.1016/j.econlet.2007.10.015>. For treatises on count models, see Cameron and Trivedi (2013) <doi:10.1017/CBO9781139013567> and Hilbe (2012) <doi:10.1017/CBO9780511973420>. For the implementation of under-reporting in count models, see Wood et al. (2016) <doi:10.1016/j.aap.2016.06.013>. For prediction methods in random parameter models, see Wood and Gayah (2025) <doi:10.1016/j.aap.2025.108147>. For estimating random parameters using maximum simulated likelihood, see Greene and Hill (2010) <doi:10.1108/S0731-9053(2010)26>; Gourieroux and Monfort (1996) <doi:10.1093/0198774753.001.0001>; or Hensher et al. (2015) <doi:10.1017/CBO9781316136232>.

flexCountReg

codecov

The goal of flexCountReg is to provide functions that allow the analyst to estimate count regression models that can handle multiple analysis issues including excess zeros, overdispersion as a function of variables (i.e., generalized count models), random parameters, etc.

Installation

You can install the development version of flexCountReg like using:

# install.packages("devtools")
devtools::install_github("jwood-iastate/flexCountReg")

Functions and Data

The following functions are included in the flexCountReg package, grouped by continuous and count distributions.

Distribution Functions

Continuous Distributions

  • Inverse Gamma Distribution

    • dinvgamma for the density function
    • pinvgamma for the cumulative density function
    • qinvgamma for the quantile function
    • rinvgamma for random number generation
  • Triangle Distribution

    • dtri for the density function
    • ptri for the cumulative density function
    • qtri for the quantile function
    • rtri for random number generation
  • Lognormal Distribution

    • mgf_lognormal for estimating the moment generating function

Count Distributions

  • Generalized Waring Distribution

    • dgwar for the density function
    • pgwar for the cumulative density function
    • qgwar for the quantile function
    • rgwar for random number generation
  • Poisson-Generalized-Exponential Distribution

    • dpge for the density function
    • ppge for the cumulative density function
    • qpge for the quantile function
    • rpge for random number generation
  • Poisson-Inverse-Gaussian Distribution (Types 1 and 2)

    • dpinvgaus for the density function
    • ppinvgaus for the cumulative density function
    • qpinvgaus for the quantile function
    • rpinvgaus for random number generation
  • Poisson-Inverse-Gamma Distribution

    • dpinvgamma for the density function
    • ppinvgamma for the cumulative density function
    • qpinvgamma for the quantile function
    • rpinvgamma for random number generation
  • Poisson-Lindley Distribution

    • dplind for the density function
    • pplind for the cumulative density function
    • qplind for the quantile function
    • rplind for random number generation
  • Poisson-Lindley-Gamma (Negative Binomial-Lindley) Distribution

    • dplindGamma for the density function
    • pplindGamma for the cumulative density function
    • qplindGamma for the quantile function
    • rplindGamma for random number generation
  • Poisson-Lindley-Lognormal Distribution

    • dplindLnorm for the density function
    • pplindLnorm for the cumulative density function
    • qplindLnorm for the quantile function
    • rplindLnorm for random number generation
  • Poisson-Lognormal Distribution

    • dpLnorm for the density function
    • ppLnorm for the cumulative density function
    • qtpLnorm for the quantile function
    • rpLnorm for random number generation
  • Poisson-Weibull Distribution

    • dpoisweibull for the density function
    • ppoisweibull for the cumulative density function
    • qpoisweibull for the quantile function
    • rpoisweibull for random number generation
  • Sichel Distribution

    • dsichel for the density function
    • psichel for the cumulative density function
    • qsichel for the quantile function
    • rsichel for random number generation
  • Conway-Maxwell-Poisson Distribution

    • dcom for the density function
    • pcom for the cumulative density function
    • qcom for the quantile function
    • rcom for random number generation

Model Estimation Functions

  • countreg is a general function for estimating the non-panel, non-random parameters count regression models
  • countreg.rp estimates the random parameters count models.
  • poislind.re estimates the random effects Poisson-Lindley model
  • renb estimates the random effects negative binomial regression model.

Model Evaluation, Comparison, and Convenience Functions

  • cureplot generates a CURE plot for the specified model, based on the cureplots package.
  • mae computes the Mean Absolute Error (MAE).
  • myAIC computes the Akaike Information Criterion (AIC) value.
  • myBIC computes the Bayesian Information Criterion (BIC) value.
  • regCompTable creates a publication-ready table comparing multiple models. This can include the regression estimate results, AIC, BIC, and Pseudo R-Square values.
  • regCompTest compares any given model with a base model. This can be used to perform a likelihood ratio test between models.
  • rmse computes the Root Mean Squared Error (RMSE).
  • predict allows the predict function to be used for out-of-sample predictions for any of the flexCountReg models.
  • summary allows the use of the summary function to get a model summary from a flexCountReg regression object.

Data A dataset, washington_roads, is included. It is based on a sample of Washington primary 2-lane roads from the years 2016-2018. Data for the roads, traffic volumes (AADT) and associated crashes were obtained from the Highway Safety Information System (HSIS).

Probability Distributions

As noted in the list of functions, the probability distributions below are included in the flexCountReg package. Details of the distributions are provided in the documentation (help files).

Continuous Distributions

  • Inverse Gamma Distribution
  • Triangle Distribution

Count DistributionsDistributions that Handle Equidispersion

  • Poisson

Distributions that handle Underdispersion

  • Conway-Maxwell-Poisson (COM) Distribution

Distributions that Handle Overdispersion

  • Negative Binomial in various forms (NB-1, NB-2, and NB-P)
  • Poisson-Inverse-Gaussian Distribution (Types 1 and 2)
  • Poisson-Inverse-Gamma
  • Poisson-Lognormal Distribution
  • Poisson-Weibull Distribution
  • Sichel Distribution
  • Generalized Waring Distribution

Distributions that Handle Excess Zeros

  • Poisson-Generalized-Exponential Distribution
  • Poisson-Lindley Distribution
  • Poisson-Lindley-Gamma (Negative Binomial-Lindley) Distribution
  • Poisson-Lindley-Lognormal Distribution

Example

The following is an example of using flexCountReg to estimate a negative binomial (NB-2) regression model with the overdispersion parameter as a function of predictor variables:

library(gt) # used to format summary tables here
library(flexCountReg)
library(knitr)

data("washington_roads")
washington_roads$AADT10kplus <- ifelse(washington_roads$AADT > 10000, 1, 0)
gen.nb2 <- countreg(Total_crashes ~ lnaadt + lnlength + speed50 + AADT10kplus,
               data = washington_roads, family = "NB2",
               dis_param_formula_1 = ~ speed50,  method='BFGS')
kable(summary(gen.nb2), caption = "NB-2 Model Summary")
#> Call:
#>  Total_crashes ~ lnaadt + lnlength + speed50 + AADT10kplus 
#> 
#>  Method:  countreg 
#> Iterations:  44 
#> Convergence:  successful convergence  
#> Log-likelihood:  -1064.876 
#> 
#> Parameter Estimates:
#> (Using bootstrapped standard errors)
#> # A tibble: 7 × 7
#>   parameter           coeff `Std. Err.` `t-stat` `p-value` `lower CI` `upper CI`
#>   <chr>               <dbl>       <dbl>    <dbl>     <dbl>      <dbl>      <dbl>
#> 1 (Intercept)        -7.40        0.043  -172.       0         -7.49      -7.32 
#> 2 lnaadt              0.912       0.005   182.       0          0.902      0.921
#> 3 lnlength            0.843       0.037    22.9      0          0.771      0.915
#> 4 speed50            -0.47        0.102    -4.62     0         -0.669     -0.27 
#> 5 AADT10kplus         0.77        0.089     8.61     0          0.594      0.945
#> 6 ln(alpha):(Interc… -1.62        0.291    -5.57     0         -2.19      -1.05 
#> 7 ln(alpha):speed50   1.31        0.458     2.85     0.004      0.409      2.20
parametercoeffStd. Err.t-statp-valuelower CIupper CI
(Intercept)-7.4010.043-171.5620.000-7.486-7.317
lnaadt0.9120.005182.4530.0000.9020.921
lnlength0.8430.03722.8780.0000.7710.915
speed50-0.4700.102-4.6190.000-0.669-0.270
AADT10kplus0.7700.0898.6070.0000.5940.945
ln(alpha):(Intercept)-1.6190.291-5.5680.000-2.189-1.049
ln(alpha):speed501.3060.4582.8540.0040.4092.203

NB-2 Model Summary

teststats <- regCompTest(gen.nb2)
kable(teststats$statistics)
StatisticModelBaseModel
AIC2143.75223049.659
BIC2180.94943054.973
LR Test Statistic917.9070NA
LR degrees of freedom6.0000NA
LR p-value0.0000NA
McFadden’s Pseudo R^20.3012NA

Checking the CURE plot:

cureplot(gen.nb2, indvar  ="lnaadt")
#> Covariate: indvar_values
#> CURE data frame was provided. Its first column, lnaadt, will be used.

Modifying the model to fit better:



gen.nb2 <- countreg(Total_crashes ~ lnaadt  + lnlength + speed50 +
                                ShouldWidth04 + AADT10kplus + 
                                I(AADT10kplus/lnaadt),
                                data = washington_roads, family = "NB2",
                                dis_param_formula_1 = ~ lnlength,  method='BFGS')

kable(summary(gen.nb2), caption = "Modified NB-2 Model Summary")
#> Call:
#>  Total_crashes ~ lnaadt + lnlength + speed50 + ShouldWidth04 +      AADT10kplus + I(AADT10kplus/lnaadt) 
#> 
#>  Method:  countreg 
#> Iterations:  56 
#> Convergence:  successful convergence  
#> Log-likelihood:  -1061.914 
#> 
#> Parameter Estimates:
#> (Using bootstrapped standard errors)
#> # A tibble: 9 × 7
#>   parameter           coeff `Std. Err.` `t-stat` `p-value` `lower CI` `upper CI`
#>   <chr>               <dbl>       <dbl>    <dbl>     <dbl>      <dbl>      <dbl>
#> 1 (Intercept)        -7.68        0.043  -180.       0         -7.76      -7.59 
#> 2 lnaadt              0.93        0.005   188.       0          0.92       0.939
#> 3 lnlength            0.853       0.038    22.6      0          0.779      0.927
#> 4 speed50            -0.4         0.091    -4.38     0         -0.579     -0.221
#> 5 ShouldWidth04       0.261       0.06      4.36     0          0.143      0.378
#> 6 AADT10kplus         5.96        0.092    64.6      0          5.78       6.14 
#> 7 I(AADT10kplus/ln… -50.1         0.938   -53.5      0        -52.0      -48.3  
#> 8 ln(alpha):(Inter…  -1.91        0.324    -5.91     0         -2.55      -1.28 
#> 9 ln(alpha):lnleng…  -0.43        0.244    -1.76     0.078     -0.908      0.048
parametercoeffStd. Err.t-statp-valuelower CIupper CI
(Intercept)-7.6760.043-179.7520.000-7.759-7.592
lnaadt0.9300.005187.6330.0000.9200.939
lnlength0.8530.03822.5850.0000.7790.927
speed50-0.4000.091-4.3820.000-0.579-0.221
ShouldWidth040.2610.0604.3550.0000.1430.378
AADT10kplus5.9610.09264.6280.0005.7806.142
I(AADT10kplus/lnaadt)-50.1330.938-53.4540.000-51.971-48.295
ln(alpha):(Intercept)-1.9130.324-5.9100.000-2.547-1.278
ln(alpha):lnlength-0.4300.244-1.7640.078-0.9080.048

Modified NB-2 Model Summary

teststats <- regCompTest(gen.nb2)
kable(teststats$statistics)
StatisticModelBaseModel
AIC2141.82783049.659
BIC2189.65283054.973
LR Test Statistic923.8314NA
LR degrees of freedom8.0000NA
LR p-value0.0000NA
McFadden’s Pseudo R^20.3031NA
cureplot(gen.nb2, indvar  ="lnaadt")
#> Covariate: indvar_values
#> CURE data frame was provided. Its first column, lnaadt, will be used.

Estimating another model (NB-P) - without the interaction:

gen.nbp <- countreg(Total_crashes ~ lnaadt  + lnlength + speed50 +
                                ShouldWidth04 + AADT10kplus,
                                data = washington_roads, family = "NBp",
                                dis_param_formula_1 = ~ lnlength,  method='BFGS')
kable(summary(gen.nbp), caption = "NB-P Model Summary")
#> Call:
#>  Total_crashes ~ lnaadt + lnlength + speed50 + ShouldWidth04 +      AADT10kplus 
#> 
#>  Method:  countreg 
#> Iterations:  53 
#> Convergence:  successful convergence  
#> Log-likelihood:  -1062.195 
#> 
#> Parameter Estimates:
#> (Using bootstrapped standard errors)
#> # A tibble: 9 × 7
#>   parameter           coeff `Std. Err.` `t-stat` `p-value` `lower CI` `upper CI`
#>   <chr>               <dbl>       <dbl>    <dbl>     <dbl>      <dbl>      <dbl>
#> 1 (Intercept)        -7.76        0.043 -181.        0         -7.85      -7.68 
#> 2 lnaadt              0.938       0.005  189.        0          0.928      0.948
#> 3 lnlength            0.836       0.037   22.3       0          0.763      0.91 
#> 4 speed50            -0.384       0.093   -4.13      0         -0.567     -0.202
#> 5 ShouldWidth04       0.258       0.059    4.34      0          0.141      0.374
#> 6 AADT10kplus         0.689       0.088    7.87      0          0.518      0.861
#> 7 ln(alpha):(Interc… -1.50        0.294   -5.09      0         -2.07      -0.92 
#> 8 ln(alpha):lnlength -0.167       0.245   -0.682     0.495     -0.648      0.314
#> 9 ln(p)               0.525       0.173    3.03      0.002      0.186      0.864
parametercoeffStd. Err.t-statp-valuelower CIupper CI
(Intercept)-7.7640.043-181.2110.000-7.848-7.680
lnaadt0.9380.005189.4590.0000.9280.948
lnlength0.8360.03722.3140.0000.7630.910
speed50-0.3840.093-4.1300.000-0.567-0.202
ShouldWidth040.2580.0594.3350.0000.1410.374
AADT10kplus0.6890.0887.8670.0000.5180.861
ln(alpha):(Intercept)-1.4960.294-5.0940.000-2.072-0.920
ln(alpha):lnlength-0.1670.245-0.6820.495-0.6480.314
ln(p)0.5250.1733.0330.0020.1860.864

NB-P Model Summary

teststats <- regCompTest(gen.nbp)
kable(teststats$statistics)
StatisticModelBaseModel
AIC2142.38953049.659
BIC2190.21443054.973
LR Test Statistic923.2697NA
LR degrees of freedom8.0000NA
LR p-value0.0000NA
McFadden’s Pseudo R^20.3029NA

Checking the CURE plot (notice that the CURE plot is MUCH better in this case than the NB-2 without the interaction and still better than the modified NB-2):

cureplot(gen.nbp, indvar  ="lnaadt")
#> Covariate: indvar_values
#> CURE data frame was provided. Its first column, lnaadt, will be used.

Creating a table to compare the models:

regCompTable(list("Generalized NB-2"=gen.nb2, "Generalized NB-P"=gen.nbp), tableType="tibble") |> 
  kable()
ParameterGeneralized NB-2Generalized NB-P
(Intercept)-7.676 (0.043)***-7.764 (0.043)***
lnaadt0.93 (0.005)***0.938 (0.005)***
lnlength0.853 (0.038)***0.836 (0.037)***
speed50-0.4 (0.091)***-0.384 (0.093)***
ShouldWidth040.261 (0.06)***0.258 (0.059)***
AADT10kplus5.961 (0.092)***0.689 (0.088)***
I(AADT10kplus/lnaadt)-50.133 (0.938)***
ln(alpha):(Intercept)-1.913 (0.324)***-1.496 (0.294)***
ln(alpha):lnlength-0.43 (0.244)-0.167 (0.245)
ln(p)0.525 (0.173)**
N Obs.15011501
LL-1061.914-1062.195
AIC2141.8282142.389
BIC2189.6532190.214
Pseudo-R-Sq.0.3030.303

Note that the metrics for comparison are similar. While the models both have the same number of parameters, the NB-P was able to get better performance without requiring the interaction terms (which leads to strange relationships between the exposure metric and the outcome).

Metadata

Version

0.1.1

License

Unknown

Platforms (78)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows