Description

Cluster Adjusted t Statistic Applications.

Simulation results detailed in Esarey and Menger (2019) <doi:10.1017/psrm.2017.42> demonstrate that cluster adjusted t statistics (CATs) are an effective method for correcting standard errors in scenarios with a small number of clusters. The 'mmiCATs' package offers a suite of tools for working with CATs. The mmiCATs() function initiates a 'shiny' web application, facilitating the analysis of data utilizing CATs, as implemented in the cluster.im.glm() function from the 'clusterSEs' package. Additionally, the pwr_func_lmer() function is designed to simplify the process of conducting simulations to compare mixed effects models with CATs models. For educational purposes, the CloseCATs() function launches a 'shiny' application card game, aimed at enhancing users' understanding of the conditions under which CATs should be preferred over random intercept models.

mmiCATs

The goal of the Mighty Metrika Interface to Cluster Adjusted t-Statistics (‘mmiCATs’) R package is to provide ‘shiny’ web applications for CATs and to provide research tools for investigating when CATs models should be preferred over other statistical models used for cluster adjustment.

The implementation of CATs in ‘mmiCATs’ is based on the cluster.im.glm() function from the R package ‘clusterSEs’. For more information on CATs, see Esarey and Menger (2019).
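
As a rough illustration of the idea behind CATs, the sketch below fits the same linear model separately within each cluster and then applies a t test to the cluster-specific coefficients, following the approach of Ibragimov and Muller (2010). It uses only base R and the iris data with Species as the cluster. The object names (per_cluster, beta_mat, cats_sketch) are illustrative, and this is not the implementation used by clusterSEs::cluster.im.glm(), which offers further options such as drop and truncate.

```r
# Sketch of the per-cluster idea behind CATs, using base R only.
form <- Sepal.Length ~ Petal.Length + Petal.Width

# Fit the same model separately within each cluster (here: Species).
per_cluster <- lapply(split(iris, iris$Species), function(d) {
  stats::coef(stats::lm(form, data = d))
})
beta_mat <- do.call(rbind, per_cluster)  # one row of coefficients per cluster

# Treat the cluster-specific estimates as the sample for a t test
# with G - 1 degrees of freedom, where G is the number of clusters.
G <- nrow(beta_mat)
beta_bar <- colMeans(beta_mat)
se_bar <- apply(beta_mat, 2, stats::sd) / sqrt(G)
t_stat <- beta_bar / se_bar
p_val <- 2 * stats::pt(-abs(t_stat), df = G - 1)

cats_sketch <- cbind(beta_bar, se_bar, t_stat, p_val)
print(cats_sketch)
```

With only three clusters the t test has two degrees of freedom, which is why the confidence intervals in small-cluster settings can be wide.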

Installation

You can install the released version of ‘mmiCATs’ from CRAN:

install.packages("mmiCATs")

You can install the development version of ‘mmiCATs’ like so:

# install.packages("devtools")
devtools::install_github("mightymetrika/mmiCATs")

Shiny Application

Load the ‘mmiCATs’ package and call the mmiCATs() function to launch a ‘shiny’ web application that lets you run the clusterSEs::cluster.im.glm() function on a CSV dataset:

library(mmiCATs)
mmiCATs()
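
For readers who prefer a script to the application, the sketch below shows a comparable non-interactive workflow: read a CSV file, fit a model with stats::glm(), and pass it to clusterSEs::cluster.im.glm(). The temporary CSV file written from iris is a stand-in for your own data, and it is an assumption that the application performs similar steps internally. The call to ‘clusterSEs’ is skipped if that package is not installed.

```r
# Write a stand-in CSV file (replace csv_path with the path to your own data).
csv_path <- tempfile(fileext = ".csv")
utils::write.csv(iris, csv_path, row.names = FALSE)

# Read the CSV; the cluster variable is read in as a factor.
dat <- utils::read.csv(csv_path, stringsAsFactors = TRUE)

# Fit the model that will be cluster adjusted.
fit <- stats::glm(Sepal.Length ~ Petal.Length + Petal.Width,
                  family = "gaussian",
                  data = dat)

# Apply CATs only if 'clusterSEs' is available.
if (requireNamespace("clusterSEs", quietly = TRUE)) {
  cats_out <- clusterSEs::cluster.im.glm(fit, dat = dat, cluster = ~ Species,
                                         return.vcv = TRUE)
}
```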

CATs with Robust Models (for simulation research purposes only)

R packages such as ‘robust’ and ‘robustbase’ provide robust alternatives to stats::lm() and stats::glm(). The ‘mmiCATs’ package has the functions cluster_im_lmRob() and cluster_im_glmRob(), which take the basic algorithm used in clusterSEs::cluster.im.glm() but swap out stats::glm() for a robust alternative. In cluster_im_lmRob(), either robust::lmRob() or robustbase::lmrob() is swapped in; in cluster_im_glmRob(), either robust::glmRob() or robustbase::glmrob() is swapped in. The example below shows some simple results comparing the methods for linear models.

# Get common parameters
.form <- Sepal.Length ~ Petal.Length + Petal.Width
.clust <- ~ Species

# clusterSEs::cluster.im.glm()
glmout <- stats::glm(.form,
                     family = "gaussian",
                     data = iris)

clusterSEs::cluster.im.glm(glmout, dat = iris, cluster = .clust,
                           return.vcv = TRUE)
#> 
#>  Cluster-Adjusted p-values:  
#>  
#>            variable name   cluster-adjusted p-value
#>              (Intercept)                       0.11
#>             Petal.Length                      0.055
#>              Petal.Width                      0.705
#> 
#>  Confidence Intervals (centered on cluster-averaged results): 
#>  
#>       variable name              CI lower             CI higher
#>         (Intercept)     -1.42830072928431      6.54809830391216
#>        Petal.Length   -0.0384794641752815      1.59034415889221
#>         Petal.Width     -1.17723433639784      1.44337893950581

# robust::lmRob()
robustout <- robust::lmRob(.form, data = iris)

cluster_im_lmRob(robustout, .form, dat = iris, cluster = .clust,
                 engine = "robust", return.vcv = TRUE)
#> 00:00:00 left
#> 00:00:00 left
#> 00:00:00 left
#> $p.values
#>                    [,1]
#> (Intercept)  0.10328518
#> Petal.Length 0.02382271
#> Petal.Width  0.71362962
#> 
#> $ci
#>                CI lower CI higher
#> (Intercept)  -1.2299791  6.133096
#> Petal.Length  0.2716887  1.406647
#> Petal.Width  -1.1638474  1.417432
#> 
#> $vcv.hat
#>              (Intercept) Petal.Length Petal.Width
#> (Intercept)    2.1963785  -0.32092673   0.5416086
#> Petal.Length  -0.3209267   0.05218536  -0.1060045
#> Petal.Width    0.5416086  -0.10600446   0.2699346
#> 
#> $beta.bar
#>  (Intercept) Petal.Length  Petal.Width 
#>    2.4515587    0.8391680    0.1267921

# robustbase::lmrob()
robustbaseout <- robustbase::lmrob(.form, data = iris)

cluster_im_lmRob(robustbaseout, .form, dat = iris, cluster = .clust,
                 engine = "robustbase", return.vcv = TRUE)
#> $p.values
#>                    [,1]
#> (Intercept)  0.10305707
#> Petal.Length 0.03924708
#> Petal.Width  0.72343658
#> 
#> $ci
#>                 CI lower CI higher
#> (Intercept)  -1.26209472  6.312871
#> Petal.Length  0.09756128  1.507929
#> Petal.Width  -1.10924300  1.341017
#> 
#> $vcv.hat
#>              (Intercept) Petal.Length Petal.Width
#> (Intercept)    2.3246098  -0.41242678   0.5770706
#> Petal.Length  -0.4124268   0.08058483  -0.1296058
#> Petal.Width    0.5770706  -0.12960576   0.2432276
#> 
#> $beta.bar
#>  (Intercept) Petal.Length  Petal.Width 
#>    2.5253884    0.8027451    0.1158868

The simulation study in Esarey and Menger (2019) tested several methods for handling clustering. They found that a correctly specified mixed effects model tends to perform most efficiently, but that CATs can outperform a misspecified mixed effects model. The pwr_func_lmer() function can be used to run a simulation in which data are generated from a mixed effects model and results are compared between:

  • The data generating mixed effects model
  • A random intercept model
  • A clusterSEs::cluster.im.glm(drop = TRUE, truncate = FALSE) model
  • A clusterSEs::cluster.im.glm(drop = TRUE, truncate = TRUE) model
  • A cluster_im_lmRob(drop = TRUE, engine = "robust") model
  • A cluster_im_lmRob(drop = TRUE, engine = "robustbase") model

The models are compared on:

  • Mean coefficient
  • Rejection rate
  • Rejection rate standard error
  • Root mean square error
  • Relative root mean square error
  • Confidence interval coverage
  • Average confidence interval width
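
To make these metrics concrete, the sketch below computes them from a set of hypothetical per-replication estimates. The simulated estimates, the normal-based intervals, and the exact formulas (rates as percentages, a binomial standard error for the rejection rate, and relative RMSE as RMSE divided by the absolute true coefficient) are assumptions chosen for illustration; they are not taken from the pwr_func_lmer() source.

```r
set.seed(1)
true_beta <- -5   # true coefficient of the variable of interest
reps <- 200       # number of simulation replications
alpha <- 0.05

# Hypothetical per-replication estimates and standard errors.
est <- stats::rnorm(reps, mean = true_beta, sd = 0.1)
se <- rep(0.1, reps)

# Normal-based confidence intervals and two-sided p-values.
crit <- stats::qnorm(1 - alpha / 2)
ci_lo <- est - crit * se
ci_hi <- est + crit * se
p_val <- 2 * stats::pnorm(-abs(est / se))

rej <- mean(p_val < alpha)                 # proportion of rejections
rmse <- sqrt(mean((est - true_beta)^2))

metrics <- c(
  mean_coef = mean(est),
  rejection_rate = 100 * rej,
  rejection_rate_se = 100 * sqrt(rej * (1 - rej) / reps),
  rmse = rmse,
  rrmse = rmse / abs(true_beta),
  coverage = 100 * mean(ci_lo <= true_beta & true_beta <= ci_hi),
  avg_ci_width = mean(ci_hi - ci_lo)
)
print(round(metrics, 4))
```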

The following example shows a simulation where both a random intercept and random slope are specified and where two of the variables (x1 and x3) are correlated. The simulation is limited to 5 reps to minimize computation time. The main variable of interest is variable x1; as such, the comparison metrics will be with respect to x1.

pwr_func_lmer(betas = list("int" = 0, "x1" = -5, "x2" = 2, "x3" = 10),
              dists = list("x1" = stats::rnorm,
                           "x2" = stats::rbinom,
                           "x3" = stats::rnorm),
              distpar = list("x1" = list(mean = 0, sd = 1),
                             "x2" = list(size = 1, prob = 0.4),
                             "x3" = list(mean = 1, sd = 2)),
              N = 50,
              reps = 5,
              alpha = 0.05,
              var_intr = "x1",
              grp = "ID",
              mod = "out ~ x1 + x2 + x3 + (x3|ID)",
              catsmod = "out ~ x1 + x2 + x3",
              r_slope = "x3",
              r_int = "int",
              n_time = 100,
              mean_i = 0,
              var_i = 1,
              mean_s = 0,
              var_s = 1,
              cov_is = 0,
              mean_r = 0,
              var_r = 1,
              cor_mat = diag(2),
              corvars = list(c("x1", "x3")))
#>             model mean_coef rejection_rate rejection_rate_se        rmse
#> 1             lme -5.007189            100                 0 0.010810328
#> 2              ri -5.000001            100                 0 0.013853265
#> 3            cats -5.006698            100                 0 0.009592995
#> 4      cats_trunc -5.006698            100                 0 0.009592995
#> 5     cats_robust -5.008836            100                 0 0.013731247
#> 6 cats_robustbase -5.008159            100                 0 0.011653423
#>         rrmse coverage avg_ci_width success
#> 1 0.002162066      100   0.05650710       5
#> 2 0.002770653      100   0.08214456       5
#> 3 0.001918599      100   0.06106792       5
#> 4 0.001918599      100   0.06106792       5
#> 5 0.002746249      100   0.06670095       5
#> 6 0.002330685      100   0.06385641       5

CloseCATs

When summarizing simulation results, Esarey and Menger (2019) state, “in our simulations an accurate RE model of intra-cluster heterogeneity provides better performance than any cluster adjustment technique, but the cluster adjustment techniques perform better in the event of misspecification.” Of the cluster adjustment techniques, the summary also mentions that, “Our simulation analysis finds that CATs (based on the work of Ibragimov and Muller (2010)) are the best choice among the options we examine for correcting SEs for clustering in data sets with a small number of clusters.”

In practice, mixed effects models are often used to obtain cluster adjusted results. However, when the sample size is small, researchers often use a random intercept model (with no random slope) in order to obtain a model which is not too complex for the data. If the true data generating process is consistent with a mixed effects model with a random slope, then the random intercept model may be misspecified.
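
The sketch below illustrates this kind of misspecification. It simulates data with both a random intercept and a random slope on x3, then fits the data generating model and a random intercept model. All names and parameter values (n_groups, n_per, u0, u1, the coefficients) are hypothetical and chosen for illustration. The model fits use the ‘lme4’ package and are skipped if it is not installed.

```r
set.seed(42)
n_groups <- 10   # number of clusters
n_per <- 20      # observations per cluster
ID <- rep(seq_len(n_groups), each = n_per)

# Cluster-level random effects: intercepts and slopes on x3.
u0 <- stats::rnorm(n_groups, mean = 0, sd = 1)
u1 <- stats::rnorm(n_groups, mean = 0, sd = 1)

# Predictors and outcome from a random intercept and random slope model.
x1 <- stats::rnorm(n_groups * n_per)
x3 <- stats::rnorm(n_groups * n_per, mean = 1, sd = 2)
out <- (0 + u0[ID]) - 5 * x1 + (10 + u1[ID]) * x3 +
  stats::rnorm(n_groups * n_per)
sim <- data.frame(ID = factor(ID), x1 = x1, x3 = x3, out = out)

if (requireNamespace("lme4", quietly = TRUE)) {
  # Data generating model: random intercept and random slope on x3.
  full <- lme4::lmer(out ~ x1 + x3 + (x3 | ID), data = sim)
  # Simpler model that omits the random slope.
  ri <- lme4::lmer(out ~ x1 + x3 + (1 | ID), data = sim)
  print(lme4::fixef(full))
  print(lme4::fixef(ri))
}
```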

CloseCATs is a card game where:

  • Two cards are dealt to the computer and two to the player
  • The card in row 1 represents the random slope variance
  • The card in row 2 represents the random effect covariance
  • The player has an option to switch their row 1 and row 2 card (the computer does not)
  • Data is simulated from a mixed effects model defined by the game setup and the cards
  • Simulated data is fitted to the data generating mixed effects model, a random intercept model, and CATs models
  • A misspecification distance (RMSE of CATs - RMSE of random intercept) is computed for the computer and for the player
  • The lower misspecification distance wins the game
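
The scoring rule can be sketched as follows. The misspec_distance() helper and the RMSE values are hypothetical, and treating equal distances as a tie is an assumption; the game itself computes the RMSE values from simulated data.

```r
# Hypothetical helper: RMSE of the CATs model minus RMSE of the
# random intercept model. Lower values favor CATs.
misspec_distance <- function(rmse_cats, rmse_ri) {
  rmse_cats - rmse_ri
}

# Made-up RMSE values for the player's and the computer's simulations.
player <- misspec_distance(rmse_cats = 0.010, rmse_ri = 0.014)
computer <- misspec_distance(rmse_cats = 0.012, rmse_ri = 0.013)

# The lower misspecification distance wins.
winner <- if (player < computer) {
  "player"
} else if (computer < player) {
  "computer"
} else {
  "tie"
}
print(winner)
```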

The game was designed to help users get a better understanding of when to prefer a CATs model over a random intercept model. To play the game, call the CloseCATs() function to launch the ‘shiny’ application.

CloseCATs()

References

Esarey J, Menger A. Practical and Effective Approaches to Dealing With Clustered Data. Political Science Research and Methods. 2019;7(3):541-559. doi:10.1017/psrm.2017.42

Ibragimov R, Müller UK. t-Statistic Based Correlation and Heterogeneity Robust Inference. Journal of Business & Economic Statistics. 2010;28(4):453-468. doi:10.1198/jbes.2009.08046

Metadata

Version

0.1.1

License

Unknown

Platforms (77)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows