Description

SEM Model Comparison with K-Fold Cross-Validation.

The goal of 'cvsem' is to provide functions for comparing structural equation models (SEMs) using cross-validation. Users can specify multiple SEMs using 'lavaan' syntax. 'cvsem' computes the Kullback-Leibler (KL) divergence between 1) the model-implied covariance matrix estimated from the training data and 2) the sample covariance matrix estimated from the test data, as described in Cudeck and Browne (1983) <doi:10.1207/s15327906mbr1802_2>. The KL divergence is computed for each of the specified SEMs, allowing the models to be compared based on their prediction errors.

cvsem


The cvsem package provides cross-validation (CV) of structural equation models (SEMs) across a user-defined number of folds. CV is based on computing the discrepancy between the held-out test sample covariance matrix and the model-implied covariance matrix from the training samples. This approach to cross-validating SEMs is described in Cudeck and Browne (1983) and Browne and Cudeck (1992). The individual models are fitted via the lavaan package (Rosseel 2012) to obtain the model-implied covariance matrix. The discrepancy between the implied matrix and the test sample covariance matrix is computed with a pre-specified metric (defaulting to the Kullback-Leibler divergence, also known as the maximum-likelihood discrepancy). The cvsem function returns the average discrepancy together with a corresponding standard error for each tested model.
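As a rough illustration (not cvsem's internal code), the maximum-likelihood discrepancy, which is proportional to the KL divergence under multivariate normality, between a test-sample covariance matrix S and a model-implied covariance matrix Sigma could be written in R as follows; ml_discrepancy is a hypothetical helper name:

## Hypothetical sketch of the ML/KL discrepancy between a test-sample
## covariance S and a model-implied covariance Sigma (Cudeck and Browne 1983)
ml_discrepancy <- function(S, Sigma) {
  p <- nrow(S)                          # number of observed variables
  log(det(Sigma)) - log(det(S)) +       # log-determinant terms
    sum(diag(S %*% solve(Sigma))) - p   # trace(S %*% Sigma^-1) minus p
}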

Currently, the provided model code needs to follow one of lavaan’s allowed specifications.

Installation

You can install the development version of cvsem from GitHub with:

# install.packages("devtools")
devtools::install_github("AnnaWysocki/cvsem")

Example

Cross-validating the HolzingerSwineford1939 dataset

Load the cvsem package and read in the HolzingerSwineford1939 data from the lavaan package:

library(cvsem)

example_data <- lavaan::HolzingerSwineford1939

Give the columns descriptive names:

colnames(example_data) <- c("id", "sex", "ageyr", "agemo", "school", "grade",
                            "visualPerception", "cubes", "lozenges", "comprehension",
                            "sentenceCompletion", "wordMeaning", "speededAddition",
                            "speededCounting", "speededDiscrimination")

Define Models

Define some models to be compared with cvsem using lavaan notation:

model1 <- 'comprehension ~ sentenceCompletion + wordMeaning'

model2 <- 'comprehension ~ meaning

           ## Add some latent variables:

           meaning =~ wordMeaning + sentenceCompletion
           speed =~ speededAddition + speededDiscrimination + speededCounting
           speed ~~ meaning'

model3 <- 'comprehension ~ wordMeaning + speededAddition'

Model List

Gather models into a named list object with cvgather. These could also be fitted lavaan objects based on the same data.

models <- cvgather(model1, model2, model3)
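As noted above, fitted lavaan objects based on the same data can also be gathered. A hypothetical variant, first fitting model2 with lavaan::sem():

## Hypothetical: mix a lavaan model string with a fitted lavaan object
fit2   <- lavaan::sem(model2, data = example_data)
models <- cvgather(model1, fit2, model3)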

Cross-Validate with K-folds

Define the number of folds k and call the cvsem function; here we use k = 10 folds. CV is based on the discrepancy between the test sample covariance matrix and the model-implied matrix from the training data. The discrepancy metric is chosen via the discrepancyMetric argument. Currently three discrepancy metrics are available: KL-Divergence, generalized least squares (GLS), and Frobenius distance (FD). Here we use KL-Divergence.

fit <- cvsem( data = example_data, Models = models, k = 10, discrepancyMetric = "KL-Divergence")
#> [1] "Cross-Validating model: model1"
#> [1] "Cross-Validating model: model2"
#> [1] "Cross-Validating model: model3"

Show Results

Print the fitted cvsem object. Note that the model with the smallest (best) discrepancy is listed first. The reported value is the average of the discrepancy metric across all folds (also known as the expected cross-validation index, ECVI), together with the associated standard error.

fit
#> Cross-Validation Results of 3 models 
#> based on  k =  10 folds. 
#> 
#>    Model E(KL-D)   SE
#> 1 model1    1.29 0.44
#> 3 model3    2.28 0.50
#> 2 model2    3.48 0.64
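As a rough sketch (not necessarily cvsem's exact internal computation), the reported average and standard error correspond to summarizing the per-fold discrepancies along these lines, using made-up per-fold values:

## Illustration only: made-up per-fold discrepancies for one model
fold_kl <- c(1.1, 0.9, 1.6, 1.4, 1.2, 1.0, 1.7, 1.3, 1.5, 1.2)
ecvi    <- mean(fold_kl)                        # expected cross-validation index
se_ecvi <- sd(fold_kl) / sqrt(length(fold_kl))  # standard error of the mean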

References

Browne, Michael W., and Robert Cudeck. 1992. “Alternative Ways of Assessing Model Fit.” Sociological Methods & Research 21: 230–58.

Cudeck, Robert, and Michael W. Browne. 1983. “Cross-Validation of Covariance Structures.” Multivariate Behavioral Research 18: 147–67. https://doi.org/10.1207/s15327906mbr1802_2.

Rosseel, Yves. 2012. “lavaan: An R Package for Structural Equation Modeling.” Journal of Statistical Software. https://doi.org/10.18637/jss.v048.i02.

Metadata

Version

1.0.0

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows