Description

Prioritize Variables with Joint Variable Importance Plot in Observational Study Design.

In the design stage of an observational study, matching and/or weighting methods are applied. When many background variables are present, however, deciding which variables to prioritize for matching or weighting is not trivial. Joint treatment-outcome variable importance plots are created to guide this variable selection. The plots enhance variable comparisons via unadjusted bias curves derived under the omitted variable bias framework, and they translate variable importance into recommended values for tuning parameters in existing methods. Post-matching and/or post-weighting plots can also be used to visualize and assess the quality of the observational study design. The motivation and derivation of the method are presented in "Using Joint Variable Importance Plots to Prioritize Variables in Assessing the Impact of Glyburide on Adverse Birth Outcomes" by Liao et al. (2023) <arXiv:2301.09754>. See the package paper by Liao and Pimentel (2023) <arXiv:2302.10367> for a beginner-friendly introduction.

Joint variable importance plot


The joint variable importance plot (jointVIP) visualizes each variable’s outcome importance via Pearson’s correlation and its treatment importance via cross-sample standardized mean differences. Bias curves make the variables comparable and so support prioritization.
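To make the two measures concrete, here is a minimal base R sketch on simulated data. It is not the package's internal code. It assumes that the cross-sample standardized mean difference divides the treated-minus-control mean difference in the analysis sample by the covariate's standard deviation in the pilot sample, and that the unadjusted bias can be approximated by the product of the two measures. The data frames `pilot` and `analysis` and all column names are hypothetical.

```r
# Simulated stand-ins for a pilot sample (controls only) and an analysis sample.
set.seed(1)
n_pilot <- 200
n_analysis <- 800

pilot <- data.frame(x = rnorm(n_pilot))
pilot$y <- 0.5 * pilot$x + rnorm(n_pilot)

analysis <- data.frame(treat = rbinom(n_analysis, 1, 0.4))
analysis$x <- rnorm(n_analysis, mean = 0.3 * analysis$treat)

# Outcome importance: Pearson correlation of covariate and outcome in the pilot sample.
outcome_cor <- cor(pilot$x, pilot$y, method = "pearson")

# Treatment importance (assumed definition): mean difference from the analysis
# sample, scaled by the pilot-sample standard deviation.
mean_diff <- mean(analysis$x[analysis$treat == 1]) -
  mean(analysis$x[analysis$treat == 0])
cross_smd <- mean_diff / sd(pilot$x)

# Assumed approximation of the unadjusted bias: product of the two measures.
approx_bias <- cross_smd * outcome_cor

print(c(outcome_cor = outcome_cor, cross_smd = cross_smd, approx_bias = approx_bias))
```

Under these assumptions, a variable with a large standardized mean difference but a near-zero outcome correlation contributes little bias, which is why the two measures are read jointly.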

Installation

You can install the jointVIP package from CRAN using:

# for version on CRAN
install.packages("jointVIP")

# for development version on GitHub (requires the devtools package)
devtools::install_github("ldliao/jointVIP")

BRFSS Example

To demonstrate, we use the 2015 Behavioral Risk Factor Surveillance System (BRFSS) data to answer the causal question: does smoking increase the risk of chronic obstructive pulmonary disease (COPD)? The data and background are inspired by Clay Ford’s work at the University of Virginia Library. First, the data are cleaned so that all variables are numeric, i.e., all factor variables are transformed via one-hot encoding. The treatment variable smoke contains only 0 (control) and 1 (treatment).
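The cleaning step is not shown in this README. As a rough illustration only, the sketch below one-hot encodes a single factor column with base R's model.matrix() on a tiny made-up data frame. The data frame `raw`, its values, and the resulting column names are hypothetical and are not the actual BRFSS cleaning code.

```r
# Hypothetical raw data with one factor column.
raw <- data.frame(
  smoke = c(0, 1, 0, 1),
  COPD = c(0, 1, 0, 0),
  age_group = factor(c("25to34", "over65", "35to44", "over65")),
  average_drinks = c(1, 3, 0, 2)
)

# Dropping the intercept (- 1) yields one 0/1 column per factor level.
dummies <- model.matrix(~ age_group - 1, data = raw)
colnames(dummies) <- sub("^age_group", "age_", colnames(dummies))

# Combine the numeric columns with the indicator columns.
df <- cbind(raw[, c("smoke", "COPD", "average_drinks")], dummies)

# Check that every column is numeric and that treatment is coded 0/1.
stopifnot(all(vapply(df, is.numeric, logical(1))))
stopifnot(all(df$smoke %in% c(0, 1)))
print(df)
```

With several factor columns, each one would need to be encoded separately in this way, because a single formula without an intercept keeps the full set of levels only for the first factor.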

With the cleaned data, you can specify details in the function create_jointVIP() like so:

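library(jointVIP)
library(dplyr)  # provides %>% and pull(), used below
## basic example code
## df is the cleaned BRFSS data frame described above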

treatment = 'smoke'
outcome = 'COPD'
covariates = names(df)[!names(df) %in% c(treatment, outcome)]

## select the pilot sample from random portion
## pilot data here are considered as 'external controls'
## can be a separate dataset; should be chosen with caution
set.seed(1234895)
pilot_prop = 0.2
pilot_sample_num = sample(which(df %>% pull(treatment) == 0),
                          length(which(df %>% pull(treatment) == 0)) *
                          pilot_prop)

## set up pilot and analysis data
## we want to make sure these two data are non-overlapping
pilot_df = df[pilot_sample_num, ]
analysis_df = df[-pilot_sample_num, ]

## minimal example
brfss_jointVIP = create_jointVIP(treatment = treatment,
                                 outcome = outcome,
                                 covariates = covariates,
                                 pilot_df = pilot_df,
                                 analysis_df = analysis_df)
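Because the pilot and analysis data must not overlap, it can help to check the split before building the object. The following base R sketch repeats the splitting logic on a small simulated data frame and asserts the properties described in the comments above. The names `toy`, `pilot_toy`, and `analysis_toy` are hypothetical, and floor() is added here only to make the pilot size an explicit whole number.

```r
# Simulated stand-in for the cleaned data.
set.seed(42)
toy <- data.frame(smoke = rbinom(100, 1, 0.3), x = rnorm(100))

# Draw 20% of the control rows as the pilot sample.
control_rows <- which(toy$smoke == 0)
pilot_rows <- sample(control_rows, floor(length(control_rows) * 0.2))

pilot_toy <- toy[pilot_rows, ]
analysis_toy <- toy[-pilot_rows, ]

# The pilot sample holds controls only, shares no rows with the analysis
# sample, and the two pieces together cover the full data.
stopifnot(all(pilot_toy$smoke == 0))
stopifnot(length(intersect(rownames(pilot_toy), rownames(analysis_toy))) == 0)
stopifnot(nrow(pilot_toy) + nrow(analysis_toy) == nrow(toy))
```

Note that negative indexing with an empty index vector returns zero rows in R, so a check like this also guards against a pilot proportion so small that no rows are drawn.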

The generic functions summary() and print() can be used on the jointVIP object to extract information at a glance.

summary(brfss_jointVIP)
#> Max absolute bias is 0.032
#> 3 variables are above the desired 0.01 absolute bias tolerance
#> 13 variables can be plotted
print(brfss_jointVIP)
#>                 bias
#> age_over65     0.032
#> average_drinks 0.031
#> age_25to34     0.012
plot(brfss_jointVIP)

In this example, age_over65 and average_drinks are the two most important variables to adjust for. At a bias tolerance of 0.01, three variables (age_over65, average_drinks, and age_25to34) are above the tolerance threshold, and age_over65 and average_drinks are of higher importance for adjustment than age_25to34. Although race_black and age_over65 have similar absolute standardized mean differences (0.322 and 0.333, respectively), age_over65 is more important to adjust for since it is more highly correlated with the outcome.

Metadata

Version

0.1.2

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows