Description

Hierarchical Sufficient Dimension Reduction.

Description

Provides semiparametric sufficient dimension reduction for central mean subspaces for heterogeneous data defined by combinations of binary factors (such as chronic conditions). Subspaces are estimated to be hierarchically nested to respect the structure of subpopulations with overlapping characteristics. This package is an implementation of the proposed methodology of Huling and Yu (2021) <doi:10.1111/biom.13546>.

README.md

cran.r-project.org

hierSDR

The hierSDR package implements the methodology of Huling and Yu (2021+) which is a semiparametric sufficient dimension reduction approach that handles heterogeneity in the relationship between covariates and outcome where the heterogeneity is based on all combinations of a set of binary factors. A motivating example is modeling health risks based on covariates where the relationship between covariates and outcome changes depending on what combination of chronic conditions a patient has. In this example the chronic conditions are the factors that drive heterogeneity. Each subpopulation is the set of patients with a different combination of chronic conditions.

Installation

You can install the released version of hierSDR from CRAN with:

install.packages("hierSDR")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("jaredhuling/hierSDR")

Example

This is a basic example which shows you how to solve a common problem:

library(hierSDR)

Simulate data where response mean is generated from a multiple index model where each combination of a set of binary factors Z has a potentially different set of parameters in the multiple index model. Here, Z drives heterogeneity in the relationship between covariates X and a response Y.

set.seed(123)
dat <- simulate_data(nobs = 500, nvars = 6,
                     x.type = "some_categorical",
                     sd.y = 1, model = 2)

x <- dat$x ## covariates
z <- dat$z ## factor indicators, these drive heterogeneity in the data
y <- dat$y ## response

## true coefficients that generate the subspaces
dat$beta 
#> $none
#>            [,1]
#> [1,] 1.00000000
#> [2,] 0.06586526
#> [3,] 0.17240683
#> [4,] 0.10866231
#> [5,] 0.84253756
#> [6,] 0.77444774
#> 
#> $a
#>            [,1]
#> [1,] 1.00000000
#> [2,] 0.06586526
#> [3,] 0.17240683
#> [4,] 0.10866231
#> [5,] 0.84253756
#> [6,] 0.77444774
#> 
#> $b
#>            [,1]
#> [1,] 1.00000000
#> [2,] 0.06586526
#> [3,] 0.17240683
#> [4,] 0.10866231
#> [5,] 0.84253756
#> [6,] 0.77444774
#> 
#> $ab
#>            [,1]       [,2]
#> [1,] 1.00000000  1.0000000
#> [2,] 0.06586526  0.2317807
#> [3,] 0.17240683 -0.2069714
#> [4,] 0.10866231  0.7276493
#> [5,] 0.84253756  0.8026606
#> [6,] 0.77444774 -0.5029732

## what combinations of z represent different subpopulations
dat$z.combinations 
#>   a b
#> 1 0 0
#> 2 1 0
#> 3 0 1
#> 4 1 1

## correct structural dimensions:
## (the dimensions of the subpopulation-specific
##  central mean subspaces)
dat$d.correct
#> [1] 1 0 0 1

Now estimate the parameters in the multiple index model where each combination of the factors in Z has its own set of parameters, but the parameters are constrained so that the central mean subspaces (which convey the regression relationship information) are hierarchical according to the structure of the subpopulations defined by Z.

Here we are specifying the correct dimensions for each of the parameter matrices for each subpopulation. However, the correct dimensions can be estimated with a validated information criterion (VIC).

hiermod <- hier.sphd(x, y, z, dat$z.combinations, d = dat$d.correct,
                     verbose = FALSE)


## validated inf criterion for choosing dimensions 
## (the smaller the better).
## multiple models should be fit with differing dimensions and the
## dimensions with the smallest VIC should be used
hiermod$vic
#> [1] 507.1956

## true and estimated parameters
cbind(Estimated = hiermod$beta[[4]], NA, Truth = dat$beta[[4]])
#>           [,1]      [,2] [,3]       [,4]       [,5]
#> [1,] 1.0000000 1.0000000   NA 1.00000000  1.0000000
#> [2,] 0.3552258 0.3562017   NA 0.06586526  0.2317807
#> [3,] 0.2889527 0.2616284   NA 0.17240683 -0.2069714
#> [4,] 0.1832808 0.2413655   NA 0.10866231  0.7276493
#> [5,] 0.8526007 0.8487122   NA 0.84253756  0.8026606
#> [6,] 0.8043703 0.6349214   NA 0.77444774 -0.5029732

## angles between estimated and true subspaces for each subpopulation:
## (smaller means the subspaces are better estimated)
mapply(function(x,y) angle(x,y), hiermod$beta, dat$beta)
#>      0,0      1,0      0,1      1,1 
#> 11.34241 11.34241 11.34241 11.56792

## projection difference norm between estimated and true subspaces for each population:
## (smaller means the subspaces are better estimated)
mapply(function(x,y) projnorm(x,y), hiermod$beta, dat$beta)
#>       0,0       1,0       0,1       1,1 
#> 0.2781362 0.2781362 0.2781362 0.3871345

References

Huling, Jared D., and Menggang Yu. 2021+. “Sufficient Dimension Reduction for Populations with Structured Heterogeneity.” Biometrics, 2021+.

r-hierSDR

hierSDR

Installation

Example

References

Version

License

Status

Source

Homepage

Platforms (76)

hierSDR

Installation

Example

References

Version

License

Status

Source

Homepage

Platforms76 (76)

Platforms (76)