Simulation and Analysis of Event History Data.
The simevent package
- simevent
- Installation
- Usage
- Description
- Example 1: General Event History Simulation with
simEventData - Example 2: Survival Data with
simSurvData - Example 3: Competing Risk Data with
simCRdata - Example 4: Health Care Data with
simDisease - Example 5: Health Care Data with
simTreatment - Example 6: Time Varying Effects with
simEventTV - Example 7: Simulating data from data
- Example 8: Interventions
simevent
The simevent package provides tools for simulating and analyzing complex continuous-time health care data.The simulated data includes variables that can be interpreted as treatment decisions, disease progression, and health factors.
Installation
You can install the development version of simevent from GitHub with:
# install.packages("pak")
pak::pak("miclukacova/simevent")
Usage
The package is loaded with the command
library(simevent)
Description
One of the core functions is the flexible function simEventData, which simulates from a Cox proportional hazards model with Weibull hazards. Users can specify parameters and covariate effects to create custom scenarios. In addition, the package offers several wrapper functions for common settings (e.g., survival data, competing risks) built on top of simEventData. The function simEventTV is an adaptation of the function simEventData. It allows for time varying effects. This feature results in a slightly slower simulation procedure, so when possible simEventData is recommended.
Another branch of the package concerns simulating data from data. For this the package has two functions, respectively simEventCox and simEventObj. The first function takes as input Cox regressions and simulates data using these, while the latter takes as input a general model equipped with a predict2 method.
Once data is simulated the package includes functions for plotting (plotEventData) and formatting (IntFormatData).
One can perform interventions using the function alphaSimDisease and calculate the effect of interventions using intEffectAlphaDisease.
Below is a table attempting to provide an overview of the package functions. Each function in the package is described and an example is provided.
Functions relating to simEventData
| Function | Description | Example | Example |
|---|---|---|---|
simEventData() | Simulates event history data in the most general setting using Cox proportional hazard model and Weibull hazards. | N, beta, eta, nu | simEventData(N = 5000) |
simSurvData() | Simulates survival data. | N, beta, eta, nu, cens | simSurvData(N) |
simCRdata() | Simulates data from a competing risk setting with two competing causes. | N, beta, eta, nu, cens | simCRdata(N) |
simDisease() | The function simulates health care data from a setting where patients can experience different events: Censoring, Death, Disease. The effects of the various process on one another can be specified by the arguments beta_X_Y. | N, beta_L0_L | simDisease(N = 100, beta_L0_L = 1) |
simTreatment() | Simulates event history data with four types of events representing censoring (0), death (1), treatment (2), and covariate change (3). The effects of the various process on one another can be specified by the arguments beta_X_Y. | N, beta_L_A | simTreatment(N = 100, beta_L_A = 1) |
simDropIn() | Simulates data corresponding to N individuals that are at risk for 4 or 5 events. Censoring (C), Death (D), Drop In Initiation (Z), Change in Covariate Process (L) and optionally Treatment (A). | N, beta_L_A | simDropIn(N = 100, beta_L_A = 1) |
simEventTV() | Simulates general event history data in a setting with time-varying effects. | N, beta, tv_eff, t_prime | simTreatment(N = 100, beta, tv_eff, t_prime) |
Functions for treating simulated data
| Function | Description | Key Arguments | Example |
|---|---|---|---|
IntFormatData() | Transforms data into a so called tstart-tstop format, used when fitting Cox proportional hazards using the survival package. | data, N_cols | IntFormatData(data, N_cols = 8:12) |
plotEventData() | Plots the event histories. | data, title | plotEventData(data) |
Functions relating to simulating from data
| Function | Description | Key Arguments | Example |
|---|---|---|---|
simEventCox() | Simulates new data using fitted Cox proportional hazard models | N, cox_fits, L0_old, A0_old | simEventCox(N, cox_fits, L0_old, A0_old) |
simEventObj() | Simulates new data using a general model | N, obj, list_old_vars | simEventObj(100, obj, list_old_vars = list_old_vars) |
Functions for performing interventions
| Function | Description | Key Arguments | Example |
|---|---|---|---|
alphaSimDisease() | Simulation and estimation of event history data from disease setting with shape parameter of disease process multiplied by alpha. | N, alpha, tau, years_lost | alphaSimDisease(N = 10^4, alpha = 0.5, tau = 5, years_lost = FALSE) |
alphaSimDropIn() | Simulation and estimation of event history data from Drop In setting with shape parameter of Drop In process multiplied by alpha. | N, alpha, tau, years_lost | alphaSimDropIn(N = 10, alpha = 0.5, tau = 5, years_lost = F) |
alphaSimTreat() | Simulation and estimation in treatment setting with modified shape parameter of treatment Process | N, alpha, tau, years_lost | alphaSimTreat(N = 10, alpha = 0.5, tau = 5, years_lost = F) |
intEffectAlphaDisease() | Simulates data from the disease setting in two scenarios. Under intervention on the shape parameter of the disease process is multiplied by , and a baseline (non-intervened) scenario.It computes the proportion of individuals who experience death or disease by a specified time in the group , optionally returning years_lost. | N, alpha, tau, years_lost | intEffectAlphaDisease(N = 10, alpha = 0.5, tau = 5, years_lost = F) |
intEffectAlphaDropIn() | Does the same as above, just in the Drop In setting. | N, alpha, tau | intEffectAlphaDropIn(N = 10, alpha = 0.5, tau = 5) |
intEffectAlphaTreat() | Does the same as above, just in the Treatment setting. | N, alpha, tau, years_lost | intEffectAlphaTreat(N = 10, alpha = 0.5, tau = 5, years_lost = F) |
Example 1: General Event History Simulation with simEventData
This is an example of simulating data using simEventData.simEventData is a function for general event history simulations. The function is quite flexible, and does therefore take many arguments. You can read about the different arguments on the help page
?simEventData
The number of simulated events corresponds to the length of the vectors , , or the number of columns in .
Below, we create a 9x5 beta matrix for 5 event processes and 4 baseline covariates:
set.seed(736)
beta <- matrix(rnorm(9*5), ncol = 5, nrow = 9)
You can add baseline covariates by supplying generator functions:
func1 <- function(N) rbinom(N, 1, 0.2)
func2 <- function(N) rnorm(N)
add_cov <- list("Z1" = func1, "Z2" = func2)
Define an function to specify risk windows for recurrent events:
# at risk function
at_risk <- function(events) {
return(c(
1,1, # Always at risk for event 0 and 1
as.numeric(events[3] < 2), # Event 2 can occur twice
as.numeric(events[4] < 1), # Event 3 can occur once
as.numeric(events[5] < 2))) # Event 4 can occur twice
}
Optionally one can define an function to specify risk windows as a function of covariates:
# at risk function
at_risk_cov <- function(covariates) {
return(c(
1,1, # The covariates do not change the risk of event 0 and 1
as.numeric(covariates[1] < 0.5), # Only at risk of event 2 if L0 < 0.5
as.numeric(covariates[2] == 1), # Only at risk of event 3 if A0 = 1
1)) # The covariates do not change the risk of event 4
}
Simulate data for 5000 individuals:
set.seed(973)
data <- simEventData(N = 5000, beta = beta, add_cov = add_cov, at_risk = at_risk,
at_risk_cov = at_risk_cov)
Preview the simulated data:
head(data)
#> Key: <ID>
#> ID Time Delta L0 A0 Z1 Z2 N0 N1 N2
#> <int> <num> <int> <num> <num> <num> <num> <num> <num> <num>
#> 1: 1 0.5067418 4 0.0482511 1 0 0.9036369 0 0 0
#> 2: 1 0.7741479 0 0.0482511 1 0 0.9036369 1 0 0
#> 3: 2 0.7952969 1 0.2186699 0 0 1.0631460 0 1 0
#> 4: 3 4.3490086 1 0.5021846 0 0 -0.2266280 0 1 0
#> 5: 4 0.1689083 0 0.4053148 1 0 1.2534397 1 0 0
#> 6: 5 1.5183445 3 0.3986500 1 0 -0.9204445 0 0 0
#> N3 N4
#> <num> <num>
#> 1: 0 1
#> 2: 0 1
#> 3: 0 0
#> 4: 0 0
#> 5: 0 0
#> 6: 1 0
Formatting Data for Cox Regression
Transform the data into tstart-tstop format with IntFormatData (specify columns with counting process data):
data_int <- IntFormatData(data, N_cols = 8:12)
head(data_int)
#> ID Time Delta L0 A0 Z1 Z2 N0 N1 N2
#> <int> <num> <int> <num> <num> <num> <num> <num> <num> <num>
#> 1: 1 0.5067418 4 0.0482511 1 0 0.9036369 0 0 0
#> 2: 1 0.7741479 0 0.0482511 1 0 0.9036369 0 0 0
#> 3: 2 0.7952969 1 0.2186699 0 0 1.0631460 0 0 0
#> 4: 3 4.3490086 1 0.5021846 0 0 -0.2266280 0 0 0
#> 5: 4 0.1689083 0 0.4053148 1 0 1.2534397 0 0 0
#> 6: 5 1.5183445 3 0.3986500 1 0 -0.9204445 0 0 0
#> N3 N4 k tstart tstop
#> <num> <num> <int> <num> <num>
#> 1: 0 0 1 0.0000000 0.5067418
#> 2: 0 1 2 0.5067418 0.7741479
#> 3: 0 0 1 0.0000000 0.7952969
#> 4: 0 0 1 0.0000000 4.3490086
#> 5: 0 0 1 0.0000000 0.1689083
#> 6: 0 0 1 0.0000000 1.5183445
Fit Cox models for processes 0 and 4:
library(survival)
# Process 0
survfit0 <- coxph(Surv(tstart, tstop, Delta == 0) ~ L0 + A0 + Z1 + Z2 + N2 + N3 + N4,
data = data_int)
# Process 4
survfit4 <- coxph(Surv(tstart, tstop, Delta == 4) ~ L0 + Z1 + Z2 + N2 + N3 + N4,
data = data_int[(N4 < 2) & (A0 == 1)])
Visualize estimated coefficients alongside true values:
CIs <- cbind("Par" = c(paste0(rownames(confint(survfit0)), "_N0"),
paste0(rownames(confint(survfit4)), "_N4")),
rbind(confint(survfit0), confint(survfit4)))
rownames(CIs) <- NULL
colnames(CIs) <- c("Par", "Lower", "Upper")
CIs <- data.table(CIs)
CIs[, True_val := c(beta[-c(5,6),1],
beta[-c(2,5,6),5])]
CIs$Lower <- as.numeric(CIs$Lower)
CIs$Upper <- as.numeric(CIs$Upper)
pp <- ggplot(data = CIs)+
geom_point(aes(y = Par, x = Lower))+
geom_point(aes(y = Par, x = Upper))+
geom_point(aes(y = Par, x = True_val, col = "Estimate"))+
scale_color_manual(values = c("Estimate" = "red"))+
geom_segment(aes(y = Par, yend = Par, x = Lower, xend = Upper))+
xlab("Estimates")
pp

Visualize Event Histories
Plot the first 100 individuals’ event histories:
plotEventData(data[1:100,])

Using override_beta
You can override specific entries in the beta matrix. For example, set the effect of on processes and to zero:
data <- simEventData(N = 1000, beta = beta, add_cov = add_cov, at_risk = at_risk,
override_beta = list("L0" = c("N0" = 0, "N1" = 0)))
You can also specify non-linear effects, e.g., effect of on set to zero if :
data <- simEventData(N = 1000, beta = beta, add_cov = add_cov, at_risk = at_risk,
override_beta = list("N2 > 1" = c("N1" = 2)))
Additionally the argument can be used to specify interaction effects, e.g., if we want an interaction effect of 1 between L0 and N1 on N2 we can specify the argument:
data <- simEventData(N = 1000, beta = beta, add_cov = add_cov, at_risk = at_risk,
override_beta =list("L0 * N1" = c("N2" = 1)))
Example 2: Survival Data with simSurvData
The function simSurvData allows you to simulate survival data for individuals at risk of both censoring (0) and an event (1).
By default, the function simulates survival data with no covariate effects:
data <- simSurvData(100)
plotEventData(data, title = "Survival Data")

You can specify how baseline covariates and affect the risk of censoring and death through the matrix. For example:
# Effects of L0 and A0 on the censoring hazard (first column)
beta_C <- c(0, 0)
# Effects of L0 and A0 on the death hazard (second column)
beta_D <- c(1, -1)
# Combine into a 2x2 matrix where columns represent censoring and death hazards
beta <- cbind(beta_C, beta_D)
You can also set the Weibull shape () and scale () parameters for both censoring and death hazards:
eta <- c(0.2, 0.2)
nu <- c(1.05, 1.05)
Use the specified parameters to simulate data and visualize it:
data <- simSurvData(100, beta = beta, eta = eta, nu = nu)
plotEventData(data, title = "Survival Data")

Example 3: Competing Risk Data with simCRdata
Simulate competing risk data (similar parameters as above):
data <- simCRdata(100)
plotEventData(data, title = "Competing Risk Data")

Example 4: Health Care Data with simDisease
The function simDisease simulates health care data from a setting where patients can experience $3$ different events:
- Censoring (coded as 0)
- Death (coded as 1), and
- Disease (coded as 2).
You can customize the simulation scenarios by adjusting the function arguments. For detailed information about the parameters, see the help page.
Example simulation
data <- simDisease(N = 100,
cens = 1,
eta = c(0.1,0.3,0.1),
nu = c(1.1,1.3,1.1),
beta_L0_L = 1,
beta_A0_L = -1.1,
beta_L_D = 1,
beta_L0_D = 0)
plotEventData(data, title = "Disease data")

Example 5: Health Care Data with simTreatment
Simulate data with a covariate process and treatment process. Default simulation:
data <- simTreatment(100)
Custom scenario:
data <- simTreatment(N = 100,
beta_L_A = 1,
beta_L_D = 1,
beta_A_D = -1,
beta_A_L = -0.5,
beta_L0_A = 1,
eta = rep(0.1, 4),
nu = rep(1.1, 4),
followup = 5,
cens = 1,
op = 1)
Plot
plotEventData(data, title = "Treatment setting")

Example 6: Time Varying Effects with simEventTV
One can simulate a setting with time-varying effects with the function simEventTV. After the time an additional effect of is added to the beta matrix.
set.seed(6258)
eta <- rep(0.1, 4)
term_deltas <- c(0,1)
tv_eff <- matrix(0.5, ncol = 4, nrow = 6)
beta <- matrix(nrow = 6, ncol = 4, 0.5)
t_prime <- 1
data <- simEventTV(N = 100, t_prime = t_prime, tv_eff = tv_eff, eta = eta,
term_deltas = term_deltas, beta = beta, lower = 10^(-15), upper = 10^2,
max_events = 5)
plotEventData(data)

Example 7: Simulating data from data
Imagine we have observed data:
set.seed(373)
beta = matrix(c(0.5,-1,-0.5,0.5,0,0.5), ncol = 3, nrow = 2)
data <- simCRdata(N = 1000, beta = beta)
Using simEventCox
If we want to use simEventCox for simulating new data from the observed data, we first need to fit Cox models:
cox1 <- coxph(Surv(Time, Delta == 1) ~ L0 + A0, data = data)
cox2 <- coxph(Surv(Time, Delta == 2) ~ L0 + A0, data = data)
Then we can call the function simEventCox providing the fitted Cox models as arguments:
cox_fits <- list("Process 1" = cox1, "Process 2" = cox2)
new_data <- simEventCox(100, cox_fits, L0_old = data$L0, A0_old = data$A0)
The new simulated data looks like:
head(new_data)
#> Key: <ID>
#> ID Time Delta L0 A0 Process 1 Process 2
#> <int> <num> <int> <num> <num> <num> <num>
#> 1: 1 2.175382 1 0.5363954 0 1 0
#> 2: 2 7.139631 1 0.3144497 0 1 0
#> 3: 3 2.766344 1 0.9335247 1 1 0
#> 4: 4 3.570241 2 0.2502356 0 0 1
#> 5: 4 12.695609 1 0.2502356 0 1 1
#> 6: 5 5.055418 1 0.5462261 1 1 0
Using simEventObj
If we want to use simEventObj for simulating new data from the observed data, we need to fit a general model to data and equip it with a predict2 method. We can for example fit a random forest model using rfsrc from the randomForestSRC package:
RF_fit <- randomForestSRC::rfsrc(Surv(Time, Delta) ~ L0 + A0, data = data)
Then we equip it with a predict2 model
predict2 <- function(obj, ...) {
UseMethod("predict2")
}
predict2.rfsrc <- function(obj, sim_data, ...){
preds <- stats::predict(RF_fit, sim_data)
list(time = preds$time.interest, chf = preds$chf)
}
Then we can call the function simEventObj providing the fitted Random Forest as argument:
list_old_vars = list(L0 = data$L0, A0 = data$A0)
new_data <- simEventObj(100, RF_fit, list_old_vars = list_old_vars)
The new simulated data looks like:
head(new_data)
#> Key: <ID>
#> ID Time Delta L0 A0 N1 N2
#> <int> <num> <int> <num> <num> <num> <num>
#> 1: 1 1.169271 1 0.40253050 1 1 0
#> 2: 2 1.433632 2 0.80094849 0 0 1
#> 3: 3 6.062924 2 0.03707728 0 0 1
#> 4: 4 0.310825 1 0.84810211 1 1 0
#> 5: 5 1.465128 2 0.99071345 1 0 1
#> 6: 6 5.116592 2 0.98173073 1 0 1
Example 8: Interventions
Say you want to perform an intervention in the disease setting by scaling the intensity of one of your counting processes. Data without intervention can be simulated by scaling with the factor 1:
alphaSimDisease(N = 10^3, alpha = 1, tau = 5)
#> $effect_D
#> [1] 0.6223092
#>
#> $effect_L
#> [1] 0.3424658
By default the function returns the proportion of individuals who experience death and the proportion of individuals who experience disease by a specified time in group . The user can specify what gets returned by the function call with use of the arguments years_lost and return_data. If the former is set to TRUE the function returns number of years lost before of death and disease. If the latter is set to TRUE the function call returns the simulated data.
To compare a non intervened scenario with a non intervened scenario, the function intEffectAlphaTreat can be used. This function simulates data from the disease setting in two scenarios. Under the intervention on the shape parameter of the disease process, which is multiplied by , and a baseline (non-intervened) scenario. It computes the proportion of individuals who experience death or disease by a specified time in the group , optionally returning years lost. The function can also plot a sample of the event data for each scenario for comparison.
intEffectAlphaDisease(N = 1000, alpha = 0.7, tau = 5, years_lost = TRUE, a0 = 1, plot = TRUE)
#> $effect_L
#> [1] 0.8363509
#>
#> $effect_death
#> [1] 1.466341