Automatic Linear and Logistic Regression and Survival Analysis.
Installation
You can install the autoReg package from GitHub.
#install.packages("devtools")
devtools::install_github("cardiomoon/autoReg")
Load package
To load the package, use the library() function.
library(autoReg)
Main features
1. Summarizing baseline characteristics: gaze()
You can make a table summarizing baseline characteristics easily.
library(moonBook) # For use of example data acs
gaze(sex~.,data=acs)
————————————————————————————————————————————————————————————————————————
Dependent:sex     levels           Female        Male          p
(N)                                (N=287)       (N=570)
————————————————————————————————————————————————————————————————————————
age               Mean ± SD        68.7 ± 10.7   60.6 ± 11.2   <.001
cardiogenicShock  No               275 (95.8%)   530 (93%)     .136
                  Yes              12 (4.2%)     40 (7%)
entry             Femoral          119 (41.5%)   193 (33.9%)   .035
                  Radial           168 (58.5%)   377 (66.1%)
Dx                NSTEMI           50 (17.4%)    103 (18.1%)   .012
                  STEMI            84 (29.3%)    220 (38.6%)
                  Unstable Angina  153 (53.3%)   247 (43.3%)
EF                Mean ± SD        56.3 ± 10.1   55.6 ± 9.4    .387
height            Mean ± SD        153.8 ± 6.2   167.9 ± 6.1   <.001
weight            Mean ± SD        57.2 ± 9.3    68.7 ± 10.3   <.001
BMI               Mean ± SD        24.2 ± 3.6    24.3 ± 3.2    .611
obesity           No               194 (67.6%)   373 (65.4%)   .580
                  Yes              93 (32.4%)    197 (34.6%)
TC                Mean ± SD        188.9 ± 51.1  183.3 ± 45.9  .124
LDLC              Mean ± SD        117.8 ± 41.2  116.0 ± 41.1  .561
HDLC              Mean ± SD        39.0 ± 11.5   37.8 ± 10.9   .145
TG                Mean ± SD        119.9 ± 76.2  127.9 ± 97.3  .195
DM                No               173 (60.3%)   380 (66.7%)   .077
                  Yes              114 (39.7%)   190 (33.3%)
HBP               No               83 (28.9%)    273 (47.9%)   <.001
                  Yes              204 (71.1%)   297 (52.1%)
smoking           Ex-smoker        49 (17.1%)    155 (27.2%)   <.001
                  Never            209 (72.8%)   123 (21.6%)
                  Smoker           29 (10.1%)    292 (51.2%)
————————————————————————————————————————————————————————————————————————
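As a rough cross-check of individual rows in a table like this, the group sizes and the mean and SD of a continuous variable can be recomputed with base R. This is only a sketch: the t-test used for the p value is an assumption, and gaze() may select a different test.

```r
library(moonBook)  # provides the acs example data

# Group sizes, corresponding to the (N) row
table(acs$sex)

# Mean and SD of age within each sex, corresponding to the "age" row
age_mean <- tapply(acs$age, acs$sex, mean, na.rm = TRUE)
age_sd   <- tapply(acs$age, acs$sex, sd, na.rm = TRUE)
round(cbind(mean = age_mean, sd = age_sd), 1)

# Assumption: a two-sample (Welch) t-test; gaze() may use another test
age_p <- t.test(age ~ sex, data = acs)$p.value
age_p < 0.001
```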
For easy reproducible research: myft()
You can easily make a publication-ready table using myft(). It returns a flextable object that can be used in both HTML and PDF output.
library(dplyr) # for use of `%>%`
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
ft=gaze(sex~.,data=acs) %>% myft()
ft
You can also make a PowerPoint file using the rrtable::table2pptx() function.
library(rrtable)
table2pptx(ft)
Exported table as Report.pptx
You can make a Microsoft Word file using the rrtable::table2docx() function.
table2docx(ft)
Exported table as Report.docx
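Because the table is described as a flextable object, the export helpers that ship with the flextable package should also apply to it. A minimal sketch, using a small stand-in table built from the built-in iris data rather than the output of myft(), and hypothetical file names:

```r
library(flextable)

# Stand-in flextable; in practice this would be the object returned by myft()
ft_demo <- flextable(head(iris))

# Hypothetical output paths in a temporary directory
docx_path <- file.path(tempdir(), "table_demo.docx")
pptx_path <- file.path(tempdir(), "table_demo.pptx")

save_as_docx(ft_demo, path = docx_path)  # Word document
save_as_pptx(ft_demo, path = pptx_path)  # PowerPoint slide
file.exists(c(docx_path, pptx_path))
```

Unlike the rrtable functions, these helpers take an explicit path, which can be convenient in scripted reports.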
Summarizing baseline characteristics with two or more grouping variables
You can get a table summarizing baseline characteristics with two or more grouping variables.
gaze(sex+Dx~.,data=acs) %>% myft()
You can also use three or more grouping variables. The resulting table may be too long to review comfortably, but you can try.
gaze(sex+DM+HBP~age,data=acs) %>% myft()
2. For automatic selection of explanatory variables: autoReg()
You can make a table summarizing results of regression analysis. For example, let us perform a logistic regression with the colon cancer data.
library(survival) # For use of data colon
data(cancer)
fit=glm(status~rx+sex+age+obstruct+perfor+nodes,data=colon,family="binomial")
summary(fit)
Call:
glm(formula = status ~ rx + sex + age + obstruct + perfor + nodes,
    family = "binomial", data = colon)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.4950  -1.0594  -0.7885   1.1619   1.6424

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.645417   0.285558  -2.260   0.0238 *
rxLev       -0.067422   0.118907  -0.567   0.5707
rxLev+5FU   -0.627480   0.121684  -5.157 2.51e-07 ***
sex         -0.053541   0.098975  -0.541   0.5885
age          0.002307   0.004234   0.545   0.5859
obstruct     0.283703   0.125194   2.266   0.0234 *
perfor       0.319281   0.292034   1.093   0.2743
nodes        0.190563   0.018255  10.439  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2525.4  on 1821  degrees of freedom
Residual deviance: 2342.4  on 1814  degrees of freedom
  (36 observations deleted due to missingness)
AIC: 2358.4

Number of Fisher Scoring iterations: 4
You can make a table from the above result.
autoReg(fit)
——————————————————————————————————————————————————————————————————————————————————
Dependent: status       0 (N=925)      1 (N=897)      OR (multivariable)
——————————————————————————————————————————————————————————————————————————————————
rx        Obs           282 (30.5%)    342 (38.1%)
          Lev           285 (30.8%)    323 (36%)      0.93 (0.74-1.18, p=.571)
          Lev+5FU       358 (38.7%)    232 (25.9%)    0.53 (0.42-0.68, p<.001)
sex       Mean ± SD     0.5 ± 0.5      0.5 ± 0.5      0.95 (0.78-1.15, p=.589)
age       Mean ± SD     60.1 ± 11.5    59.5 ± 12.3    1.00 (0.99-1.01, p=.586)
obstruct  Mean ± SD     0.2 ± 0.4      0.2 ± 0.4      1.33 (1.04-1.70, p=.023)
perfor    Mean ± SD     0.0 ± 0.2      0.0 ± 0.2      1.38 (0.78-2.47, p=.274)
nodes     Mean ± SD     2.7 ± 2.4      4.6 ± 4.2      1.21 (1.17-1.26, p<.001)
——————————————————————————————————————————————————————————————————————————————————
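The odds ratios in such a table are the exponentiated coefficients of the logistic model. A minimal base-R sketch that recomputes them with Wald 95% limits; the interval method behind the autoReg() table is not stated here, so the limits may differ in the last digit:

```r
library(survival)                      # provides the colon data
data(cancer, package = "survival")

fit <- glm(status ~ rx + sex + age + obstruct + perfor + nodes,
           data = colon, family = "binomial")

# Odds ratio = exp(coefficient); Wald limits = exp(estimate -/+ 1.96 * SE)
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
round(or_table[-1, ], 2)               # drop the intercept row
```

For example, the coefficient of rxLev+5FU in the summary above is -0.627480, and exp(-0.627480) is about 0.53, the value reported in the OR column.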
Or you can make a publication-ready table.
autoReg(fit) %>% myft()
If you want a more self-explanatory table, you can convert numeric variables to categorical variables (factors). For example, the explanatory variables obstruct (obstruction of colon by tumor) and perfor (perforation of colon) are coded as 0 or 1, but they actually mean “No” or “Yes”. Likewise, the dependent variable status is coded as 0 or 1, meaning “Alive” or “Died”.
colon$status.factor=factor(colon$status,labels=c("Alive","Died"))
colon$obstruct.factor=factor(colon$obstruct,labels=c("No","Yes"))
colon$perfor.factor=factor(colon$perfor,labels=c("No","Yes"))
colon$sex.factor=factor(colon$sex,labels=c("Female","Male"))
fit=glm(status.factor~rx+sex.factor+age+obstruct.factor+perfor.factor+nodes,data=colon,family="binomial")
result=autoReg(fit)
result %>% myft()
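The recoding relies on how factor() pairs labels with values. A small sketch on a toy vector (not the colon data) shows the mapping, together with a variant that states the levels explicitly:

```r
# Toy 0/1 vector standing in for a column such as colon$obstruct
x <- c(0, 1, 1, 0, NA)

# labels= are assigned to the sorted unique values: 0 -> "No", 1 -> "Yes"
f1 <- factor(x, labels = c("No", "Yes"))

# Stating levels= explicitly makes the mapping independent of the data
f2 <- factor(x, levels = c(0, 1), labels = c("No", "Yes"))

as.character(f1)
table(x, f2, useNA = "ifany")
```

Giving levels explicitly is a defensive choice: if a subset of the data happened to contain only one of the two codes, the labels-only form would fail or mislabel, while the explicit form would not.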
You can add labels to the names of variables with the setLabel() function.
colon$status.factor=setLabel(colon$status.factor,"Mortality")
colon$rx=setLabel(colon$rx,"Treatment")
colon$age=setLabel(colon$age,"Age(Years)")
colon$sex.factor=setLabel(colon$sex.factor,"Sex")
colon$obstruct.factor=setLabel(colon$obstruct.factor,"Obstruction")
colon$perfor.factor=setLabel(colon$perfor.factor,"Perforation")
colon$nodes=setLabel(colon$nodes,"Positive nodes")
fit=glm(status.factor~rx+sex.factor+age+obstruct.factor+perfor.factor+nodes,data=colon,family="binomial")
result=autoReg(fit)
result %>% myft()
If you do not want to show the reference values in the table, you can shorten the table.
shorten(result) %>% myft()
Add univariate models to table and automatic selection of explanatory variables
You can add the results of univariate analyses to the table. In that case, the autoReg() function automatically selects the explanatory variables whose p values are below the threshold (default value 0.2) and performs the multivariate analysis with them. In this table, the p values of the explanatory variables sex.factor and age are above the default threshold (0.2), so they are excluded from the multivariate model.
autoReg(fit, uni=TRUE) %>% myft()
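The idea of univariate screening can be sketched in base R as follows. This is an illustration under assumptions, not the internal code of autoReg(): here a variable is kept when the smallest p value among its coefficients is below the threshold, and each single-predictor model uses whatever rows are complete for that predictor.

```r
library(survival)
data(cancer, package = "survival")

vars <- c("rx", "sex", "age", "obstruct", "perfor", "nodes")

# Fit one single-predictor logistic model per variable and keep the
# smallest coefficient p value (a factor such as rx has several)
uni_p <- sapply(vars, function(v) {
  fit1 <- glm(reformulate(v, response = "status"),
              data = colon, family = "binomial")
  min(coef(summary(fit1))[-1, "Pr(>|z|)"])
})

threshold <- 0.2
selected <- names(uni_p)[uni_p < threshold]

# Refit the multivariable model with the retained variables only
fit_multi <- glm(reformulate(selected, response = "status"),
                 data = colon, family = "binomial")
round(uni_p, 3)
selected
```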
If you want to include all explanatory variables in the multivariate model, just set the threshold to 1.
autoReg(fit, uni=TRUE,threshold=1) %>% myft()
You can perform stepwise backward elimination to select variables and make a final model. Just set final=TRUE.
autoReg(fit, uni=TRUE,threshold=1, final=TRUE) %>% myft()
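For comparison, backward elimination can be run directly with stats::step(). This sketch assumes AIC as the elimination criterion, which may or may not match what autoReg() does internally. Rows with missing values are dropped first because step() requires every candidate model to use the same observations.

```r
library(survival)
data(cancer, package = "survival")

# Keep only the model variables and drop incomplete rows
vars <- c("status", "rx", "sex", "age", "obstruct", "perfor", "nodes")
d <- na.omit(colon[, vars])

full <- glm(status ~ rx + sex + age + obstruct + perfor + nodes,
            data = d, family = "binomial")

# Backward elimination by AIC, without printing each step
final <- step(full, direction = "backward", trace = 0)
formula(final)
```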
Multiple imputation with mice()
When the argument imputed=TRUE is set, the autoReg() function fits a multiply imputed model using the mice::mice() function. By default, 20 imputations are performed. If you want, you can change the number of imputations with the m argument.
autoReg(fit, imputed=TRUE) %>% myft()
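The general multiple-imputation workflow can also be written out with the mice package itself. The following is a sketch of the standard impute, fit, and pool sequence under assumed settings (default imputation methods, an arbitrary seed); it is not necessarily what autoReg() runs internally.

```r
library(survival)
library(mice)
data(cancer, package = "survival")

vars <- c("status", "rx", "sex", "age", "obstruct", "perfor", "nodes")
d <- colon[, vars]

# Create 20 completed data sets (seed chosen arbitrarily for reproducibility)
imp <- mice(d, m = 20, seed = 1234, printFlag = FALSE)

# Fit the same logistic model in each completed data set
fits <- with(imp, glm(status ~ rx + sex + age + obstruct + perfor + nodes,
                      family = "binomial"))

# Combine the 20 sets of estimates with Rubin's rules
pooled <- pool(fits)
summary(pooled, conf.int = TRUE, exponentiate = TRUE)
```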
Summarize regression model results in a plot: modelPlot()
You can draw a plot summarizing the model with modelPlot().
x=modelPlot(fit)
x
You can make a PowerPoint file with this plot using rrtable::plot2pptx().
plot2pptx(print(x))
Exported plot as Report.pptx
You can summarize several models in one plot. If you want to show the univariate and multivariate models together, set uni=TRUE and adjust the threshold. You can decide whether or not to show the reference levels with the show.ref argument.
modelPlot(fit,uni=TRUE,threshold=1,show.ref=FALSE)