Grammar of Graphics for Linear Model Diagnostic Plots.
gglm
Overview
gglm, The Grammar of Graphics for Linear Model Diagnostics, is an R package and official ggplot2 extension that creates beautiful diagnostic plots using ggplot2 for a variety of model objects. These diagnostic plots are easy to use and adhere to the Grammar of Graphics. The purpose of this package is to provide a sensible alternative to using the base-R plot() function to produce diagnostic plots for model objects. Currently, gglm supports all model objects that are supported by broom::augment(), broom.mixed::augment(), or ggplot2::fortify(). For example, objects returned by stats::lm(), lme4::lmer(), brms::brm(), and many other common modeling functions will work with gglm. The function gglm::list_model_classes() provides a full list of model classes supported by gglm.
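To make the link to broom::augment() concrete, the sketch below shows the kind of per-observation data that function returns for an lm fit. It is only an illustration of the augmented data (fitted values and residuals) that diagnostic plots are drawn from; it assumes the broom package is installed and does not reproduce gglm's internals.

```r
# Sketch only: inspect the columns broom::augment() returns for an lm fit.
# This illustrates the data behind diagnostic plots, not gglm's own code.
library(broom)

model <- lm(mpg ~ wt, data = mtcars)  # same simple model used in the examples
diagnostics <- augment(model)         # one row per observation in mtcars

# .fitted and .resid hold the fitted values and residuals for each row
head(diagnostics[, c(".fitted", ".resid")])
```

Any model class with an augment() or fortify() method can be turned into this kind of table, which is why the list of supported classes is so broad.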
Installation
gglm can be installed from CRAN:
install.packages("gglm")
Or, the development version of gglm can be installed from GitHub:
devtools::install_github("graysonwhite/gglm")
Examples
gglm has two main types of functions. First, the gglm() function is used for quickly creating the four main diagnostic plots, and behaves similarly to how plot() works on an lm object. Second, the stat_*() functions are used to produce diagnostic plots by creating ggplot2 layers. These layers allow for plotting of particular model diagnostic plots within the ggplot2 framework.
Example 1: Quickly creating the four diagnostic plots with gglm()
Consider a simple linear model used to predict miles per gallon with weight. We can fit this model with lm(), and then diagnose it easily by using gglm().
library(gglm) # Load gglm
model <- lm(mpg ~ wt, data = mtcars) # Fit the simple linear model
gglm(model) # Plot the four main diagnostic plots
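For comparison, the base-R counterpart that gglm() is meant to replace can be sketched as follows. This uses only base R; the 2 x 2 layout set through par() is a choice made for this illustration so that the four default plot.lm() panels appear together.

```r
# Base-R counterpart, shown only for comparison with gglm(model)
model <- lm(mpg ~ wt, data = mtcars)  # same simple linear model as above

old_par <- par(mfrow = c(2, 2))  # arrange the four plots in a 2 x 2 grid
plot(model)                      # dispatches to plot.lm()
par(old_par)                     # restore the previous graphics settings
```

Unlike this base-R version, the gglm() output is built with ggplot2.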
Now, one may be interested in a more complicated model, such as a mixed model with a varying intercept on cyl, fit with the lme4 package. Luckily, gglm accommodates a variety of models and modeling packages, so the diagnostic plots for the mixed model can be created in the same way as they were for the simple linear model.
library(lme4) # Load lme4 to fit the mixed model
mixed_model <- lmer(mpg ~ wt + (1 | cyl), data = mtcars) # Fit the mixed model
gglm(mixed_model) # Plot the four main diagnostic plots.
Example 2: Using the Grammar of Graphics with the stat_*() functions
gglm also provides functionality to stay within the Grammar of Graphics by providing functions that can be used as ggplot2 layers. An example of one of these functions is the stat_fitted_resid() function. With this function, we can take a closer look at just the fitted vs. residual plot from the mixed model fit in Example 1.
ggplot(data = mixed_model) +
  stat_fitted_resid()
After taking a closer look, we may want to clean up the look of the plot for a presentation or a project. This can be done by adding other layers from ggplot2 to the plot. Note that any ggplot2 layers can be added on to any of the stat_*() functions provided by gglm.
ggplot(data = mixed_model) +
  stat_fitted_resid(alpha = 1) +
  theme_bw() + # add a clean theme
  labs(title = "Residual vs fitted values for the mixed model") + # change the title
  theme(plot.title = element_text(hjust = 0.5)) # center the title
Wow! What a beautiful and production-ready diagnostic plot!
Functions
For quick and easy plotting
gglm() plots the four default diagnostic plots when supplied a model object (this is similar to plot.lm() in the case of an object generated by lm()). Note that gglm() can take many types of model object classes as its input, and the possible model object classes can be listed with list_model_classes().
Following the Grammar of Graphics
stat_normal_qq(), stat_fitted_resid(), stat_resid_hist(), stat_scale_location(), stat_cooks_leverage(), stat_cooks_obs(), and stat_resid_leverage() are all ggplot2 layers used to create individual diagnostic plots. To use these, follow Example 2.
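As a further illustration of the same pattern, the sketch below uses stat_normal_qq() in place of stat_fitted_resid(). It assumes gglm and ggplot2 are installed, reuses the simple linear model from Example 1, and calls the layer with its default arguments; the object name qq_plot and the title text are chosen here purely for illustration.

```r
library(gglm)    # provides stat_normal_qq()
library(ggplot2) # provides ggplot(), theme_bw(), and labs()

model <- lm(mpg ~ wt, data = mtcars)  # simple linear model from Example 1

# Build a single normal Q-Q diagnostic plot as a ggplot2 object
qq_plot <- ggplot(data = model) +
  stat_normal_qq() +                  # the diagnostic layer, default arguments
  theme_bw() +                        # add a clean theme
  labs(title = "Normal Q-Q plot for the simple linear model")

qq_plot  # print the plot
```

Because the result is an ordinary ggplot2 object, it can be stored, modified with further layers, or printed later.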
Other functions
list_model_classes() lists the model classes compatible with gglm.