Summarise and Visualise Characteristics of Patients in the OMOP CDM.
CohortCharacteristics
Package overview
CohortCharacteristics contains functions for summarising characteristics of cohorts of patients identified in an OMOP CDM dataset. Once a cohort table has been created, CohortCharacteristics provides a number of functions to help provide a summary of the characteristics of the individuals within the cohort.
Package installation
You can install the latest version of CohortCharacteristics from CRAN:
install.packages("CohortCharacteristics")
Or from github:
install.packages("remotes")
remotes::install_github("darwin-eu-dev/CohortCharacteristics")
Example usage
The CohortCharacteristics package is designed to work with data in the OMOP CDM format, so our first step is to create a reference to the data using the CDMConnector package. For this example we will work with the example Eunomia dataset.
library(CDMConnector)
library(CohortCharacteristics)
library(dplyr)
cdm <- mockCohortCharacteristics(patient_size = 1000, drug_exposure_size = 1000)
cdm
We can see that in this example data we have a cohort table called cohort1.
cdm$cohort1
#> # Source: table<main.cohort1> [4 x 4]
#> # Database: DuckDB v0.10.0 [martics@Windows 10 x64:R 4.2.1/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <dbl> <dbl> <date> <date>
#> 1 1 1 2020-01-01 2020-04-01
#> 2 1 1 2020-06-01 2020-08-01
#> 3 1 2 2020-01-02 2020-02-02
#> 4 2 3 2020-01-01 2020-03-01
With one line of code from CohortCharacteristics we can generate summary statistics on this cohort.
cohort1_characteristics <- summariseCharacteristics(cdm$cohort1)
cohort1_characteristics |>
glimpse()
#> Rows: 70
#> Columns: 13
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name <chr> "PP_MOCK", "PP_MOCK", "PP_MOCK", "PP_MOCK", "PP_MOCK"…
#> $ group_name <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level <chr> "cohort_1", "cohort_2", "cohort_1", "cohort_2", "coho…
#> $ strata_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name <chr> "Number records", "Number records", "Number subjects"…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name <chr> "count", "count", "count", "count", "min", "min", "q2…
#> $ estimate_type <chr> "integer", "integer", "integer", "integer", "date", "…
#> $ estimate_value <chr> "3", "1", "2", "1", "2020-01-01", "2020-01-01", "2020…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
And with another line we can create a table of these results.
tableCharacteristics(cohort1_characteristics, type = "tibble")
#> # A tibble: 17 × 6
#> `CDM name` `Variable name` `Variable level` `Estimate name`
#> <chr> <chr> <chr> <chr>
#> 1 PP_MOCK Number records <NA> N
#> 2 PP_MOCK Number subjects <NA> N
#> 3 PP_MOCK Cohort start date <NA> Median [Q25 - Q75]
#> 4 PP_MOCK Cohort start date <NA> Range
#> 5 PP_MOCK Cohort end date <NA> Median [Q25 - Q75]
#> 6 PP_MOCK Cohort end date <NA> Range
#> 7 PP_MOCK Sex Female N (%)
#> 8 PP_MOCK Sex Male N (%)
#> 9 PP_MOCK Age <NA> Median [Q25 - Q75]
#> 10 PP_MOCK Age <NA> Mean (SD)
#> 11 PP_MOCK Age <NA> Range
#> 12 PP_MOCK Prior observation <NA> Median [Q25 - Q75]
#> 13 PP_MOCK Prior observation <NA> Mean (SD)
#> 14 PP_MOCK Prior observation <NA> Range
#> 15 PP_MOCK Future observation <NA> Median [Q25 - Q75]
#> 16 PP_MOCK Future observation <NA> Mean (SD)
#> 17 PP_MOCK Future observation <NA> Range
#> # ℹ 2 more variables: `[header]Cohort name\n[header_level]Cohort 1` <chr>,
#> # `[header]Cohort name\n[header_level]Cohort 2` <chr>
CohortCharacteristics provides a number of other functions to help summarise cohort tables and present the results in publication-ready tables and figures. See the vignettes for more details.