Intensive Care Unit Data with R.
ricu
Working with ICU datasets in R, especially publicly available ones such as those provided by PhysioNet, is facilitated by ricu, which provides data access, a level of abstraction for encoding clinical concepts in a data-source-agnostic way, and classes and utilities for working with the resulting time series datasets.
Installation
Currently, installation is only possible directly from GitHub, using the remotes package if it is installed
remotes::install_github("eth-mds/ricu")
or by sourcing the required installation code from GitHub by running
rem <- source(
paste0("https://raw.githubusercontent.com/r-lib/remotes/main/",
"install-github.R")
)
rem$value("eth-mds/ricu")
To make sure that some useful utility packages are installed as well, consider also installing the packages marked as Suggests by running
remotes::install_github("eth-mds/ricu", dependencies = TRUE)
instead, or by installing some of the utility packages (relevant for downloading and preprocessing PhysioNet datasets)
install.packages("xml2")
and demo dataset packages
install.packages(c("mimic.demo", "eicu.demo"),
repos = "https://eth-mds.github.io/physionet-demo")
explicitly.
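After installation, it can be helpful to confirm which of the packages mentioned above are actually available. The following is a minimal sketch using only base R; the package list simply repeats the names from this section, and the variable names (pkgs, available, missing_pkgs) are illustrative.

```r
# Packages named in the installation instructions above.
pkgs <- c("ricu", "xml2", "mimic.demo", "eicu.demo")

# requireNamespace() returns TRUE or FALSE without attaching the package.
available <- vapply(pkgs, requireNamespace, logical(1L), quietly = TRUE)

# Report which packages are missing, if any.
missing_pkgs <- names(available)[!available]
if (length(missing_pkgs) > 0L) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages are installed.")
}
```

Any package reported as missing can then be installed with the corresponding command from this section.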
Data access
Out of the box (provided the two data packages mimic.demo and eicu.demo are available), ricu provides access to the demo datasets corresponding to the PhysioNet Clinical Databases eICU and MIMIC-III. Tables are available as
mimic_demo$admissions
#> # <mimic_tbl>: [129 ✖ 19]
#> # ID options: subject_id (patient) < hadm_id (hadm) < icustay_id (icustay)
#> # Defaults: admission_type (val)
#> # Time vars: admittime, dischtime, deathtime, edregtime, edouttime
#> row_id subject_id hadm_id admittime dischtime
#> <int> <int> <int> <dttm> <dttm>
#> 1 12258 10006 142345 2164-10-23 21:09:00 2164-11-01 17:15:00
#> 2 12263 10011 105331 2126-08-14 22:32:00 2126-08-28 18:59:00
#> 3 12265 10013 165520 2125-10-04 23:36:00 2125-10-07 15:13:00
#> 4 12269 10017 199207 2149-05-26 17:19:00 2149-06-03 18:42:00
#> 5 12270 10019 177759 2163-05-14 20:43:00 2163-05-15 12:00:00
#> …
#> 125 41055 44083 198330 2112-05-28 15:45:00 2112-06-07 16:50:00
#> 126 41070 44154 174245 2178-05-14 20:29:00 2178-05-15 09:45:00
#> 127 41087 44212 163189 2123-11-24 14:14:00 2123-12-30 14:31:00
#> 128 41090 44222 192189 2180-07-19 06:55:00 2180-07-20 13:00:00
#> 129 41092 44228 103379 2170-12-15 03:14:00 2170-12-24 18:00:00
#> # ℹ 124 more rows
#> # ℹ 14 more variables: deathtime <dttm>, admission_type <chr>,
#> # admission_location <chr>, discharge_location <chr>, insurance <chr>,
#> # language <chr>, religion <chr>, marital_status <chr>, ethnicity <chr>,
#> # edregtime <dttm>, edouttime <dttm>, diagnosis <chr>,
#> # hospital_expire_flag <int>, has_chartevents_data <int>
and data can be loaded into an R session for example using
load_ts("labevents", "mimic_demo", itemid == 50862L,
cols = c("valuenum", "valueuom"))
#> # A ts_tbl: 299 ✖ 4
#> # Id var: icustay_id
#> # Index var: charttime (1 hours)
#> icustay_id charttime valuenum valueuom
#> <int> <drtn> <dbl> <chr>
#> 1 201006 0 hours 2.4 g/dL
#> 2 203766 -18 hours 2 g/dL
#> 3 203766 4 hours 1.7 g/dL
#> 4 204132 7 hours 3.6 g/dL
#> 5 204201 9 hours 2.3 g/dL
#> …
#> 295 298685 130 hours 1.9 g/dL
#> 296 298685 154 hours 2 g/dL
#> 297 298685 203 hours 2 g/dL
#> 298 298685 272 hours 2.2 g/dL
#> 299 298685 299 hours 2.5 g/dL
#> # ℹ 294 more rows
which returns time series data as a ts_tbl object.
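As a rough illustration of how the loaded data might be used further, the sketch below repeats the load_ts() call from above and summarises the result with base R functions. It assumes that ricu and mimic.demo are installed and that the returned object can be converted with as.data.frame(); the names has_ricu, res and df are illustrative only, and the block does nothing if the packages are missing.

```r
# Run only if ricu and the MIMIC-III demo data package are installed.
has_ricu <- requireNamespace("ricu", quietly = TRUE) &&
  requireNamespace("mimic.demo", quietly = TRUE)

if (has_ricu) {
  library(ricu)

  # Same call as shown above: rows of labevents with itemid 50862.
  res <- load_ts("labevents", "mimic_demo", itemid == 50862L,
                 cols = c("valuenum", "valueuom"))

  # Assumption: the returned object converts to a plain data.frame.
  df <- as.data.frame(res)

  # Number of measurements for the first few ICU stays.
  print(head(table(df$icustay_id)))

  # Distribution of the measured values.
  print(summary(df$valuenum))
}
```

Converting to a plain data.frame is only one option and discards the time series metadata; it is used here to keep the example within base R.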
Acknowledgments
This work was supported by grant #2017-110 of the Strategic Focal Area “Personalized Health and Related Technologies (PHRT)” of the ETH Domain for the SPHN/PHRT Driver Project “Personalized Swiss Sepsis Study”.