Data Package for Medical Datasets.
medicaldata
Overview
This is a data package with 15 medical datasets for teaching Reproducible Medical Research with R. The link to the pkgdown reference website for {medicaldata} is here and in the links at the right. This package will be useful for anyone teaching R to medical professionals, including doctors, nurses, trainees, and students.
These datasets range from reconstructed versions of James Lind’s scurvy dataset (1757) and the original Streptomycin for Tuberculosis trial (1948), through a 2012 RCT of indomethacin to prevent post-ERCP pancreatitis that I was involved in, to cohort data on SARS-CoV-2 testing results (2020). Many of the datasets come, with permission, from the American Statistical Association’s TSHS (Teaching Statistics in the Health Sciences) Resources Portal, maintained by Carol Bigelow at the University of Massachusetts.
How to Install and Use {medicaldata} Datasets
Install with:
remotes::install_github("higgi13425/medicaldata")
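The install command above relies on the {remotes} package, which may not be present on a fresh R installation. A minimal sketch of a guarded install follows; the helper name `install_medicaldata` is hypothetical and is not part of {medicaldata} or {remotes}.

```r
# Hypothetical helper (not part of any package): install {remotes} only
# when it is missing, then install {medicaldata} from GitHub.
install_medicaldata <- function() {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github("higgi13425/medicaldata")
}
```

Calling `install_medicaldata()` once per machine should be enough; it needs an internet connection.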
Then load the package with:
library(medicaldata)
Then you can list the available datasets with:
data(package = "medicaldata")
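`data(package = ...)` displays its listing rather than returning a plain vector. If you want the dataset names as a character vector, for example to loop over them in a teaching script, one possible sketch is shown here. The helper name `list_package_datasets` is an assumption made for illustration; it relies on the `results` matrix that `utils::data()` returns.

```r
# Hypothetical helper: return the dataset names shipped with an installed package.
list_package_datasets <- function(pkg) {
  info <- utils::data(package = pkg)       # object of class "packageIQR"
  sort(unique(info$results[, "Item"]))     # the "Item" column holds dataset names
}

# Only query {medicaldata} if it is installed.
if (requireNamespace("medicaldata", quietly = TRUE)) {
  print(list_package_datasets("medicaldata"))
}
```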
Then assign a particular dataset to a named object in your environment with:
covid <- medicaldata::covid_testing
where `covid` is the name of the new object, and `covid_testing` is the name of the dataset.
Articles (vignettes) on how to use the datasets can be found at the pkgdown website under the Articles tab.
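After assigning a dataset to an object, a quick look at its shape and columns is a sensible first step. The sketch below uses only base R; the helper name `overview` is hypothetical and introduced here for illustration.

```r
# Hypothetical helper: a compact overview of any data frame.
overview <- function(df) {
  list(rows = nrow(df), cols = ncol(df), columns = names(df))
}

# Only touch the dataset if {medicaldata} is installed.
if (requireNamespace("medicaldata", quietly = TRUE)) {
  covid <- medicaldata::covid_testing
  str(overview(covid))   # row count, column count, column names
  print(head(covid))     # first six rows
}
```

The same helper works on any data frame, so it can be reused for the other datasets in the package.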
You can click on the links below to view the codebook and/or description document for each dataset. This information is also available under the Reference tab above, or within R by using `help(dataset_name)`.
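As a concrete instance of the help call, here is a short sketch that uses `covid_testing` in place of the `dataset_name` placeholder. Passing `package = "medicaldata"` is an illustrative choice so that the lookup does not depend on the package being attached with `library()`.

```r
# Topic to look up: one of the dataset names in this package.
topic <- "covid_testing"

# Open the help page only if {medicaldata} is installed.
if (requireNamespace("medicaldata", quietly = TRUE)) {
  print(help(topic, package = "medicaldata"))
}
```

With the package attached, `?covid_testing` in the Console is an equivalent shortcut.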
Data Donations
If you have access to data from a randomized, controlled clinical trial, a prospective cohort study, or even a case-control study, please consider obtaining the appropriate permissions, anonymizing the data, and donating the dataset to this package for teaching purposes. Open an issue to start a discussion about a data donation.
List of Datasets
Click on the links below for more details about each dataset in its Description Document, and for more details about its variables in the Codebook. Note that each dataset also has a help file that you can open within R or RStudio by entering `help("dataset_name")` in the Console pane.
Dataset | Description document | Codebook |
---|---|---|
strep_tb | strep_tb_desc | strep_tb_codebook |
scurvy | scurvy_desc | scurvy_codebook |
indo_rct | indo_rct_desc | indo_rct_codebook |
polyps | polyps_desc | polyps_codebook |
covid_testing | covid_desc | covid_codebook |
blood_storage | blood_storage_desc | blood_storage_codebook |
cytomegalovirus | cytomegalovirus_desc | cytomegalovirus_codebook |
esoph_ca | esoph_ca_desc | esoph_ca_codebook |
laryngoscope | laryngoscope_desc | laryngoscope_codebook |
licorice_gargle | licorice_gargle_desc | licorice_gargle_codebook |
opt | opt_desc | opt_codebook |
smartpill | smartpill_desc | smartpill_codebook |
supraclavicular | supraclavicular_desc | supraclavicular_codebook |
indometh | indometh_desc | indometh_codebook |
theoph | theoph_desc | theoph_codebook |