Description

Epidemiology Data Dictionaries and Random Data Generators.

The 'R4EPIs' project <https://r4epi.github.io/sitrep/> seeks to provide a set of standardized tools for analysis of outbreak and survey data in humanitarian aid settings. This package currently provides standardized data dictionaries from Medecins Sans Frontieres Operational Centre Amsterdam for outbreak scenarios (Acute Jaundice Syndrome, Cholera, Diphtheria, Measles, Meningitis) and surveys (Retrospective mortality and access to care, Malnutrition, Vaccination coverage and Event Based Surveillance) - as described in the following <https://scienceportal.msf.org/assets/standardised-mortality-surveys?utm_source=chatgpt.com>. In addition, a data generator from these dictionaries is provided. It is also possible to read in any Open Data Kit format data dictionary.

epidict

The goal of {epidict} is to provide standardized data dictionaries for use in epidemiological data analysis templates. Currently it supports standardised dictionaries from MSF OCA. This is a product of the R4EPIs project; learn more at https://r4epi.github.io/sitrep/

Installation

You can install {epidict} from CRAN:

install.packages("epidict")

If there is a bugfix or feature that is not yet on CRAN, you can install it via the {drat} package:
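
A minimal sketch of the {drat} route, assuming the R4EPIs drat repository is published under the GitHub account name "R4EPI" (an assumption, not confirmed by this text; check the project site before relying on it):

```r
# install.packages("drat")
# Assumption: the R4EPIs drat repository lives under the "R4EPI" account.
drat::addRepo("R4EPI")
install.packages("epidict")
```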

You can also install the in-development version from GitHub using the {remotes} package (but there’s no guarantee that it will be stable):

# install.packages("remotes")
remotes::install_github("R4EPI/epidict") 

Accessing dictionaries

The dictionaries can be obtained via the msf_dict() function, which returns each variable together with its possible options (if the variable is categorical).
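
To make the "one variable, many options" layout concrete, here is a hand-built toy dictionary in base R. It borrows column names from the Cholera dictionary printed further down (data_element_shortname, option_code, option_name, option_order_in_set), but the rows are invented for illustration and are not the real MSF options.

```r
# Toy long-format dictionary: one row per variable/option pair.
# All values are invented for illustration; they are NOT taken from {epidict}.
toy_dict <- data.frame(
  data_element_shortname = c("case_number", "sex", "sex", "sex", "pregnant", "pregnant"),
  option_code            = c(NA, "M", "F", "U", "Y", "N"),
  option_name            = c(NA, "Male", "Female", "Unknown", "Yes", "No"),
  option_order_in_set    = c(NA, 1, 2, 3, 1, 2),
  stringsAsFactors       = FALSE
)

# Free-text variables carry no options; categorical ones repeat once per option.
categorical <- toy_dict[!is.na(toy_dict$option_code), ]

# Collect the allowed codes for each categorical variable.
options_by_variable <- split(categorical$option_code,
                             categorical$data_element_shortname)
print(options_by_variable)
```

In this layout a categorical variable such as sex appears once per allowed option, which is why the real Cholera dictionary shown later repeats the same variable on consecutive rows.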

There are MSF intersectional outbreak dictionaries available in {epidict} based on ODK exports.

There are MSF OCA outbreak dictionaries available in {epidict} based on DHIS2 exports. You can read more about the outbreak dictionaries at https://r4epi.github.io/epidict/articles/Outbreaks.html

In addition, there are MSF survey dictionaries available based on ODK exports. You can read more about the survey dictionaries at https://r4epi.github.io/epidict/articles/Surveys.html

You can also read in your own ODK dictionaries using read_dict().

Generating data

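The gen_data() function generates random data from a dictionary. It takes three main arguments: the name of the dictionary (dictionary), the dictionary column that holds the variable names (varnames), and the number of rows to generate (numcases). The examples here also pass org = "MSF", naming the organisation the dictionary belongs to.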

library("epidict")
gen_data("Measles", varnames = "data_element_shortname", numcases = 100, org = "MSF")
#> # A tibble: 100 × 52
#>    case_number date_of_consultation_admis…¹ patient_facility_type patient_origin
#>    <chr>       <date>                       <fct>                 <chr>         
#>  1 A1          2018-04-23                   OP                    Village D     
#>  2 A2          2018-02-23                   OP                    Village C     
#>  3 A3          2018-04-15                   OP                    Village C     
#>  4 A4          2018-04-30                   OP                    Village A     
#>  5 A5          2018-01-09                   IP                    Village D     
#>  6 A6          2018-03-14                   OP                    Village D     
#>  7 A7          2018-03-20                   OP                    Village D     
#>  8 A8          2018-03-23                   OP                    Village A     
#>  9 A9          2018-01-23                   IP                    Village A     
#> 10 A10         2018-03-19                   OP                    Village D     
#> # ℹ 90 more rows
#> # ℹ abbreviated name: ¹​date_of_consultation_admission
#> # ℹ 48 more variables: age_years <int>, age_months <int>, age_days <int>,
#> #   sex <fct>, pregnant <fct>, trimester <fct>,
#> #   foetus_alive_at_admission <fct>, exit_status <fct>, date_of_exit <date>,
#> #   time_to_death <fct>, pregnancy_outcome_at_exit <fct>,
#> #   baby_born_with_complications <fct>, previously_vaccinated <fct>, …
gen_data("Vaccination_long", varnames = "name", numcases = 100, org = "MSF")
#> # A tibble: 100 × 120
#>    start end   today deviceid date       team_number village_name village_other
#>    <lgl> <lgl> <lgl> <lgl>    <date>     <lgl>       <fct>        <lgl>        
#>  1 NA    NA    NA    NA       2018-02-11 NA          village_4    NA           
#>  2 NA    NA    NA    NA       2018-03-24 NA          village_2    NA           
#>  3 NA    NA    NA    NA       2018-02-02 NA          village_9    NA           
#>  4 NA    NA    NA    NA       2018-02-20 NA          village_3    NA           
#>  5 NA    NA    NA    NA       2018-04-09 NA          village_5    NA           
#>  6 NA    NA    NA    NA       2018-01-27 NA          village_7    NA           
#>  7 NA    NA    NA    NA       2018-03-12 NA          village_6    NA           
#>  8 NA    NA    NA    NA       2018-04-11 NA          village_3    NA           
#>  9 NA    NA    NA    NA       2018-04-28 NA          village_3    NA           
#> 10 NA    NA    NA    NA       2018-03-09 NA          other        NA           
#> # ℹ 90 more rows
#> # ℹ 112 more variables: cluster_number <dbl>, household_number <int>,
#> #   households_building <int>, random_hh <int>, consent <chr>,
#> #   no_consent_reason <fct>, no_consent_other <lgl>, caretaker_relation <fct>,
#> #   caretaker_other <lgl>, number_children <dbl>, child_number <chr>,
#> #   sex <fct>, date_birth <date>, age_years <int>, age_months <int>,
#> #   any_vaccine <fct>, vaccine_card <fct>, hf_records <fct>, …
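
The idea behind dictionary-driven generation can be sketched in base R. This is an illustration only, not how gen_data() is implemented: toy_gen_data() is a hypothetical helper, and the toy dictionary values are invented.

```r
# Hypothetical helper (not part of {epidict}): build `numcases` rows by
# sampling each variable from the option codes its dictionary rows allow.
toy_gen_data <- function(dict, varnames, numcases) {
  vars <- unique(dict[[varnames]])
  cols <- lapply(vars, function(v) {
    opts <- dict$option_code[dict[[varnames]] == v]
    # Keep the dictionary's options as factor levels, even if some are not drawn.
    factor(sample(opts, numcases, replace = TRUE), levels = opts)
  })
  names(cols) <- vars
  as.data.frame(cols)
}

# Invented toy dictionary in long format.
toy_dict <- data.frame(
  data_element_shortname = c("sex", "sex", "sex", "pregnant", "pregnant"),
  option_code            = c("M", "F", "U", "Y", "N"),
  stringsAsFactors       = FALSE
)

set.seed(1)
fake <- toy_gen_data(toy_dict, varnames = "data_element_shortname", numcases = 10)
str(fake)
```

The varnames argument plays the same role as in the calls above: it names the dictionary column whose values become the column names of the generated data.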

Cleaning data with the dictionaries

You can use the dictionaries to clean the data via the {matchmaker} package:

library("matchmaker")
library("dplyr")

dat <- gen_data(dictionary = "Cholera", 
  varnames = "data_element_shortname",
  numcases = 20,
  org = "MSF"
)
print(dat)
#> # A tibble: 20 × 45
#>    case_number date_of_consultation_admiss…¹ patient_origin age_years age_months
#>    <chr>       <date>                        <chr>              <int>      <int>
#>  1 A1          2018-03-23                    Village D             18         NA
#>  2 A2          2018-01-29                    Village B             27         NA
#>  3 A3          2018-01-23                    Village D             40         NA
#>  4 A4          2018-02-02                    Village D             15         NA
#>  5 A5          2018-01-23                    Village B             28         NA
#>  6 A6          2018-01-15                    Village A             40         NA
#>  7 A7          2018-01-04                    Village B             34         NA
#>  8 A8          2018-04-25                    Village D             29         NA
#>  9 A9          2018-04-18                    Village D             45         NA
#> 10 A10         2018-01-22                    Village C             10         NA
#> 11 A11         2018-02-06                    Village A             19         NA
#> 12 A12         2018-03-03                    Village D             61         NA
#> 13 A13         2018-01-08                    Village B             20         NA
#> 14 A14         2018-03-08                    Village C             73         NA
#> 15 A15         2018-03-08                    Village C             66         NA
#> 16 A16         2018-03-18                    Village A             52         NA
#> 17 A17         2018-04-03                    Village A             59         NA
#> 18 A18         2018-01-12                    Village B             24         NA
#> 19 A19         2018-01-19                    Village D             53         NA
#> 20 A20         2018-03-07                    Village A              7         NA
#> # ℹ abbreviated name: ¹​date_of_consultation_admission
#> # ℹ 40 more variables: age_days <int>, sex <fct>, pregnant <fct>,
#> #   trimester <fct>, foetus_alive_at_admission <fct>, exit_status <fct>,
#> #   date_of_exit <date>, time_to_death <fct>, pregnancy_outcome_at_exit <fct>,
#> #   previously_vaccinated <fct>, previous_vaccine_doses_received <fct>,
#> #   readmission <fct>, msf_involvement <fct>,
#> #   cholera_treatment_facility_type <fct>, residential_status_brief <fct>, …

# We want the expanded dictionary, so we will select `compact = FALSE`
dict <- msf_dict(dictionary = "Cholera", 
  long    = TRUE,
  compact = FALSE,
  tibble  = TRUE
)
print(dict)
#> # A tibble: 182 × 11
#>    data_element_uid data_element_name                     data_element_shortname
#>    <chr>            <chr>                                 <chr>                 
#>  1 AafTlSwliVQ      egen_001_patient_case_number          case_number           
#>  2 OTGOtWBz39J      egen_004_date_of_consultation_admiss… date_of_consultation_…
#>  3 wnmMr2V3T3u      egen_006_patient_origin               patient_origin        
#>  4 sbgqjeVwtb8      egen_008_age_years                    age_years             
#>  5 eXYhovYyl61      egen_009_age_months                   age_months            
#>  6 UrYJSk2Wp46      egen_010_age_days                     age_days              
#>  7 D1Ky5K7pFN6      egen_011_sex                          sex                   
#>  8 D1Ky5K7pFN6      egen_011_sex                          sex                   
#>  9 D1Ky5K7pFN6      egen_011_sex                          sex                   
#> 10 dTm5R53YYXC      egen_012_pregnancy_status             pregnant              
#> # ℹ 172 more rows
#> # ℹ 8 more variables: data_element_description <chr>,
#> #   data_element_valuetype <chr>, data_element_formname <chr>,
#> #   used_optionset_uid <chr>, option_code <chr>, option_name <chr>,
#> #   option_uid <chr>, option_order_in_set <dbl>

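# Now we can use matchmaker to recode the data: values matching `option_code`
# are replaced by `option_name`, per variable in `data_element_shortname`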
dat_clean <- matchmaker::match_df(dat, dict, 
  from  = "option_code",
  to    = "option_name",
  by    = "data_element_shortname",
  order = "option_order_in_set"
)
print(dat_clean)
#> # A tibble: 20 × 45
#>    case_number date_of_consultation_admiss…¹ patient_origin age_years age_months
#>    <chr>       <date>                        <chr>              <int>      <int>
#>  1 A1          2018-03-23                    Village D             18         NA
#>  2 A2          2018-01-29                    Village B             27         NA
#>  3 A3          2018-01-23                    Village D             40         NA
#>  4 A4          2018-02-02                    Village D             15         NA
#>  5 A5          2018-01-23                    Village B             28         NA
#>  6 A6          2018-01-15                    Village A             40         NA
#>  7 A7          2018-01-04                    Village B             34         NA
#>  8 A8          2018-04-25                    Village D             29         NA
#>  9 A9          2018-04-18                    Village D             45         NA
#> 10 A10         2018-01-22                    Village C             10         NA
#> 11 A11         2018-02-06                    Village A             19         NA
#> 12 A12         2018-03-03                    Village D             61         NA
#> 13 A13         2018-01-08                    Village B             20         NA
#> 14 A14         2018-03-08                    Village C             73         NA
#> 15 A15         2018-03-08                    Village C             66         NA
#> 16 A16         2018-03-18                    Village A             52         NA
#> 17 A17         2018-04-03                    Village A             59         NA
#> 18 A18         2018-01-12                    Village B             24         NA
#> 19 A19         2018-01-19                    Village D             53         NA
#> 20 A20         2018-03-07                    Village A              7         NA
#> # ℹ abbreviated name: ¹​date_of_consultation_admission
#> # ℹ 40 more variables: age_days <int>, sex <fct>, pregnant <fct>,
#> #   trimester <fct>, foetus_alive_at_admission <fct>, exit_status <fct>,
#> #   date_of_exit <date>, time_to_death <fct>, pregnancy_outcome_at_exit <fct>,
#> #   previously_vaccinated <fct>, previous_vaccine_doses_received <fct>,
#> #   readmission <fct>, msf_involvement <fct>,
#> #   cholera_treatment_facility_type <fct>, residential_status_brief <fct>, …
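
The recoded columns fall outside the printed width above, so the effect of the call is not visible in the output. A base R sketch of the same kind of dictionary-based recoding, on invented toy data, shows what changes. toy_recode() is a hypothetical helper written for this illustration; it is not matchmaker's implementation.

```r
# Invented toy dictionary and data, for illustration only.
toy_dict <- data.frame(
  data_element_shortname = c("sex", "sex", "sex", "pregnant", "pregnant"),
  option_code            = c("M", "F", "U", "Y", "N"),
  option_name            = c("Male", "Female", "Unknown", "Yes", "No"),
  option_order_in_set    = c(1, 2, 3, 1, 2),
  stringsAsFactors       = FALSE
)
toy_dat <- data.frame(
  case_number      = c("A1", "A2", "A3"),
  sex              = c("F", "M", "F"),
  pregnant         = c("N", NA, "Y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: recode every column that the dictionary describes.
toy_recode <- function(dat, dict) {
  for (v in intersect(names(dat), unique(dict$data_element_shortname))) {
    d <- dict[dict$data_element_shortname == v, ]
    d <- d[order(d$option_order_in_set), ]
    # Look up each code's label; codes missing from the dictionary become NA.
    labels <- d$option_name[match(as.character(dat[[v]]), d$option_code)]
    # Order the factor levels as the dictionary prescribes.
    dat[[v]] <- factor(labels, levels = d$option_name)
  }
  dat
}

toy_clean <- toy_recode(toy_dat, toy_dict)
print(toy_clean)
```

Columns that the dictionary does not describe, such as case_number here, pass through unchanged.
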
Metadata

Version

0.3.0

License

Unknown

Platforms (78)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows