Description

Generator of Synthetic Patient Data for the OMOP Common Data Model.

Tools to generate synthetic patient-level test datasets in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). Includes a chat-driven generator backed by large language models and an interactive 'shiny' designer for editing CDM test sets.

PatientGenerator

Lifecycle: experimental

PatientGenerator facilitates the creation of synthetic test datasets for the OMOP Common Data Model (CDM) using two complementary approaches:

  • patientChat: Generates structured patient JSON files using Large Language Models (LLMs).
  • patientDesigner: Provides a D3-based Shiny interface for reviewing and editing CDM test sets.

The package also includes support for Hecate-powered concept lookups to ensure valid OMOP concept codes.

Installation

# install.packages("remotes")
remotes::install_github("mi-erasmusmc/PatientGenerator")

Workflow Overview

  1. Generate an initial synthetic cohort using patientChat.
  2. Save JSON test sets to the local filesystem.
  3. Refine patients using patientDesigner().
    • Use the built-in concept search (powered by hecateSearch()) while editing tables.

Synthetic Patient Generation with patientChat

Set an OPENAI_API_KEY environment variable (e.g., via usethis::edit_r_environ()) to enable LLM access.

Available models can be listed using PatientGenerator::availableModels().
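Before constructing the chat object, it can help to confirm that the key is actually visible to the current R session. The following is a minimal base-R sketch; hasEnvVar is a hypothetical helper introduced here for illustration and is not part of PatientGenerator.

```r
# Hypothetical helper (not part of PatientGenerator):
# TRUE when the named environment variable is set to a non-empty value.
hasEnvVar <- function(name) {
  nzchar(Sys.getenv(name, unset = ""))
}

if (!hasEnvVar("OPENAI_API_KEY")) {
  message("OPENAI_API_KEY is not set; add it to .Renviron and restart R.")
}
```

Changes made through usethis::edit_r_environ() only take effect after R is restarted, which is why the check is worth running in a fresh session.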

library(PatientGenerator)

patientGenerator <- patientChat$new(
  model = "gpt-5.4",
  echo = "none"
)

Generating Patients via Natural Language Prompts

Provide detailed prompts, including specific concept sets, for optimal results.

patientGenerator$prompt(
  "Population (person table):
     - 10 adult patients
     - 5 female
     - 5 male
  
   Observation Period:
     - Start date between date of birth and 2025-12-31
  
   Condition Occurrence:
     - All patients must have Diabetes (condition_concept_id: 201826)
     - Start date between 2015-01-01 and 2020-12-31
  
   Drug Exposure:
     - All patients must have Semaglutide (drug_concept_id: 19079450)
     - Exposure within 30 days post-index date
  
   Measurement:
     - All patients must have Fasting glucose (measurement_concept_id: 3018251)
  
   Procedure Occurrence:
     - 50% of patients must have Amputation of toe (procedure_concept_id: 4159766)
  
   Output Requirements:
     - Populate only the tables specified in this prompt"
)
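When the same concept sets recur across several prompts, the prompt text can be assembled from a small list instead of being typed by hand. This is only a sketch: promptSection is a hypothetical helper, the section wording simply mirrors the prompt above, and nothing here is required by patientChat, which accepts any character string.

```r
# Hypothetical helper: formats one "all patients must have ..." section.
promptSection <- function(table, conceptName, conceptColumn, conceptId) {
  sprintf(
    "%s:\n  - All patients must have %s (%s: %d)",
    table, conceptName, conceptColumn, as.integer(conceptId)
  )
}

# Concept IDs copied from the prompt shown above.
requirements <- list(
  list("Condition Occurrence", "Diabetes", "condition_concept_id", 201826L),
  list("Drug Exposure", "Semaglutide", "drug_concept_id", 19079450L),
  list("Measurement", "Fasting glucose", "measurement_concept_id", 3018251L)
)

sections <- vapply(
  requirements,
  function(r) do.call(promptSection, unname(r)),
  character(1)
)
promptText <- paste(sections, collapse = "\n\n")
cat(promptText, "\n")

# The assembled text could then be passed on as usual:
# patientGenerator$prompt(promptText)
```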

Integration with testthat

Save the generated dataset as a JSON file, then use TestGenerator::patientsCDM() to instantiate a CDM reference from it.

patientGenerator$save(name = "diabetes-patients")

cdm <- TestGenerator::patientsCDM(
  testName = "diabetes-patients",
  cdmVersion = "5.4"
)

cdm$person |>
  dplyr::collect() |>
  head(5)
#>    person_id gender_concept_id year_of_birth person_source_value
#>        <int>             <int>         <int>              <char>
#> 1:         1              8532          1965              SYN001
#> 2:         2              8532          1972              SYN002
#> 3:         3              8532          1958              SYN003
#> 4:         4              8532          1981              SYN004
#> 5:         5              8532          1949              SYN005
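Because LLM output is not guaranteed to honour every constraint in the prompt, the CDM reference can be checked inside a testthat test. The sketch below assumes the dataset saved above and the standard OMOP gender concepts (8532 for female, 8507 for male); expectPersonTable is a hypothetical helper, not part of PatientGenerator or TestGenerator.

```r
library(testthat)

# Hypothetical helper: checks the person table against the population
# described in the prompt (10 patients, 5 female, 5 male by default).
expectPersonTable <- function(person, nPatients = 10, nFemale = 5) {
  expect_equal(nrow(person), nPatients)
  expect_equal(sum(person$gender_concept_id == 8532), nFemale)             # female
  expect_equal(sum(person$gender_concept_id == 8507), nPatients - nFemale) # male
  expect_false(anyDuplicated(person$person_id) > 0)
}

test_that("generated cohort matches the prompt", {
  skip_if_not_installed("TestGenerator")
  cdm <- TestGenerator::patientsCDM(
    testName = "diabetes-patients",
    cdmVersion = "5.4"
  )
  expectPersonTable(dplyr::collect(cdm$person))
})
```

If a check fails, the dataset can be corrected through a follow-up prompt or in patientDesigner() rather than regenerated from scratch.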

Iterative Refinement

The LLM can be instructed to modify the current test set within the same patientChat instance.

patientGenerator$prompt("Remove all male patients")
#> cdm$person |> collect() |> head(5)
#>    person_id gender_concept_id year_of_birth person_source_value
#>        <int>             <int>         <int>              <char>
#> 1:         1              8532          1965              SYN001
#> 2:         2              8532          1972              SYN002
#> 3:         3              8532          1958              SYN003
#> 4:         4              8532          1981              SYN004
#> 5:         5              8532          1949              SYN005

Visual Review and Editing with patientDesigner()

Launch the interactive editor to review and refine datasets:

PatientGenerator::patientDesigner()

The interface supports:

  • Loading existing JSON test sets.
  • Interactive CRUD operations (Create, Read, Update, Delete) on CDM tables.
  • Visual timeline inspection and table previews.
  • Exporting updated test sets to JSON.

Concept Search with Hecate

patientDesigner integrates a concept search module powered by hecateSearch(). This allows users to search for and insert valid OMOP concept IDs directly into the CDM tables.

Configure Hecate globally via environment variables:

Sys.setenv(
  HECATE_BASE_URL = "https://your-hecate-server/api",
  HECATE_API_KEY = "your-api-key"
)

Or via package options:

options(PatientGenerator.hecate = list(
  base_url = "https://your-hecate-server/api",
  timeout_ms = 15000,
  api_key = "your-api-key"
))
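The two configuration routes can be pictured as a single lookup with a fallback. The sketch below assumes that package options take precedence over environment variables; that ordering is an assumption made for illustration, resolveHecateConfig is a hypothetical helper, and the package's real lookup order may differ.

```r
# Hypothetical helper; assumes options win over environment variables.
resolveHecateConfig <- function() {
  opts <- getOption("PatientGenerator.hecate", default = list())

  # Use the option value when present, otherwise the environment variable.
  pick <- function(optName, envName) {
    value <- opts[[optName]]
    if (is.null(value) || !nzchar(value)) {
      value <- Sys.getenv(envName, unset = "")
    }
    value
  }

  list(
    base_url = pick("base_url", "HECATE_BASE_URL"),
    api_key = pick("api_key", "HECATE_API_KEY")
  )
}

config <- resolveHecateConfig()
if (!nzchar(config$base_url)) {
  message("No Hecate base URL configured.")
}
```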

Further Documentation

  • Vignette: vignette("shiny-integration", package = "PatientGenerator")
  • Reference: Detailed API documentation and benchmarks are available on the GitHub Pages site.
Metadata

Version

0.1.4

License

Unknown

Platforms (80)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arc-linux
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • sh4-linux
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows