Description

Core Data Contracts, Parsers, and Scoring Primitives for Clinical Submission Readiness.

Foundational package in the R4SUB (R for Regulatory Submission) ecosystem. Defines the core evidence table schema, parsers, indicator abstractions, and scoring primitives needed to quantify clinical submission readiness. Provides a standardized contract for ingesting heterogeneous sources (validation outputs, metadata, traceability) into a single evidence framework.

r4subcore

r4subcore is the foundational package in the R4SUB ecosystem. It defines the core data contracts, parsers, evidence schema, and scoring primitives needed to quantify clinical submission readiness.

It is intentionally "boring and stable": other R4SUB packages (e.g., r4subtrace, r4subrisk, r4subscore) build on these structures and interfaces.


Why r4subcore?

Clinical submission readiness is rarely a single tool output. It's an evidence graph across:

  • SDTM/ADaM datasets and metadata
  • Define.xml / ARM / reviewer guides
  • Validation results (Pinnacle21, OpenCDISC, internal rule engines)
  • Traceability / derivations (ADaM spec <-> code <-> outputs)
  • Usability / reviewer experience signals

r4subcore provides:

  1. A standardized Evidence Table schema
  2. Common parsers to ingest heterogeneous sources
  3. A consistent indicator / signal abstraction
  4. Scoring primitives (normalize, weight, calibrate, aggregate)
  5. A reproducible run context (run_id, dataset_id, study_id, tool version)

Package scope

In scope

  • Evidence schema + validation
  • Parsers for common sources (initial focus on P21-style outputs + define.xml scaffolding)
  • Common utilities: ID generation, severity mapping, controlled terminology mapping, standard columns
  • Indicator interfaces (how other packages implement signals)
  • Transparent scoring components (no hidden "magic")

Out of scope

  • Full SCI (Submission Confidence Index) calculation (belongs in r4subscore)
  • End-to-end dashboards / Shiny apps (belong in r4subui)
  • Full traceability logic (belongs in r4subtrace)
  • Domain-specific oncology rules (belong in extension packages)

Installation

Development install

# install.packages("pak")
pak::pak("R4SUB/r4subcore")

Requirements

  • R >= 4.2
  • Suggested: arrow, xml2, dplyr, readr, jsonlite, cli

Core concepts

1) Evidence Table (the heart of R4SUB)

All inputs are normalized into a single tabular contract: an evidence dataset. This enables scoring, drilldown, traceability, and reporting.

Minimum columns (v0.1):

column            type     meaning
run_id            chr      unique ID for a run
study_id          chr      study identifier
asset_type        chr      dataset, define, program, validation, spec, etc.
asset_id          chr      unique ID of the asset (e.g., ADSL, define.xml)
source_name       chr      tool/source name (e.g., pinnacle21)
source_version    chr      tool version
indicator_id      chr      the signal definition identifier
indicator_name    chr      human name
indicator_domain  chr      quality, trace, risk, usability
severity          chr      info, low, medium, high, critical
result            chr      pass, fail, warn, na
metric_value      dbl      numeric value (if applicable)
metric_unit       chr      unit for metric
message           chr      short description
location          chr      pointer (dataset/variable/rule line)
evidence_payload  json     raw structured payload
created_at        POSIXct  ingestion timestamp

Guarantees:

  • Each row is a single unit of evidence
  • Evidence is immutable (append-only semantics recommended)
  • Score consumers can rely on consistent meaning

Use:

  • as_evidence() to coerce raw data
  • validate_evidence() to enforce contract
  • bind_evidence() to combine sources safely
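
To make the contract concrete, here is a minimal base R sketch of one evidence row and a check of its columns and controlled values. The helper check_evidence_sketch() and the example values (such as "IND-0001") are hypothetical and only illustrate the idea; they are not the package's validate_evidence() implementation.

```r
# Hypothetical, standalone sketch: none of these helpers are r4subcore functions.
evidence_columns <- c(
  "run_id", "study_id", "asset_type", "asset_id", "source_name",
  "source_version", "indicator_id", "indicator_name", "indicator_domain",
  "severity", "result", "metric_value", "metric_unit", "message",
  "location", "evidence_payload", "created_at"
)

# Controlled values taken from the column table above.
allowed <- list(
  indicator_domain = c("quality", "trace", "risk", "usability"),
  severity = c("info", "low", "medium", "high", "critical"),
  result = c("pass", "fail", "warn", "na")
)

# Return a character vector of problems; an empty vector means the table passes.
check_evidence_sketch <- function(ev) {
  problems <- character(0)
  missing_cols <- setdiff(evidence_columns, names(ev))
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste("missing column:", missing_cols))
  }
  for (col in intersect(names(allowed), names(ev))) {
    bad_values <- setdiff(unique(ev[[col]]), allowed[[col]])
    if (length(bad_values) > 0) {
      problems <- c(problems, paste0("invalid ", col, ": ", bad_values))
    }
  }
  problems
}

# One unit of evidence, with illustrative values.
ev <- data.frame(
  run_id = "RUN-001", study_id = "ABC123",
  asset_type = "validation", asset_id = "ADSL",
  source_name = "pinnacle21", source_version = "P21-3.0",
  indicator_id = "IND-0001", indicator_name = "Required variable present",
  indicator_domain = "quality", severity = "high", result = "fail",
  metric_value = NA_real_, metric_unit = NA_character_,
  message = "Required variable missing", location = "ADSL.TRT01P",
  evidence_payload = "{}", created_at = Sys.time(),
  stringsAsFactors = FALSE
)

check_evidence_sketch(ev)    # this row uses only allowed values

bad <- ev
bad$severity <- "severe"     # not part of the controlled severity list
check_evidence_sketch(bad)
```

Returning a vector of problems rather than stopping at the first one is a design choice for this sketch: it lets a caller report every contract violation in a single pass.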

2) Indicators (signals)

An indicator is a definition of what to measure, not necessarily how to calculate it.

Indicators have:

  • indicator_id (stable)
  • domain (quality/trace/risk/usability)
  • description
  • expected_inputs (evidence sources required)
  • default_thresholds
  • optional tags (e.g., define, adam, sdtm, spec)

r4subcore provides:

  • indicator registry helpers (local registry first, remote later)
  • validation to ensure indicator metadata is well-formed

Other packages implement the actual calculations and output evidence rows using these IDs.
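
As a rough illustration of the fields listed above, an indicator definition could be held as a plain list and kept in a local registry. Everything in this sketch is an assumption made for illustration: new_indicator_sketch(), register_indicator_sketch(), the environment-based registry, and the ID "Q-DEFINE-001" are not part of the documented r4subcore API.

```r
# Hypothetical sketch; these helpers and the registry are illustrative only.
valid_domains <- c("quality", "trace", "risk", "usability")

# Build an indicator definition and check that its metadata is well-formed.
new_indicator_sketch <- function(indicator_id, domain, description,
                                 expected_inputs = character(0),
                                 default_thresholds = list(),
                                 tags = character(0)) {
  stopifnot(
    is.character(indicator_id), length(indicator_id) == 1, nzchar(indicator_id),
    length(domain) == 1, domain %in% valid_domains,
    is.character(description), length(description) == 1
  )
  list(
    indicator_id = indicator_id, domain = domain, description = description,
    expected_inputs = expected_inputs, default_thresholds = default_thresholds,
    tags = tags
  )
}

# A local registry: an environment keyed by indicator_id.
registry <- new.env()

register_indicator_sketch <- function(ind, reg = registry) {
  if (exists(ind$indicator_id, envir = reg, inherits = FALSE)) {
    stop("indicator_id already registered: ", ind$indicator_id)
  }
  assign(ind$indicator_id, ind, envir = reg)
  invisible(ind)
}

ind <- new_indicator_sketch(
  indicator_id = "Q-DEFINE-001",
  domain = "quality",
  description = "Share of variables documented in define.xml",
  expected_inputs = "define",
  default_thresholds = list(warn = 0.95, fail = 0.80),
  tags = c("define", "adam")
)
register_indicator_sketch(ind)
ls(registry)   # lists the registered indicator IDs
```

Rejecting duplicate IDs at registration time is one way to keep indicator_id stable, since evidence rows refer to indicators only by that ID.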


3) Scoring primitives (transparent & composable)

r4subcore includes small, auditable functions for:

  • mapping severity -> numeric penalty
  • normalizing metrics to 0-1
  • applying weights
  • aggregating evidence into indicator scores

SCI itself is not in this package.
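
The primitives listed above can be sketched in a few lines of base R. The penalty values, the function names normalize01() and indicator_score_sketch(), and the "1 minus weighted mean penalty" aggregation are all assumptions chosen for illustration, not the package's actual defaults.

```r
# Hypothetical sketch; the penalty values below are illustrative assumptions.
severity_penalty <- c(info = 0, low = 0.25, medium = 0.5, high = 0.75, critical = 1)

# Min-max normalization to [0, 1]; values outside [lo, hi] are clamped.
normalize01 <- function(x, lo, hi) {
  stopifnot(hi > lo)
  pmin(pmax((x - lo) / (hi - lo), 0), 1)
}

# Weighted mean of severity penalties, expressed as a score where 1 is best.
indicator_score_sketch <- function(severity, weight = rep(1, length(severity))) {
  penalty <- unname(severity_penalty[severity])
  stopifnot(
    !anyNA(penalty),                      # every severity must be mapped
    length(weight) == length(severity),
    sum(weight) > 0
  )
  1 - sum(penalty * weight) / sum(weight)
}

normalize01(c(-5, 50, 120), lo = 0, hi = 100)          # 0.0 0.5 1.0
indicator_score_sketch(c("low", "critical"))           # 1 - (0.25 + 1) / 2 = 0.375
indicator_score_sketch(c("low", "critical"), c(3, 1))  # 1 - (0.75 + 1) / 4 = 0.5625
```

Keeping the mapping in a named vector makes the weights visible and overridable, which matches the goal of having no hidden scoring logic.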


Quick start

Create a run context

library(r4subcore)

ctx <- r4sub_run_context(
  study_id = "ABC123",
  environment = "DEV",
  user = Sys.info()[["user"]]
)
ctx$run_id

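Ingest validation results (example)

# Read a P21-style validation report exported as CSV
raw <- read.csv("p21_report.csv")

# Normalize the raw findings into evidence rows tied to the run context
ev <- p21_to_evidence(
  raw,
  ctx = ctx,
  asset_type = "validation",
  source_version = "P21-3.0"
)

# Enforce the evidence contract before using the rows downstream
validate_evidence(ev)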

Summarize evidence quickly

evidence_summary(ev)
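
The output of evidence_summary() is not shown here. As a rough idea of what a quick summary over evidence rows can look like, the following base R sketch counts rows per domain, severity, and result on a toy table; the data and the choice of grouping columns are assumptions.

```r
# Toy evidence rows; only the columns needed for the summary are included.
ev_toy <- data.frame(
  indicator_domain = c("quality", "quality", "trace", "quality"),
  severity = c("high", "low", "medium", "high"),
  result = c("fail", "warn", "pass", "fail"),
  stringsAsFactors = FALSE
)

# Count rows per domain/severity/result combination.
counts <- aggregate(
  n ~ indicator_domain + severity + result,
  data = transform(ev_toy, n = 1L),
  FUN = sum
)
counts
```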

Architecture

Main modules

  • R/evidence_schema.R -- schema + validators
  • R/run_context.R -- run metadata
  • R/parsers_p21.R -- Pinnacle21 ingestion (first parser)
  • R/indicators.R -- indicator metadata + registry
  • R/scoring_primitives.R -- severity mapping, normalization, aggregation
  • R/utils_ids.R -- ID helpers, hashing
  • R/utils_json.R -- JSON payload helpers

Extensibility

  • New parsers should output evidence via as_evidence()
  • New indicators should register IDs and domain metadata
  • Consumers should never depend on tool-specific raw formats
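
A new parser, in this spirit, is mostly a mapping from a tool's raw columns onto evidence columns. The sketch below assumes a hypothetical raw layout (Dataset, Variable, Rule, Severity, Message) and a made-up severity and result mapping. It builds only a subset of the evidence columns with base R, and a real parser would presumably pass its output through as_evidence().

```r
# Hypothetical sketch: raw column names and both mappings are assumptions.
raw <- data.frame(
  Dataset = c("ADSL", "ADAE"),
  Variable = c("TRT01P", "AESEV"),
  Rule = c("R0001", "R0002"),
  Severity = c("Error", "Warning"),
  Message = c("Required variable missing", "Value not in codelist"),
  stringsAsFactors = FALSE
)

# Tool-specific severity labels mapped to the evidence severity vocabulary.
severity_map <- c(Error = "high", Warning = "medium", Notice = "info")

raw_to_evidence_sketch <- function(raw, run_id, study_id,
                                   source_name, source_version) {
  sev <- unname(severity_map[raw$Severity])
  stopifnot(!anyNA(sev))  # fail loudly on unmapped tool severities
  data.frame(
    run_id = run_id,
    study_id = study_id,
    asset_type = "validation",
    asset_id = raw$Dataset,
    source_name = source_name,
    source_version = source_version,
    indicator_id = raw$Rule,
    severity = sev,
    # Assumption: high-severity findings count as failures, the rest as warnings.
    result = ifelse(sev == "high", "fail", "warn"),
    message = raw$Message,
    location = paste(raw$Dataset, raw$Variable, sep = "."),
    created_at = Sys.time(),
    stringsAsFactors = FALSE
  )
}

ev_new <- raw_to_evidence_sketch(
  raw,
  run_id = "RUN-001", study_id = "ABC123",
  source_name = "example_tool", source_version = "1.0"
)
ev_new[, c("asset_id", "indicator_id", "severity", "result", "location")]
```

Because the tool-specific names stay inside the parser, downstream consumers only ever see evidence columns.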

Design principles

  • Contract-first: normalize everything into evidence rows
  • Transparent scoring: no black-box weights; everything configurable
  • Tool-agnostic: support P21 now, but leave room for OpenCDISC and internal engines
  • Reproducible: run_id + source_version captured everywhere
  • Composable: small functions, no tight coupling

Roadmap

v0.1

  • Evidence schema + validator
  • Run context
  • P21-style parser (CSV/XLSX minimal)
  • Indicator metadata helpers
  • Severity -> numeric mappings
  • Minimal summarizers

v0.2

  • define.xml ingestion (structure-level metadata)
  • Arrow/parquet IO
  • Evidence "joins" (dataset <-> variable <-> rule)
  • Config profiles (e.g., FDA_ADaM_basic, EMA_SDTM_basic)

Contributing

  • Run devtools::check() before opening a PR
  • Add tests for each parser and scoring function
  • Do not break evidence schema without a version bump + migration note

License

MIT -- see LICENSE file.

Metadata

Version

0.1.0

License

Unknown

Platforms (78)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows