Description

Traceability Engine for Clinical Submission Readiness.


Quantifies and explains end-to-end traceability between clinical submission artifacts: ADaM (Analysis Data Model) outputs, derivations, SDTM (Study Data Tabulation Model) sources, specs, and code. Builds trace models from metadata and mapping sheets, computes trace levels, and emits standardized R4SUB (R for Regulatory Submission) evidence table rows via 'r4subcore'.


r4subtrace

r4subtrace is the traceability engine in the R4SUB ecosystem. It quantifies and explains end-to-end traceability between clinical submission artifacts (primarily ADaM outputs <-> derivations <-> SDTM sources <-> specs <-> code) and converts trace evidence into standardized R4SUB Evidence Table rows as defined by r4subcore.

It focuses on answering one question:

Can we prove where each analysis variable/value came from, and can a reviewer follow it?

Why r4subtrace?

In real submissions, issues are rarely "a single failed rule." Many are trace failures:

Missing or ambiguous derivation documentation
ADaM variable not linkable to SDTM sources
Mismatch between spec and what code produces
Inconsistent naming across specs, define.xml, and datasets
Reviewer cannot reproduce or validate lineage

r4subtrace formalizes traceability as evidence + measurable indicators.

What r4subtrace measures

Traceability levels

L0 -- None: no linkage available
L1 -- Spec-only: ADaM spec defines derivation but no code mapping
L2 -- Spec + source mapping: ADaM var mapped to SDTM vars/domains
L3 -- Spec + code mapping: a source mapping exists and either has high confidence or is documented with derivation text
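The level definitions above can be read as a simple decision cascade per ADaM variable. The base R sketch below illustrates that reading; it is not the r4subtrace implementation. `assign_trace_level()` is hypothetical, the `has_spec_derivation` column is an assumed flag for "the ADaM spec defines a derivation", and the 0.8 cutoff for "high confidence" is an arbitrary assumption.

```r
# Hypothetical sketch, NOT the r4subtrace implementation.
# Assumptions: adam_meta has dataset, variable and a logical has_spec_derivation
# column (hypothetical); mapping uses the recommended trace_map.csv columns;
# "high confidence" means confidence >= high_conf (0.8 is an arbitrary default).
assign_trace_level <- function(adam_meta, mapping, high_conf = 0.8) {
  key_meta <- paste(adam_meta$dataset, adam_meta$variable, sep = ".")
  key_map  <- paste(mapping$adam_dataset, mapping$adam_var, sep = ".")

  # Optional columns: fall back to NA when they are absent
  n     <- nrow(mapping)
  conf  <- if ("confidence" %in% names(mapping)) mapping$confidence else rep(NA_real_, n)
  deriv <- if ("derivation_text" %in% names(mapping)) mapping$derivation_text else rep(NA_character_, n)

  # A mapping row is "strong" if it has high confidence OR non-empty derivation text
  strong <- (!is.na(conf) & conf >= high_conf) | (!is.na(deriv) & nzchar(deriv))

  # Cascade: L3 (strong mapping) > L2 (any mapping) > L1 (spec only) > L0 (nothing)
  level <- ifelse(key_meta %in% key_map[strong], 3L,
           ifelse(key_meta %in% key_map, 2L,
           ifelse(adam_meta$has_spec_derivation, 1L, 0L)))

  data.frame(dataset = adam_meta$dataset, variable = adam_meta$variable,
             trace_level = level, stringsAsFactors = FALSE)
}

# Usage with illustrative (made-up) metadata and mapping rows
adam_demo <- data.frame(
  dataset  = c("ADSL", "ADSL", "ADSL", "ADSL"),
  variable = c("AGE", "TRTSDT", "SAFFL", "XVAR"),
  has_spec_derivation = c(TRUE, TRUE, TRUE, FALSE)
)
map_demo <- data.frame(
  adam_dataset    = c("ADSL", "ADSL"),
  adam_var        = c("AGE", "TRTSDT"),
  sdtm_domain     = c("DM", "EX"),
  sdtm_var        = c("AGE", "EXSTDTC"),
  derivation_text = c("", "First exposure date"),
  confidence      = c(0.5, NA)
)
lv_demo <- assign_trace_level(adam_demo, map_demo)
lv_demo$trace_level  # AGE = 2, TRTSDT = 3, SAFFL = 1, XVAR = 0
```

Because the cascade checks the strongest condition first, an L2 mapping is only promoted to L3 when at least one of its rows carries a high confidence value or derivation text.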

Installation

pak::pak(c("R4SUB/r4subcore", "R4SUB/r4subtrace"))

Quick start

1) Create run context

library(r4subcore)
library(r4subtrace)

ctx <- r4sub_run_context(study_id = "ABC123", environment = "DEV")

2) Load metadata

adam_meta <- read.csv("adam_metadata.csv")  # columns: dataset, variable, label, type
sdtm_meta <- read.csv("sdtm_metadata.csv")  # same structure

map <- read.csv("trace_map.csv")
# recommended columns:
# adam_dataset, adam_var, sdtm_domain, sdtm_var, derivation_text (optional), confidence (optional)
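Before real mapping sheets exist, an in-memory mapping with the recommended columns can be handy for trying out the pipeline. The rows below are made-up illustrative values, and `check_map_columns()` is a hypothetical helper, not part of r4subtrace. It treats the four non-optional columns as required, which is an assumption based on the recommended layout.

```r
# Illustrative mapping rows (made-up values) using the recommended columns
map <- data.frame(
  adam_dataset    = c("ADSL", "ADSL", "ADAE"),
  adam_var        = c("AGE", "TRTSDT", "AEDECOD"),
  sdtm_domain     = c("DM", "EX", "AE"),
  sdtm_var        = c("AGE", "EXSTDTC", "AEDECOD"),
  derivation_text = c("", "Earliest EX.EXSTDTC per subject", ""),
  confidence      = c(1, 0.9, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fail early if a non-optional column is missing
check_map_columns <- function(mapping) {
  required <- c("adam_dataset", "adam_var", "sdtm_domain", "sdtm_var")
  missing  <- setdiff(required, names(mapping))
  if (length(missing) > 0) {
    stop("mapping is missing columns: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

check_map_columns(map)
# write.csv(map, "trace_map.csv", row.names = FALSE)  # produce a file for read.csv()
```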

3) Build trace model and evidence

tm <- build_trace_model(
  adam_meta = adam_meta,
  sdtm_meta = sdtm_meta,
  mapping   = map
)

ev <- trace_model_to_evidence(tm, ctx = ctx, source_name = "r4subtrace", source_version = "0.1.0")

validate_evidence(ev)
evidence_summary(ev)

4) Compute trace coverage score

ind <- trace_indicator_scores(ev)
ind

Core objects

Trace Model

A list with:

nodes: tidy table of assets (dataset/variable/spec/program)
edges: tidy table of relationships + confidence
diagnostics: issues found (orphans, ambiguities, conflicts)
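To make the nodes/edges/diagnostics shape concrete, the base R sketch below builds a comparable list from ADaM metadata and a mapping sheet. It is not r4subtrace's internal structure: `sketch_trace_model()` is hypothetical, the column names inside `nodes`, `edges` and `diagnostics` are assumptions, and only orphans and ambiguities are detected (conflicts are left out).

```r
# Hypothetical sketch, NOT r4subtrace's build_trace_model() internals.
# Column names inside nodes/edges/diagnostics are assumptions for illustration.
sketch_trace_model <- function(adam_meta, mapping) {
  adam_ids <- paste(adam_meta$dataset, adam_meta$variable, sep = ".")
  src_ids  <- paste(mapping$sdtm_domain, mapping$sdtm_var, sep = ".")
  tgt_ids  <- paste(mapping$adam_dataset, mapping$adam_var, sep = ".")
  sdtm_ids <- unique(src_ids)

  # Nodes: one row per asset (only ADaM and SDTM variables in this sketch)
  nodes <- data.frame(
    node_id = c(adam_ids, sdtm_ids),
    kind    = c(rep("adam_variable", length(adam_ids)),
                rep("sdtm_variable", length(sdtm_ids))),
    stringsAsFactors = FALSE
  )

  # Edges: SDTM source -> ADaM target, carrying confidence when available
  conf <- if ("confidence" %in% names(mapping)) mapping$confidence else rep(NA_real_, nrow(mapping))
  edges <- data.frame(from = src_ids, to = tgt_ids, confidence = conf,
                      stringsAsFactors = FALSE)

  # Orphans: ADaM variables with no incoming SDTM edge
  orphans <- setdiff(adam_ids, tgt_ids)
  # Ambiguities: ADaM variables fed by more than one distinct SDTM source
  n_src <- tapply(src_ids, tgt_ids, function(x) length(unique(x)))
  ambiguous <- names(n_src)[n_src > 1]

  diagnostics <- data.frame(
    node_id = c(orphans, ambiguous),
    issue   = c(rep("orphan", length(orphans)), rep("ambiguous", length(ambiguous))),
    stringsAsFactors = FALSE
  )
  list(nodes = nodes, edges = edges, diagnostics = diagnostics)
}

# Usage with made-up rows: TRTSDT has two sources, SAFFL has none
adam_demo <- data.frame(dataset = c("ADSL", "ADSL", "ADSL"),
                        variable = c("AGE", "TRTSDT", "SAFFL"))
map_demo <- data.frame(adam_dataset = c("ADSL", "ADSL", "ADSL"),
                       adam_var     = c("AGE", "TRTSDT", "TRTSDT"),
                       sdtm_domain  = c("DM", "EX", "SV"),
                       sdtm_var     = c("AGE", "EXSTDTC", "SVSTDTC"))
tm_demo <- sketch_trace_model(adam_demo, map_demo)
tm_demo$diagnostics

# A reviewer can walk one edge back from an ADaM variable to its sources
tm_demo$edges$from[tm_demo$edges$to == "ADSL.TRTSDT"]  # "EX.EXSTDTC" "SV.SVSTDTC"
```

Keeping edges as a plain from/to table means lineage questions become simple filters, which fits the graph-first view of traceability.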

Trace Evidence

Evidence rows are emitted for:

each ADaM variable trace level
each orphan/ambiguity/conflict
aggregate coverage metrics

Indicators

TRACE_VAR_COVERAGE_L2PLUS: proportion of ADaM variables with L2+ trace
TRACE_VAR_COVERAGE_L3PLUS: proportion with L3+ trace
TRACE_ORPHAN_VAR_COUNT: number of orphan ADaM variables (no SDTM mapping)
TRACE_AMBIGUOUS_MAPPING_COUNT: number of ADaM variables mapped to multiple SDTM sources
TRACE_MEAN_TRACE_LEVEL: mean trace level across all ADaM variables
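Read literally, each indicator above is a simple aggregate over per-variable trace levels and the mapping sheet. The base R sketch below computes them under that reading; it is not r4subtrace's `trace_indicator_scores()`, and `sketch_trace_indicators()` and its arguments are hypothetical.

```r
# Hypothetical sketch, NOT r4subtrace's trace_indicator_scores().
# trace_levels: numeric/integer trace level (0-3) for each ADaM variable
# adam_ids:     "DATASET.VARIABLE" key for each ADaM variable (same order)
# mapping:      data frame with the recommended trace_map.csv columns
sketch_trace_indicators <- function(trace_levels, adam_ids, mapping) {
  tgt <- paste(mapping$adam_dataset, mapping$adam_var, sep = ".")
  src <- paste(mapping$sdtm_domain, mapping$sdtm_var, sep = ".")
  # Number of distinct SDTM sources per mapped ADaM variable
  n_src <- tapply(src, tgt, function(x) length(unique(x)))

  c(
    TRACE_VAR_COVERAGE_L2PLUS     = mean(trace_levels >= 2),
    TRACE_VAR_COVERAGE_L3PLUS     = mean(trace_levels >= 3),
    TRACE_ORPHAN_VAR_COUNT        = length(setdiff(adam_ids, tgt)),
    TRACE_AMBIGUOUS_MAPPING_COUNT = sum(n_src > 1),
    TRACE_MEAN_TRACE_LEVEL        = mean(trace_levels)
  )
}

# Usage with made-up inputs
ids_demo <- c("ADSL.AGE", "ADSL.TRTSDT", "ADSL.SAFFL", "ADSL.XVAR")
map_demo <- data.frame(adam_dataset = c("ADSL", "ADSL", "ADSL"),
                       adam_var     = c("AGE", "TRTSDT", "TRTSDT"),
                       sdtm_domain  = c("DM", "EX", "SV"),
                       sdtm_var     = c("AGE", "EXSTDTC", "SVSTDTC"))
ind_demo <- sketch_trace_indicators(c(2L, 3L, 1L, 0L), ids_demo, map_demo)
ind_demo
# L2PLUS = 0.5, L3PLUS = 0.25, ORPHAN = 2, AMBIGUOUS = 1, MEAN = 1.5
```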

Design principles

Graph-first: traceability is a graph problem
Evidence-first: all conclusions are backed by explicit evidence rows
Tool-agnostic: can ingest mapping from any source format
Reviewer-centric: emphasize explainability, not just metrics

License

MIT.
