Translation, not Interpretation

Most people have a position or a repository. This work makes the same claim twice: once in the literature, and once in code that measures whether the stated constraints actually hold. The paper argues the design principle; the tools apply it; the evals check it adversarially and report where it fails.

The argument and the evidence are the same work.

Four principles, and the code that tests each

Retrieve evidence rather than invent it	PubCrawl gives models direct retrieval over PubMed, ClinicalTrials.gov and drug labels; LitRAG grounds answers in real abstracts – parametric memory is never the source of record.
Expose uncertainty rather than conceal it	RefCheckr verifies each claim against verbatim source passages and downgrades any citation it can't locate; its faithfulness evals flag hallucinated or unsupported claims instead of smoothing them over.
Constrain capability where consequences are high	Patiently AI is gated to stay in translation – no diagnosis, no new clinical facts; Redacta pseudonymises before a model sees the text; RSI-Loop lets a detector improve itself but blocks any mutation an independent compliance auditor rejects.
Help humans audit reasoning, not replace judgement	Outputs carry clause- and passage-level provenance a reviewer can check; the Redacta Gauntlet and RefCheckr evals ship SHA-stamped scorecards with gaps named, not hidden.
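
To make the second principle concrete, here is a minimal sketch of a verbatim-passage check that downgrades a citation it cannot locate. It is not RefCheckr's actual code: the `Citation` type, the `check_citation` function, the three status labels and the sample text are all hypothetical, chosen only to illustrate the idea.

```python
import re
from dataclasses import dataclass


@dataclass
class Citation:
    claim: str      # the sentence the model wrote
    quote: str      # the passage it says supports the claim
    source_id: str  # identifier of the retrieved source (hypothetical scheme)


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so line breaks don't defeat matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def check_citation(citation: Citation, sources: dict[str, str]) -> str:
    """Return 'verified' only if the quote appears verbatim in the cited source."""
    source_text = sources.get(citation.source_id)
    if source_text is None:
        return "unlocated"    # the cited source was never retrieved
    quote = normalise(citation.quote)
    if quote and quote in normalise(source_text):
        return "verified"
    return "unsupported"      # the source exists, but the quote is not in it


# Placeholder text, not real clinical content.
sources = {"doc-a": "The label lists drowsiness\nas a common side effect."}
cited = Citation(
    claim="Drowsiness is common.",
    quote="drowsiness as a common side effect",
    source_id="doc-a",
)
status = check_citation(cited, sources)
```

The design choice worth noting is the default: anything that cannot be matched is reported as `unlocated` or `unsupported` rather than silently kept, so a reviewer sees the gap.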

The tools are evaluated the way the thesis says they should behave. RefCheckr and the Redacta Gauntlet don't just claim the constraints hold – they attack them adversarially and measure where they don't, then gate every change against a baseline so reliability can't quietly slip. Capability control isn't a slogan here; it's a number you can re-run.
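
Gating a change against a baseline can be sketched in a few lines. This is an illustration under stated assumptions, not the repositories' actual gate: the `gate` function, the metric names and the scorecard shape (a flat mapping of metric name to score, where higher is better) are all hypothetical.

```python
def gate(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.0,
) -> list[str]:
    """Return the metrics that regressed; an empty list means the change may pass.

    Assumes every metric is 'higher is better'.
    """
    failures = []
    for name, base in baseline.items():
        new = candidate.get(name)
        if new is None:
            # A metric that disappears is treated as a failure, not ignored.
            failures.append(f"{name}: missing from candidate scorecard")
        elif new < base - tolerance:
            failures.append(f"{name}: {new:.3f} < baseline {base:.3f}")
    return failures


# Made-up scores, purely for illustration.
baseline = {"faithfulness": 0.9, "recall": 0.8}
candidate = {"faithfulness": 0.8, "recall": 0.8}
failures = gate(baseline, candidate)
```

A change merges only when `failures` is empty. Treating a missing metric as a failure keeps a dropped eval from passing unnoticed.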

Paper: SN Compr Clin Med 2026 · Validation study: medRxiv
