Translation, not Interpretation – The thesis, and the code that tests it | PharmaTools.AI

Translation, not interpretation

The case, argued in a peer-reviewed paper and tested in shipped code: language models in healthcare should be constrained to translation – restating what a source already says across clinical, regulatory and patient-facing registers – not open-ended interpretation. Narrower surface, clearer accountability, lower harm ceiling.

Most people have a position or a repository. This is the same claim made twice – once in the literature, once in code that measures whether the constraint actually holds. The paper argues the design principle; the tools apply it; the evals check it adversarially and report where it fails.

The argument and the evidence are the same work.

Four principles · and the code that tests each

Retrieve evidence rather than invent it: PubCrawl gives models direct retrieval over PubMed, ClinicalTrials.gov and drug labels; LitRAG grounds answers in real abstracts – parametric memory is never the source of record.
Expose uncertainty rather than conceal it: RefCheckr verifies each claim against verbatim source passages and downgrades any citation it can't locate; its faithfulness evals flag hallucinated or unsupported claims instead of smoothing them over.
Constrain capability where consequences are high: Patiently AI is gated to stay in translation – no diagnosis, no new clinical facts; Redacta pseudonymises before a model sees the text; RSI-Loop lets a detector improve itself but blocks any mutation an independent compliance auditor rejects.
Help humans audit reasoning, not replace judgement: Outputs carry clause- and passage-level provenance a reviewer can check; the Redacta Gauntlet and RefCheckr evals ship SHA-stamped scorecards with gaps named, not hidden.
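
To make the second principle concrete, here is a minimal sketch of a verbatim-passage check of the kind described above. It is not RefCheckr's actual code or API: the `Citation` class, the `check_citation` function and the status labels are hypothetical names chosen for illustration, and the matching rule (whitespace- and case-insensitive substring match) is an assumption.

```python
import re
from dataclasses import dataclass


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line wraps don't defeat matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


@dataclass
class Citation:
    # Hypothetical record: a claim plus the passage it says it is quoting.
    claim: str
    quoted_passage: str
    status: str = "unchecked"


def check_citation(citation: Citation, source_text: str) -> Citation:
    """Mark a citation 'verified' only if its quoted passage appears in the source.

    Anything that cannot be located, including an empty quote, is downgraded
    rather than silently accepted.
    """
    quote = normalize(citation.quoted_passage)
    found = bool(quote) and quote in normalize(source_text)
    citation.status = "verified" if found else "downgraded"
    return citation


# Illustrative source text, invented for this example.
source = "Drug X reduced systolic pressure\nby 12 mmHg versus placebo."
good = check_citation(
    Citation("Drug X lowers blood pressure.", "reduced systolic pressure by 12 mmHg"),
    source,
)
bad = check_citation(
    Citation("Drug X cures hypertension.", "eliminated hypertension in all patients"),
    source,
)
```

The design choice worth noting is the default: a citation that cannot be found is labelled, not dropped or repaired, so the uncertainty stays visible to the reviewer.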

The tools are evaluated the way the thesis says they should behave. RefCheckr and the Redacta Gauntlet don't just claim the constraints hold – they attack them adversarially, measure where they fail, and gate every change against a baseline so reliability can't quietly slip. Capability control isn't a slogan here; it's a measurement you can re-run.
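
Gating a change against a baseline can be sketched in a few lines. This is an illustration under stated assumptions, not the Redacta Gauntlet's or RefCheckr's real gate: the function name `regressions`, the metric names and the `tolerance` parameter are all hypothetical, and higher scores are assumed to be better.

```python
from typing import Dict, List


def regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.0,
) -> List[str]:
    """Return the names of metrics that fell below baseline by more than tolerance.

    A metric present in the baseline but missing from the current run counts
    as a regression, so a check cannot pass by being silently dropped.
    """
    failed = []
    for name, base_score in baseline.items():
        score = current.get(name)
        if score is None or score < base_score - tolerance:
            failed.append(name)
    return sorted(failed)


# Hypothetical scorecards, invented for this example.
baseline = {"recall": 0.95, "precision": 0.90}
candidate = {"recall": 0.96, "precision": 0.85}

failed = regressions(baseline, candidate)
gate_passed = not failed  # a CI job would block the change when this is False
```

An empty list means the change may proceed; anything else names exactly which measurement slipped.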