LitRAG — Grounded RAG with a citation-faithfulness eval | PharmaTools.AI
Open Source · Reference Implementation

Grounded RAG that checks its own citations

A small, readable retrieval-augmented generation pipeline over the medical literature — with a citation-faithfulness eval built in. It doesn't just retrieve and answer; it verifies that every claim is actually supported by its cited source, and flags hallucinated or unsupported ones. That groundedness layer — not the pipeline — is the point.

Runs locally — no vector-DB key Verbatim citation per claim Catches fabricated quotes for free

MIT licensed · Python · built on LangChain, sentence-transformers & FAISS

01Why it exists

Most RAG demos stop at “it answered”

When a model answers from sources, it can produce a plausible-but-unfaithful citation: a quote that was never in the source, or a real quote that doesn't actually support the claim it's attached to.

Standard retrieval metrics — recall@k, MRR — cannot catch this. They score whether the right document was retrieved, not whether the generated claim is faithful to it. In a high-stakes domain like medicine, that gap is exactly where harm hides.

LitRAG closes it with a two-stage check: a cheap, deterministic quote test in front of an LLM judge that grades support using only the passage — no outside knowledge allowed. A citation counts as faithful only if the quote is located and the passage supports the claim.

Support levels

supportsthe passage directly establishes the claim
partialconsistent with, but doesn't fully establish, the claim
contradictsthe passage states the opposite
not_foundthe passage is about something else entirely

02Pipeline

Retrieve, answer, then verify

Question
A natural-language question about the corpus — some answerable, some deliberately not.
Local retrieval sentence-transformers + FAISS
Top-k passages embedded and retrieved fully on-device. No managed vector-DB key required.
RAG chain LangChain + Claude
Generates a structured answer: atomic claims, each with a verbatim cited quote and its source PMID.
Stage 1 · Quote locator deterministic — no model call
Is the cited quote actually in the source? Normalised exact match, then fuzzy partial_ratio. A fabricated quote is caught here, for free.
quote absent → hallucinated_quote
Stage 2 · LLM judge
Only if the quote is located: does the passage support the claim? Forced-tool-use verdict, graded from the passage alone.
Verdict
Per-claim labels and an overall groundedness flag — an honest “the sources don't say” counts as grounded; an invented claim does not.
located + supports → grounded ✓

A fabricated quote should be caught by string matching — not by an LLM that might hallucinate agreement with a hallucinated quote.

03Layout

Five small files, readable in ten minutes

ingest.py
Load the abstract corpus, chunk into passages with {pmid, title, source} metadata.
index.py
Build / load the FAISS index from Hugging Face sentence-transformers embeddings.
rag.py
LangChain retrieval + generation chain; returns an answer with cited passages.
faithfulness.py
The groundedness layer — locate the quote, then LLM-judge the support level.
demo.py
End-to-end run: ingest → index → ask → answer → grade, on example questions.
data/
15 real PubMed abstracts on semaglutide, so retrieval runs key-free out of the box.

04Built with

Local where it can be, an LLM only where it must

LangChain sentence-transformers FAISS Claude rapidfuzz Python

Embedding and retrieval run fully local; only answer generation and the faithfulness judge call an LLM. The corpus is pulled via PubCrawl, and the same citation-faithfulness pattern powers RefCheckr — LitRAG is the open, readable reference implementation of the idea. The README documents the honest LangChain-vs-LlamaIndex trade-offs from the actual build.

Read the code

A faithful RAG reference you can read end-to-end — and a groundedness eval you can lift into your own pipeline.