The RSI Loop — A Validated Recursive Self-Improvement System | PharmaTools.AI

The RSI Loop: A Validated Recursive Self-Improvement System

Using RSI to solve RSI: a closed-loop agentic framework for ergonomic safety.

Stack: Python · MediaPipe · rich | Lines: ~1,150 | Status: v2, COMPLETE | Author: Nick Lamb

The whole loop in 30 seconds

Terminal recording: Cycle 1 (v1 detector) fails on radial wrist deviation at 90% accuracy; the system analyses the geometric flaw and proposes vector trigonometry; Cycle 2 (v2) reaches 100%; the regulatory Auditor confirms the thresholds sit inside the Clinical Gold Standard; and the final RSI Loop status is COMPLETE.

Recording of python3 demo.py. Real-time, no editing — Cycle 1 (flawed) → analysis → Cycle 2 (corrected) → regulatory audit → COMPLETE.

A meta-irony in three letters

The same three letters describe both the hardest open problem in AI safety and the most common occupational injury among the people building it.

RSI in machine-learning circles is Recursive Self-Improvement — a system that edits its own logic to get better at its objective. RSI in physiotherapy is Repetitive Strain Injury — what happens to your forearms after a few thousand keystrokes a week. The first one is a frontier-lab anxiety; the second is what you actually wake up with on a Wednesday after shipping a frontier-lab feature on a Tuesday.

This project leans into the joke. It is a Recursive-Self-Improvement system whose job is to detect Repetitive-Strain-Injury risk. The loop edits its own ergonomic detector against a benchmark suite — and the only thing standing between the loop and a perfectly self-satisfied 999° wrist threshold is a regulatory auditor that asks, in effect, "Would a clinician actually sign off on this?"

A two-stage validation pipeline

Self-improving systems fail when their objective is too easy to satisfy. The RSI Loop separates "did it work?" (Stage 1) from "is the result clinically meaningful?" (Stage 2). The optimiser is free to mutate thresholds; only iterations that pass both stages are accepted as COMPLETE.

Read top to bottom — each stage produces a verdict and an exit code, and the loop is COMPLETE only if both verdicts are PASS.
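The gate logic above can be sketched in a few lines. This is a hypothetical skeleton, not the repo's actual code: `run_stage1` and `run_stage2` stand in for the real test engine and Auditor, and the exit codes follow the 1 (NOT_ACCURATE) / 2 (NOT_COMPLIANT) convention described below.

```python
ACCURACY_GATE = 0.90  # Stage 1 threshold from the pipeline description


def run_pipeline(run_stage1, run_stage2):
    """Two-stage gate sketch: Stage 1 measures accuracy, Stage 2 audits
    thresholds; the loop is COMPLETE only if both verdicts are PASS."""
    accuracy = run_stage1()               # e.g. fraction of benchmarks correct
    if accuracy < ACCURACY_GATE:
        return 1                          # exit 1: NOT_ACCURATE, iteration blocked
    findings = run_stage2()               # e.g. {threshold_name: verdict}
    if any(verdict != "PASS" for verdict in findings.values()):
        return 2                          # exit 2: NOT_COMPLIANT, change blocked
    return 0                              # RSI LOOP STATUS: COMPLETE
```

A mutated detector that aces Stage 1 but drifts a threshold out of bounds still returns a non-zero exit code, which is the whole point of keeping the two verdicts separate.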

Stage 01 · Accuracy

Does the detector classify posture correctly?

The optimiser tunes detector.py against benchmarks.json until it correctly labels every ground-truth scenario — including the v1 radial-deviation blind spot that v2 fixes with vector trigonometry.

Inputs: 10 labelled scenarios (Safe / High Strain) from benchmarks.json
Process: detector.assess(landmarks) over each scenario
Metrics: accuracy · precision · recall · F1
Gate: accuracy ≥ 90%. PASS → Stage 2; FAIL → exit 1 (NOT_ACCURATE)
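For the binary Safe / High Strain labels, the four metrics reduce to a small confusion-matrix computation. A sketch of what the test engine might compute (function and label names are assumptions, not the repo's API):

```python
def stage1_metrics(predictions, labels):
    """Confusion-matrix metrics for the binary Safe / High Strain task,
    treating "High Strain" as the positive class."""
    tp = sum(p == l == "High Strain" for p, l in zip(predictions, labels))
    tn = sum(p == l == "Safe" for p, l in zip(predictions, labels))
    fp = sum(p == "High Strain" and l == "Safe" for p, l in zip(predictions, labels))
    fn = sum(p == "Safe" and l == "High Strain" for p, l in zip(predictions, labels))
    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With 10 scenarios, a single miss like v1's S06 blind spot shows up as exactly 90% accuracy and a recall drop, which is why the suite can localise the flaw to one missed High Strain case.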
Stage 02 · Compliance

Are the AI-tuned thresholds clinically plausible?

The Auditor reads the live thresholds out of detector.py via getattr and compares each one against the Clinical Gold Standard. A loop that hacks thresholds into impossible values gets caught here, even if Stage 1 was satisfied.

Standard: Forward-head 15–25° · Wrist deviation 40–60°
Tolerance: ±5° warning band; beyond that → hard failure
Verdicts: PASS · WARNING · HARD FAILURE per threshold
Gate: every finding = PASS. PASS → COMPLETE; FAIL → exit 2 (NOT_COMPLIANT)
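The audit mechanics can be sketched with the bands stated above. The attribute names and the `Detector` stand-in are illustrative assumptions; what matters is that the Auditor reads live values via getattr rather than trusting anything the loop reports about itself:

```python
class Detector:
    # Stand-in for detector.py's module-level thresholds (names assumed).
    FORWARD_HEAD_THRESHOLD = 20.0
    WRIST_DEVIATION_THRESHOLD = 67.0   # the drifted value from the attack scenario

GOLD_STANDARD = {  # Clinical Gold Standard bands from the text, in degrees
    "FORWARD_HEAD_THRESHOLD": (15.0, 25.0),
    "WRIST_DEVIATION_THRESHOLD": (40.0, 60.0),
}
WARNING_BAND = 5.0  # ±5° tolerance before a hard failure


def audit(detector):
    """Read each live threshold via getattr and grade it against the standard."""
    findings = {}
    for name, (lo, hi) in GOLD_STANDARD.items():
        value = getattr(detector, name)
        if lo <= value <= hi:
            findings[name] = "PASS"
        elif lo - WARNING_BAND <= value <= hi + WARNING_BAND:
            findings[name] = "WARNING"
        else:
            findings[name] = "HARD FAILURE"
    return findings
```

Run against the drifted 67° wrist threshold, the wrist finding comes back HARD FAILURE (2° past the warning band) while the head threshold still passes, so the overall gate fails.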
RSI LOOP STATUS: COMPLETE  ·  accurate AND compliant

Cycle 1 → Cycle 2, captured live

Cycle 1's detector used a signed horizontal offset (hand_x − wrist_x > 0.10) — direction-blind, so it caught ulnar deviation but silently missed radial deviation on scenario S06. Cycle 2 replaced both heuristics with proper trigonometric angles between the forearm and metacarpal vectors, and the loop converged.
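The v2 fix described above amounts to measuring the unsigned angle between the two segments instead of a signed x-offset. A minimal sketch, assuming 2D landmark tuples and these three joint names (the real detector works on MediaPipe landmarks and may differ):

```python
import math


def deviation_angle(elbow, wrist, hand):
    """Unsigned angle in degrees between the forearm vector (elbow→wrist)
    and the metacarpal vector (wrist→hand). Direction-agnostic, so it
    flags radial and ulnar deviation alike."""
    fx, fy = wrist[0] - elbow[0], wrist[1] - elbow[1]
    mx, my = hand[0] - wrist[0], hand[1] - wrist[1]
    dot = fx * mx + fy * my
    norm = math.hypot(fx, fy) * math.hypot(mx, my)
    cos = max(-1.0, min(1.0, dot / norm))   # clamp against float error
    return math.degrees(math.acos(cos))
```

A bend of the hand to either side of the forearm axis produces the same positive angle, which is exactly the property the signed-offset heuristic lacked.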

The same code path also catches a subtler, more dangerous failure mode: drift the wrist threshold to 67° and Stage 1 still passes at 100% accuracy, but Stage 2 returns HARD FAILURE (7° beyond the 60° clinical maximum) and the iteration is rejected and discarded.

GxP-Ready Agentic Design

Pharma teams already have a name for "let the system improve, but only inside an immutable spec": Computer System Validation (CSV). The RSI Loop's two-stage pipeline maps cleanly onto CSV's separation between functional qualification (does the system do what it claims?) and performance qualification (does it stay inside its validated envelope under change?).

CSV concept → RSI Loop equivalent
Validated specification → Clinical Gold Standard, hard-coded in auditor.py: the search space the loop is allowed to occupy
Operational Qualification (OQ) → Stage 1: test_engine.py proves the detector classifies the labelled benchmarks correctly
Performance Qualification (PQ) → Stage 2: the Auditor proves AI-tuned thresholds stay within clinically acceptable bounds
Change control → Any threshold mutation forces a fresh audit; failed audits return non-zero exit codes that block the change
Independence of QA → The Auditor is a separate module reading thresholds via getattr(detector, …): the loop cannot self-certify
Audit trail → last_run.json persists every cycle's metrics and audit verdict for diff against future iterations
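The audit-trail row is the easiest to picture in code. A sketch of one plausible persistence step; the actual layout of last_run.json in the repo may differ, and the field names here are assumptions:

```python
import json
import time
from pathlib import Path


def persist_cycle(path, cycle, metrics, audit_findings):
    """Write this cycle's metrics and audit verdicts to disk so a future
    iteration can diff itself against the last validated run."""
    record = {
        "cycle": cycle,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "metrics": metrics,          # Stage 1 output: accuracy, precision, ...
        "audit": audit_findings,     # Stage 2 output: verdict per threshold
    }
    Path(path).write_text(json.dumps(record, indent=2))
    return record
```

Because both stage outputs land in one file per run, "what changed since the last validated state?" is a plain JSON diff rather than an archaeology exercise.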

The takeaway for pharma audiences: self-improving AI does not have to be incompatible with regulated environments. What it needs is the same structure pharma already trusts — an external, immutable spec, with the optimiser bounded inside it. The RSI Loop is a 1,150-line proof of concept for that pattern, in a domain (ergonomics) that's friendly enough to host it without regulatory entanglement.

What this prototype is — and isn't

This is a deliberately small, deliberately observable prototype. The point is the pattern, not the model. To be honest about what is in place and what is still missing:

Today

Synthetic benchmark suite (10 hand-authored landmark sets). Two detection rules. Static Clinical Gold Standard hard-coded in the Auditor. Deterministic optimiser steps proposed by the developer.

Next

Real webcam pilot with MediaPipe-streamed pose data. Larger benchmark suite drawn from labelled real footage. Additional rules (shoulder elevation, mouse-arm extension). Auditor sourced from a versioned ergonomic-literature dataset rather than constants in code.

Later

Generalise the two-stage pattern to any agentic loop: replace Clinical Gold Standard with whatever immutable spec governs the target domain (ABPI for promotional copy, ICH-E6 for trial design, etc.). The RSI Loop becomes a template, not a product.

Won't

This will not become a clinical device, a workplace surveillance tool, or a substitute for a real ergonomist. The intent is methodological — to demonstrate validated self-improvement in a small, portable form.

Read the code, watch the loop close.

Everything is in one repo: detector, auditor, benchmarks, narrated demo, and the recording you saw above. About 1,150 lines, MIT-licensed, runs on a laptop in under a second.