The RSI Loop — A Validated Recursive Self-Improvement System | PharmaTools.AI

The RSI Loop: A Validated Recursive Self-Improvement System

Using RSI to solve RSI: a closed-loop agentic framework for ergonomic safety.

Stack: Python · MediaPipe · rich | Lines: ~1,150 | Status: v2, COMPLETE | Author: Nick Lamb

The whole loop in 30 seconds

Terminal recording: Cycle 1 (v1 detector) fails on radial wrist deviation at 90% accuracy; the system analyses the geometric flaw and proposes vector trigonometry; Cycle 2 (v2) reaches 100%; the regulatory Auditor confirms the thresholds sit inside the Clinical Gold Standard; and the final RSI Loop status is COMPLETE.

Recording of python3 demo.py. Real-time, no editing — Cycle 1 (flawed) → analysis → Cycle 2 (corrected) → regulatory audit → COMPLETE.

A meta-irony in three letters

The same three letters describe both the hardest open problem in AI safety and the most common occupational injury among the people building it.

RSI in machine-learning circles is Recursive Self-Improvement — a system that edits its own logic to get better at its objective. RSI in physiotherapy is Repetitive Strain Injury — what happens to your forearms after a few thousand keystrokes a week. The first one is a frontier-lab anxiety; the second is what you actually wake up with on a Wednesday after shipping a frontier-lab feature on a Tuesday.

This project leans into the joke. It is a Recursive-Self-Improvement system whose job is to detect Repetitive-Strain-Injury risk. The loop edits its own ergonomic detector against a benchmark suite — and the only thing standing between the loop and a perfectly self-satisfied 999° wrist threshold is a regulatory auditor that asks, in effect, "Would a clinician actually sign off on this?"

A two-stage validation pipeline

Self-improving systems fail when their objective is too easy to satisfy. The RSI Loop separates "did it work?" (Stage 1) from "is the result clinically meaningful?" (Stage 2). The optimiser is free to mutate thresholds; only iterations that pass both stages are accepted as COMPLETE.

Read top to bottom — each stage produces a verdict and an exit code, and the loop is COMPLETE only if both verdicts are PASS.
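The gate logic above can be sketched in a few lines. This is a hypothetical skeleton, not the repo's actual code: `run_stage1` and `run_stage2` stand in for the real test engine and Auditor, and the exit codes follow the 1 (NOT_ACCURATE) / 2 (NOT_COMPLIANT) convention described below.

```python
ACCURACY_GATE = 0.90  # Stage 1 threshold from the pipeline description


def run_pipeline(run_stage1, run_stage2):
    """Two-stage gate sketch: Stage 1 measures accuracy, Stage 2 audits
    thresholds; the loop is COMPLETE only if both verdicts are PASS."""
    accuracy = run_stage1()               # e.g. fraction of benchmarks correct
    if accuracy < ACCURACY_GATE:
        return 1                          # exit 1: NOT_ACCURATE, iteration blocked
    findings = run_stage2()               # e.g. {threshold_name: verdict}
    if any(verdict != "PASS" for verdict in findings.values()):
        return 2                          # exit 2: NOT_COMPLIANT, change blocked
    return 0                              # RSI LOOP STATUS: COMPLETE
```

A mutated detector that aces Stage 1 but drifts a threshold out of bounds still returns a non-zero exit code, which is the whole point of keeping the two verdicts separate.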

Stage 01 · Accuracy

Does the detector classify posture correctly?

The optimiser tunes detector.py against benchmarks.json until it correctly labels every ground-truth scenario — including the v1 radial-deviation blind spot that v2 fixes with vector trigonometry.

Inputs: 10 labelled scenarios (Safe / High Strain) from benchmarks.json
Process: detector.assess(landmarks) over each scenario
Metrics: accuracy · precision · recall · F1
Gate: accuracy ≥ 90%. PASS → Stage 2; FAIL → exit 1 (NOT_ACCURATE)
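For the binary Safe / High Strain labels, the four metrics reduce to a small confusion-matrix computation. A sketch of what the test engine might compute (function and label names are assumptions, not the repo's API):

```python
def stage1_metrics(predictions, labels):
    """Confusion-matrix metrics for the binary Safe / High Strain task,
    treating "High Strain" as the positive class."""
    tp = sum(p == l == "High Strain" for p, l in zip(predictions, labels))
    tn = sum(p == l == "Safe" for p, l in zip(predictions, labels))
    fp = sum(p == "High Strain" and l == "Safe" for p, l in zip(predictions, labels))
    fn = sum(p == "Safe" and l == "High Strain" for p, l in zip(predictions, labels))
    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With 10 scenarios, a single miss like v1's S06 blind spot shows up as exactly 90% accuracy and a recall drop, which is why the suite can localise the flaw to one missed High Strain case.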
Stage 02 · Compliance

Are the AI-tuned thresholds clinically plausible?

The Auditor reads the live thresholds out of detector.py via getattr and compares each one against the Clinical Gold Standard. A loop that hacks thresholds into impossible values gets caught here, even if Stage 1 was satisfied.

Standard: Forward-head 15–25° · Wrist deviation 40–60°
Tolerance: ±5° warning band; beyond that → hard failure
Verdicts: PASS · WARNING · HARD FAILURE per threshold
Gate: every finding = PASS. PASS → COMPLETE; FAIL → exit 2 (NOT_COMPLIANT)
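The audit mechanics can be sketched with the bands stated above. The attribute names and the `Detector` stand-in are illustrative assumptions; what matters is that the Auditor reads live values via getattr rather than trusting anything the loop reports about itself:

```python
class Detector:
    # Stand-in for detector.py's module-level thresholds (names assumed).
    FORWARD_HEAD_THRESHOLD = 20.0
    WRIST_DEVIATION_THRESHOLD = 67.0   # the drifted value from the attack scenario

GOLD_STANDARD = {  # Clinical Gold Standard bands from the text, in degrees
    "FORWARD_HEAD_THRESHOLD": (15.0, 25.0),
    "WRIST_DEVIATION_THRESHOLD": (40.0, 60.0),
}
WARNING_BAND = 5.0  # ±5° tolerance before a hard failure


def audit(detector):
    """Read each live threshold via getattr and grade it against the standard."""
    findings = {}
    for name, (lo, hi) in GOLD_STANDARD.items():
        value = getattr(detector, name)
        if lo <= value <= hi:
            findings[name] = "PASS"
        elif lo - WARNING_BAND <= value <= hi + WARNING_BAND:
            findings[name] = "WARNING"
        else:
            findings[name] = "HARD FAILURE"
    return findings
```

Run against the drifted 67° wrist threshold, the wrist finding comes back HARD FAILURE (2° past the warning band) while the head threshold still passes, so the overall gate fails.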
RSI LOOP STATUS: COMPLETE  ·  accurate AND compliant

Cycle 1 → Cycle 2, captured live

Cycle 1's detector used a signed horizontal offset (hand_x − wrist_x > 0.10) — direction-blind, so it caught ulnar deviation but silently missed radial deviation on scenario S06. Cycle 2 replaced both heuristics with proper trigonometric angles between the forearm and metacarpal vectors, and the loop converged.
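The v2 fix described above amounts to measuring the unsigned angle between the two segments instead of a signed x-offset. A minimal sketch, assuming 2D landmark tuples and these three joint names (the real detector works on MediaPipe landmarks and may differ):

```python
import math


def deviation_angle(elbow, wrist, hand):
    """Unsigned angle in degrees between the forearm vector (elbow→wrist)
    and the metacarpal vector (wrist→hand). Direction-agnostic, so it
    flags radial and ulnar deviation alike."""
    fx, fy = wrist[0] - elbow[0], wrist[1] - elbow[1]
    mx, my = hand[0] - wrist[0], hand[1] - wrist[1]
    dot = fx * mx + fy * my
    norm = math.hypot(fx, fy) * math.hypot(mx, my)
    cos = max(-1.0, min(1.0, dot / norm))   # clamp against float error
    return math.degrees(math.acos(cos))
```

A bend of the hand to either side of the forearm axis produces the same positive angle, which is exactly the property the signed-offset heuristic lacked.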

The same code path also catches a subtler, more dangerous failure mode: drift the wrist threshold to 67° and Stage 1 still passes at 100% accuracy, but Stage 2 returns HARD FAILURE (7° beyond the 60° clinical maximum) and the iteration is rejected and discarded.

GxP-Ready Agentic Design

Pharma teams already have a name for "let the system improve, but only inside an immutable spec": Computer System Validation (CSV). The RSI Loop's two-stage pipeline maps cleanly onto CSV's separation between functional qualification (does the system do what it claims?) and performance qualification (does it stay inside its validated envelope under change?).

CSV concept → RSI Loop equivalent
Validated specification → Clinical Gold Standard, hard-coded in auditor.py: the search space the loop is allowed to occupy
Operational Qualification (OQ) → Stage 1: test_engine.py proves the detector classifies the labelled benchmarks correctly
Performance Qualification (PQ) → Stage 2: the Auditor proves AI-tuned thresholds stay within clinically acceptable bounds
Change control → Any threshold mutation forces a fresh audit; failed audits return non-zero exit codes that block the change
Independence of QA → The Auditor is a separate module reading thresholds via getattr(detector, …): the loop cannot self-certify
Audit trail → last_run.json persists every cycle's metrics and audit verdict for diff against future iterations
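The audit-trail row is the easiest to picture in code. A sketch of one plausible persistence step; the actual layout of last_run.json in the repo may differ, and the field names here are assumptions:

```python
import json
import time
from pathlib import Path


def persist_cycle(path, cycle, metrics, audit_findings):
    """Write this cycle's metrics and audit verdicts to disk so a future
    iteration can diff itself against the last validated run."""
    record = {
        "cycle": cycle,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "metrics": metrics,          # Stage 1 output: accuracy, precision, ...
        "audit": audit_findings,     # Stage 2 output: verdict per threshold
    }
    Path(path).write_text(json.dumps(record, indent=2))
    return record
```

Because both stage outputs land in one file per run, "what changed since the last validated state?" is a plain JSON diff rather than an archaeology exercise.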

The takeaway for pharma audiences: self-improving AI does not have to be incompatible with regulated environments. What it needs is the same structure pharma already trusts — an external, immutable spec, with the optimiser bounded inside it. The RSI Loop is a 1,150-line proof of concept for that pattern, in a domain (ergonomics) that's friendly enough to host it without regulatory entanglement.

What this prototype is — and isn't

This is a deliberately small, deliberately observable prototype. The point is the pattern, not the model. To be honest about what is in place and what is still missing:

Today

Synthetic benchmark suite (10 hand-authored landmark sets). Two detection rules. Static Clinical Gold Standard hard-coded in the Auditor. Deterministic optimiser steps proposed by the developer.

Next

Real webcam pilot with MediaPipe-streamed pose data. Larger benchmark suite drawn from labelled real footage. Additional rules (shoulder elevation, mouse-arm extension). Auditor sourced from a versioned ergonomic-literature dataset rather than constants in code.

Later

Generalise the two-stage pattern to any agentic loop: replace Clinical Gold Standard with whatever immutable spec governs the target domain (ABPI for promotional copy, ICH-E6 for trial design, etc.). The RSI Loop becomes a template, not a product.

Won't

This will not become a clinical device, a workplace surveillance tool, or a substitute for a real ergonomist. The intent is methodological — to demonstrate validated self-improvement in a small, portable form.

Read the code, watch the loop close.

Everything is in one repo: detector, auditor, benchmarks, narrated demo, and the recording you saw above. About 1,150 lines, MIT-licensed, runs on a laptop in under a second.