Stage 2 of the Daktus CDSS Pipeline. Eval and correction harness for LLM-generated clinical protocols. Combines Pydantic schemas, AST-based logic validation, LLM-as-judge audits, multi-model routing, closed-loop feedback learning, and cost telemetry to make probabilistic outputs safe in a clinical-grade pipeline.
Status: v3.2.0 · Waves 1–4.3 complete · feedback loop active
This repository is Stage 2 of a three-stage Clinical Decision Support pipeline:
[agente-daktus-content] → [agente-daktus-qa] → [daktus-conduct-engine]
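| Stage | Repository | Role | What it does |
|---|---|---|---|
| Stage 1 | agente-daktus-content | Production | briefing → JSON protocol |
| Stage 2 | agente-daktus-qa | Validation | validates, corrects, versions, learns |
| Stage 3 | daktus-conduct-engine | Decision Engine | consumes encounter bundle → structured anamnesis + management + evidence |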
- Stage 1 → agente-daktus-content
- Stage 3 → daktus-conduct-engine
LLM outputs in clinical workflows are probabilistic — a content pipeline that produces protocol JSON cannot, on its own, guarantee:
- Structural validity (JSON conforms to the Daktus schema)
- Logical consistency (conditionals are sound, no orphan branches, no contradictions)
- Evidence integrity (every claim is traceable to a reference in the playbook)
- Reproducibility across model versions and providers
This repository is the validation, correction, and learning layer that closes the loop.
Input: Clinical protocol JSON (from Stage 1) + Markdown/PDF playbook

Output: Validation report + prioritized suggestions + corrected, versioned protocol + learning entries
- Pydantic schema validation at reconstruction time — zero invalid protocols saved
- AST-based logic validation for conditional expressions (no fragile regex on structured logic)
- LLM contract validation — detects model drift across provider/version changes
- Hard rules engine — automatic blocking of invalid suggestions before they reach the model
- Reference validator: cross-checks every cited piece of evidence against the playbook
- Change verifier — post-reconstruction validation of what was actually modified
- Feedback learner: rejection patterns and validation failures are persisted to memory_qa.md, feeding future runs
- Real-time token counter per session
- Audit reports (_AUDIT.txt) for compliance traceability
- Cost telemetry: real vs. estimated cost per session
- Spider-aware reconstruction — the LLM operates with explicit awareness of the Daktus protocol structure
- Alert rules module with templates and few-shot learning
- Suggestion validator — filters anti-patterns and duplicates
- Conditional sanitization for invalid logical expressions emitted by the LLM
- Robust JSON parser — handles multi-line strings, truncated JSON, retry with exponential backoff
- Dual learning loop: implementation failures and validation errors both feed memory_qa.md
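The "robust JSON parser" item above can be illustrated with a minimal sketch. It is not the repository's parser: the function name `parse_with_retry`, its parameters, and the idea of re-requesting the model output through a `fetch` callable are all assumptions made for illustration. It shows two of the listed behaviours, tolerance for raw newlines inside strings and retry with exponential backoff when the output is truncated.

```python
import json
import time
from typing import Callable, Optional


def parse_with_retry(
    fetch: Callable[[], str],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch` until its output parses as a JSON object (hypothetical helper)."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        raw = fetch()
        try:
            # strict=False lets the decoder accept raw newlines inside strings,
            # which LLMs often emit in multi-line text fields.
            parsed = json.loads(raw, strict=False)
            if isinstance(parsed, dict):
                return parsed
            last_error = ValueError("top-level JSON value is not an object")
        except json.JSONDecodeError as exc:
            # Truncated or otherwise malformed output lands here.
            last_error = exc
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts") from last_error
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting; in a real client `fetch` would wrap the model call.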
LLM-as-judge with deterministic validation underneath. Semantic and clinical judgment goes to the model. Structural integrity, schema conformance, and logic checks are scripted and non-negotiable. Probabilistic judgment, deterministic gates.
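A minimal sketch of "probabilistic judgment, deterministic gates", under stated assumptions: the protocol fields (`nodes`, `id`), the gate functions, and the `judge` callable are hypothetical stand-ins, not the repository's API. The point it illustrates is ordering: scripted checks run first, and the model is only consulted when all of them pass.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    accepted: bool
    reasons: List[str] = field(default_factory=list)


# Hypothetical deterministic gates: each returns an error message or None.
def require_nodes(protocol: dict) -> Optional[str]:
    nodes = protocol.get("nodes")
    if not isinstance(nodes, list) or not nodes:
        return "protocol has no nodes"
    return None


def require_unique_ids(protocol: dict) -> Optional[str]:
    ids = [node.get("id") for node in protocol.get("nodes", [])]
    if len(ids) != len(set(ids)):
        return "duplicate node ids"
    return None


GATES = [require_nodes, require_unique_ids]


def review(
    protocol: dict,
    judge: Callable[[dict], Tuple[bool, str]],
    gates: List[Callable[[dict], Optional[str]]],
) -> Verdict:
    """Run scripted gates first; ask the LLM judge only if all of them pass."""
    errors = []
    for gate in gates:
        message = gate(protocol)
        if message is not None:
            errors.append(message)
    if errors:
        # Non-negotiable: a structural failure blocks without a model call.
        return Verdict(False, errors)
    approved, rationale = judge(protocol)
    return Verdict(approved, [rationale])
```

In this arrangement the judge can reject a structurally valid protocol, but it can never approve a structurally invalid one.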
Multi-model routing for cost/quality control. Different models for different jobs, swappable by config:
| Model | Use case |
|---|---|
| google/gemini-2.5-flash | Fast first-pass analysis, low cost |
| anthropic/claude-sonnet-4.5 | High-quality reasoning |
| x-ai/grok-4.1-fast | Large-playbook ingestion (2M context) |
| anthropic/claude-opus-4.5 | Maximum quality for final passes |
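One way "swappable by config" could look is a plain mapping from task to model ID. The model IDs below come from the table; the task names (`first_pass`, `reasoning`, and so on) and the `pick_model` helper are assumptions for illustration, not the repository's configuration format.

```python
from typing import Dict, Optional

# Task names are hypothetical; model IDs are the ones listed in the table.
DEFAULT_ROUTES: Dict[str, str] = {
    "first_pass": "google/gemini-2.5-flash",
    "reasoning": "anthropic/claude-sonnet-4.5",
    "large_playbook": "x-ai/grok-4.1-fast",
    "final_pass": "anthropic/claude-opus-4.5",
}


def pick_model(task: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Resolve a task to a model ID, letting config overrides win over defaults."""
    routes = {**DEFAULT_ROUTES, **(overrides or {})}
    if task not in routes:
        raise ValueError(f"no model configured for task {task!r}")
    return routes[task]
```

Swapping a model then becomes a one-line override, for example `pick_model("first_pass", {"first_pass": "anthropic/claude-sonnet-4.5"})`, with no change to calling code.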
AST analysis over regex. Conditional expressions in clinical protocols are program-like. Regex on programs is brittle. The validator parses expressions into ASTs and reasons over structure — robust to whitespace, formatting, and minor LLM drift.
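The idea can be sketched with Python's standard `ast` module. This assumes, purely for illustration, that conditionals use Python-like syntax such as `age >= 65 and sex == 'F'`; the actual Daktus expression grammar may differ, and `validate_condition` with its allow-list is a hypothetical helper, not the repository's validator.

```python
import ast
from typing import List, Set

# Only boolean logic, comparisons, names, and literals are accepted.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.In, ast.NotIn, ast.Name, ast.Load, ast.Constant, ast.List, ast.Tuple,
)


def validate_condition(expr: str, known_vars: Set[str]) -> List[str]:
    """Return a list of problems found in a conditional; empty means valid."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    problems = []
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            # Calls, attribute access, arithmetic, etc. are rejected by structure.
            problems.append(f"disallowed construct: {type(node).__name__}")
        elif isinstance(node, ast.Name) and node.id not in known_vars:
            problems.append(f"unknown variable: {node.id}")
    return problems
```

Because the check runs on the parsed tree, `age>=65 and sex=='F'` and `age >= 65   and   sex == 'F'` are treated identically, which is the robustness to formatting that a regex would have to encode by hand.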
Closed-loop learning persisted to readable memory. memory_qa.md is a human-readable file, not a hidden database. Every rejection or implementation failure becomes a future few-shot example. State is auditable.
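As a rough illustration of a human-readable learning store, the sketch below appends one Markdown bullet per rejection and reads the most recent bullets back as few-shot examples. The entry format, function names, and parameters are assumptions; the real layout of memory_qa.md is not described here.

```python
from datetime import date
from pathlib import Path
from typing import List, Optional


def record_rejection(
    memory: Path, suggestion: str, reason: str, today: Optional[date] = None
) -> None:
    """Append one rejection as a single Markdown bullet (hypothetical format)."""
    stamp = (today or date.today()).isoformat()
    # Collapse whitespace so every entry stays on one line.
    text = " ".join(suggestion.split())
    why = " ".join(reason.split())
    with memory.open("a", encoding="utf-8") as handle:
        handle.write(f"- [{stamp}] REJECTED: {text} | reason: {why}\n")


def load_few_shot(memory: Path, limit: int = 5) -> List[str]:
    """Return the most recent entries, oldest first, for use in a prompt."""
    if not memory.exists():
        return []
    lines = memory.read_text(encoding="utf-8").splitlines()
    entries = [line[2:] for line in lines if line.startswith("- [")]
    return entries[-limit:]
```

Keeping the store as plain text means a reviewer can read, diff, or prune it with ordinary tools, which is what makes the state auditable.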
Cost as a first-class metric. Every run reports token usage and dollar cost. LLM pipelines without cost telemetry are not production systems.
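A minimal sketch of per-session cost reporting, assuming token counts are available for each call. The model name and prices below are placeholders chosen for easy arithmetic, not real provider pricing, and `Usage`, `cost_usd`, and `session_report` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Placeholder prices in USD per million tokens: (input, output). Not real pricing.
PRICES: Dict[str, Tuple[float, float]] = {
    "example/model-a": (1.00, 2.00),
}


@dataclass
class Usage:
    model: str
    prompt_tokens: int
    completion_tokens: int


def cost_usd(usage: Usage, prices: Dict[str, Tuple[float, float]] = PRICES) -> float:
    """Dollar cost of one call from its token counts and per-million prices."""
    input_price, output_price = prices[usage.model]
    total = usage.prompt_tokens * input_price + usage.completion_tokens * output_price
    return total / 1_000_000


def session_report(calls: List[Usage], estimated_usd: float) -> dict:
    """Summarize token usage and compare real cost against the up-front estimate."""
    real = sum(cost_usd(call) for call in calls)
    tokens = sum(call.prompt_tokens + call.completion_tokens for call in calls)
    return {
        "tokens": tokens,
        "real_usd": round(real, 4),
        "estimated_usd": estimated_usd,
        "delta_usd": round(real - estimated_usd, 4),
    }
```

Reporting the delta alongside the real figure makes estimate drift visible run over run, rather than only at billing time.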
# 1. Install dependencies
pip install -r requirements.txt
# 2. Configure OpenRouter key
echo "OPENROUTER_API_KEY=sk-or-v1-your-key" > .env
# 3. Run
python run_agent.py

Get an OpenRouter key: https://openrouter.ai/keys
/
├── run_agent.py # Entry point
├── src/agent/
│ ├── analysis/ # Expanded analysis
│ ├── applicator/ # Protocol reconstruction
│ ├── feedback/ # Learning loop
│ ├── cost_control/ # Token and cost tracking
│ ├── cli/ # Interactive CLI
│ └── core/ # LLM client, logger, loaders
├── models_json/ # JSON protocols and playbooks
├── reports/ # Generated reports
├── memory_qa.md # Learning memory
└── docs/ # Documentation, dev history, roadmap
| Metric | Value |
|---|---|
| Suggestions per analysis | 20–50 |
| Typical latency | 30–90s |
| Cost per analysis | $0.00–$0.50 |
| Success rate | >95% |
| Playbook verifiability | >95% |
Python · Pydantic · AST module (stdlib) · OpenRouter (multi-provider routing: Anthropic, Google, xAI) · Structured outputs (JSON) · memory_qa.md (human-readable learning store)
Built and maintained by Daniel Martins at Daktus Health Tech.
Background: engineering (EFOMM — systems modeling, automation) + medicine (UFJF, final-year). I build AI systems for high-stakes domains where the cost of error is real. Probabilistic where probabilistic adds value; deterministic where it doesn't.