agente-daktus-qa

Stage 2 of the Daktus CDSS Pipeline. An evaluation and correction harness for LLM-generated clinical protocols. It combines Pydantic schemas, AST-based logic validation, LLM-as-judge audits, multi-model routing, closed-loop feedback learning, and cost telemetry to make probabilistic outputs safe for a clinical-grade pipeline.

Status: v3.2.0 · Waves 1–4.3 complete · feedback loop active

The Pipeline

This repository is Stage 2 of a three-stage Clinical Decision Support pipeline:

[agente-daktus-content]  →  [agente-daktus-qa]  →  [daktus-conduct-engine]
   Stage 1: Production         Stage 2: Validation       Stage 3: Decision Engine
   briefing → JSON protocol    validates, corrects,      consumes encounter bundle
                               versions, learns          → structured anamnesis
                                                         + management + evidence

Stage 1 → agente-daktus-content
Stage 3 → daktus-conduct-engine

The Problem

LLM outputs in clinical workflows are probabilistic — a content pipeline that produces protocol JSON cannot, on its own, guarantee:

Structural validity (JSON conforms to the Daktus schema)
Logical consistency (conditionals are sound, no orphan branches, no contradictions)
Evidence integrity (every claim is traceable to a reference in the playbook)
Reproducibility across model versions and providers

This repository is the validation, correction, and learning layer that closes the loop.

Input: Clinical protocol JSON (from Stage 1) + Markdown/PDF playbook
Output: Validation report + prioritized suggestions + corrected, versioned protocol + learning entries

What's Inside

Wave 1 — Clinical Safety Foundations

Pydantic schema validation at reconstruction time — zero invalid protocols saved
AST-based logic validation for conditional expressions (no fragile regex on structured logic)
LLM contract validation — detects model drift across provider/version changes

Wave 2 — Memory & Learning

Hard rules engine — automatic blocking of invalid suggestions before they reach the model
Reference validator — cross-checks every cited piece of evidence against the playbook
Change verifier — post-reconstruction validation of what was actually modified
Feedback learner — rejection patterns and validation failures persisted to memory_qa.md, feeding future runs
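
The reference validator idea above can be illustrated with a small sketch. It is not the repository's implementation: it assumes citations appear as bracketed keys such as `[REF-3]` in both the protocol text and the playbook, and the function name `find_unverified_references` is hypothetical.

```python
import re

# Hypothetical citation format: bracketed keys such as [REF-12].
CITATION = re.compile(r"\[(REF-\d+)\]")

def find_unverified_references(protocol_text: str, playbook_text: str) -> list[str]:
    """Return citation keys used in the protocol but absent from the playbook."""
    cited = CITATION.findall(protocol_text)
    known = set(CITATION.findall(playbook_text))
    missing = []
    for key in cited:
        # Keep first-seen order and report each missing key once.
        if key not in known and key not in missing:
            missing.append(key)
    return missing

# Illustrative inputs, not real playbook content.
playbook = "[REF-1] Hypertension guideline.\n[REF-2] Diabetes guideline."
protocol = "Start ACE inhibitor [REF-1]. Order HbA1c [REF-2]. Add statin [REF-7]."
print(find_unverified_references(protocol, playbook))  # ['REF-7']
```

Any key the function returns marks a claim that cannot be traced to the playbook and would need review before the protocol is saved.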

Wave 3 — Observability & Cost Control

Real-time token counter per session
Audit reports (_AUDIT.txt) for compliance traceability
Cost telemetry — real vs. estimated cost per session
Spider-aware reconstruction — the LLM operates with explicit awareness of the Daktus protocol structure

Wave 4.1–4.3 — Agent Intelligence & Feedback Loop

Alert rules module with templates and few-shot learning
Suggestion validator — filters anti-patterns and duplicates
Conditional sanitization for invalid logical expressions emitted by the LLM
Robust JSON parser — handles multi-line strings and truncated JSON, and retries with exponential backoff
Dual learning loop — implementation failures and validation errors both feed memory_qa.md
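
A minimal sketch of the robust JSON parsing described in the list above, using only the standard library. The helper names `repair_truncated_json` and `parse_with_retry` are hypothetical, and the repair step is best-effort: it only closes strings, arrays, and objects that were left open, so some truncations will still fail and trigger a retry.

```python
import json
import time
from typing import Any, Callable

def repair_truncated_json(text: str) -> str:
    """Close any string, array, or object left open by a truncated response."""
    stack = []          # expected closing characters, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    suffix = '"' if in_string else ""
    return text + suffix + "".join(reversed(stack))

def parse_with_retry(fetch: Callable[[], str], attempts: int = 3,
                     base_delay: float = 0.5) -> Any:
    """Call fetch() until its output parses as JSON, backing off exponentially."""
    last_error = None
    for attempt in range(attempts):
        raw = fetch()
        for candidate in (raw, repair_truncated_json(raw)):
            try:
                # strict=False accepts literal newlines inside string values.
                return json.loads(candidate, strict=False)
            except json.JSONDecodeError as exc:
                last_error = exc
        if attempt < attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError(f"no parseable JSON after {attempts} attempts") from last_error
```

Here `fetch` stands in for whatever function calls the model and returns its raw text; passing it as a callable keeps the retry logic independent of the LLM client.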

Key Design Decisions

LLM-as-judge with deterministic validation underneath. Semantic and clinical judgment goes to the model. Structural integrity, schema conformance, and logic checks are scripted and non-negotiable. Probabilistic judgment, deterministic gates.

Multi-model routing for cost/quality control. Different models for different jobs, swappable by config:

Model	Use case
`google/gemini-2.5-flash`	Fast first-pass analysis, low cost
`anthropic/claude-sonnet-4.5`	High-quality reasoning
`x-ai/grok-4.1-fast`	Large-playbook ingestion (2M context)
`anthropic/claude-opus-4.5`	Maximum quality for final passes
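
One way the routing table above could be expressed in code is sketched below. The model identifiers are the ones listed in the table; the task names (`first_pass`, `reasoning`, and so on) and the `pick_model` function are hypothetical and do not reflect the actual layout of the repository's configuration.

```python
from typing import Optional

# Model identifiers come from the table above; the task names are hypothetical.
ROUTES = {
    "first_pass": "google/gemini-2.5-flash",
    "reasoning": "anthropic/claude-sonnet-4.5",
    "large_playbook": "x-ai/grok-4.1-fast",
    "final_pass": "anthropic/claude-opus-4.5",
}

def pick_model(task: str, overrides: Optional[dict] = None) -> str:
    """Resolve a task name to a model id, letting config overrides win."""
    routes = {**ROUTES, **(overrides or {})}
    if task not in routes:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(routes)}")
    return routes[task]

print(pick_model("first_pass"))
print(pick_model("final_pass", {"final_pass": "anthropic/claude-sonnet-4.5"}))
```

Keeping the mapping in data rather than in branching code is what makes a model swap a configuration change instead of a code change.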

AST analysis over regex. Conditional expressions in clinical protocols are program-like. Regex on programs is brittle. The validator parses expressions into ASTs and reasons over structure — robust to whitespace, formatting, and minor LLM drift.
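
To make the AST approach concrete, here is a minimal sketch using Python's standard `ast` module. It assumes conditionals are written as Python-like boolean expressions over field names, which is an assumption for illustration; the actual Daktus expression syntax is not shown here, and `validate_condition` is a hypothetical name.

```python
import ast

# Only boolean logic, comparisons, names, and literals are allowed.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.In, ast.NotIn, ast.Name, ast.Load, ast.Constant, ast.List, ast.Tuple,
)

def validate_condition(expr: str, known_fields: set) -> list:
    """Return a list of problems; an empty list means the expression passed."""
    try:
        tree = ast.parse(expr.strip(), mode="eval")
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    problems = []
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            problems.append(f"disallowed construct: {type(node).__name__}")
        elif isinstance(node, ast.Name) and node.id not in known_fields:
            problems.append(f"unknown field: {node.id}")
    return problems

fields = {"age", "sex"}
print(validate_condition("age >= 18 and   sex == 'F'", fields))  # []
print(validate_condition("weight > 100", fields))                # ['unknown field: weight']
```

Because the check walks the parsed tree, extra whitespace or reordered parentheses do not affect the result, while function calls, attribute access, and references to undefined fields are reported explicitly.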

Closed-loop learning persisted to readable memory. memory_qa.md is a human-readable file, not a hidden database. Every rejection or implementation failure becomes a future few-shot example. State is auditable.
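
The idea of a readable, append-only memory can be sketched as follows. The entry layout (a dated heading followed by `Pattern` and `Reason` bullets) and the functions `record_lesson` and `load_lessons` are assumptions made for illustration; the real format of memory_qa.md may differ. The demo writes to a temporary directory so it does not touch an actual memory file.

```python
import tempfile
from datetime import date
from pathlib import Path

def record_lesson(memory_path: Path, kind: str, pattern: str,
                  reason: str, when: date) -> str:
    """Append one learning entry to the Markdown memory file and return it."""
    entry = (
        f"\n## {when.isoformat()} | {kind}\n"
        f"- Pattern: {pattern}\n"
        f"- Reason: {reason}\n"
    )
    with memory_path.open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry

def load_lessons(memory_path: Path) -> list:
    """Return each stored entry as one string, ready to reuse as a few-shot example."""
    text = memory_path.read_text(encoding="utf-8")
    return ["## " + chunk.strip() for chunk in text.split("\n## ")[1:]]

# Illustrative entries only.
with tempfile.TemporaryDirectory() as tmp:
    memory = Path(tmp) / "memory_qa.md"
    record_lesson(memory, "rejection", "duplicate alert for same drug pair",
                  "already covered by existing node", date(2025, 1, 15))
    record_lesson(memory, "validation_failure", "conditional references undefined field",
                  "field not in protocol schema", date(2025, 1, 16))
    lessons = load_lessons(memory)

print(len(lessons))  # 2
```

Since the store is plain Markdown, a reviewer can open, diff, or correct it with ordinary tools, which is the auditability property described above.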

Cost as a first-class metric. Every run reports token usage and dollar cost. LLM pipelines without cost telemetry are not production systems.
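
As an illustration of per-run cost reporting, the sketch below converts token counts into a dollar figure. The `Pricing` class and `session_cost` function are hypothetical, and the prices are placeholders chosen for easy arithmetic, not real provider rates.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M prompt tokens
    output_per_million: float  # USD per 1M completion tokens

def session_cost(prompt_tokens: int, completion_tokens: int, pricing: Pricing) -> float:
    """Dollar cost of one session given token counts and per-million prices."""
    total = (prompt_tokens * pricing.input_per_million
             + completion_tokens * pricing.output_per_million)
    return total / 1_000_000

# Placeholder prices for illustration only.
example = Pricing(input_per_million=1.0, output_per_million=4.0)
cost = session_cost(prompt_tokens=120_000, completion_tokens=8_000, pricing=example)
print(f"${cost:.4f}")  # $0.1520
```

Running the same calculation once on estimated token counts before the call and once on the counts reported by the provider afterwards gives the real versus estimated comparison mentioned under Wave 3.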

Quick Start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Configure OpenRouter key
echo "OPENROUTER_API_KEY=sk-or-v1-your-key" > .env

# 3. Run
python run_agent.py

Get an OpenRouter key: https://openrouter.ai/keys

Repository Structure

/
├── run_agent.py            # Entry point
├── src/agent/
│   ├── analysis/           # Expanded analysis
│   ├── applicator/         # Protocol reconstruction
│   ├── feedback/           # Learning loop
│   ├── cost_control/       # Token and cost tracking
│   ├── cli/                # Interactive CLI
│   └── core/               # LLM client, logger, loaders
├── models_json/            # JSON protocols and playbooks
├── reports/                # Generated reports
├── memory_qa.md            # Learning memory
└── docs/                   # Documentation, dev history, roadmap

Performance

Metric	Value
Suggestions per analysis	20–50
Typical latency	30–90s
Cost per analysis	$0.00–$0.50
Success rate	>95%
Playbook verifiability	>95%

Tech Stack

Python · Pydantic · AST module (stdlib) · OpenRouter (multi-provider routing: Anthropic, Google, xAI) · Structured outputs (JSON) · memory_qa.md (human-readable learning store)

About

Built and maintained by Daniel Martins at Daktus Health Tech.

Background: engineering (EFOMM — systems modeling, automation) + medicine (UFJF, final-year). I build AI systems for high-stakes domains where the cost of error is real. Probabilistic where probabilistic adds value; deterministic where it doesn't.
