agente-daktus-qa

Stage 2 of the Daktus CDSS Pipeline. Eval and correction harness for LLM-generated clinical protocols. Combines Pydantic schemas, AST-based logic validation, LLM-as-judge audits, multi-model routing, closed-loop feedback learning, and cost telemetry to make probabilistic outputs safe in a clinical-grade pipeline.

Status: v3.2.0 · Waves 1–4.3 complete · feedback loop active


The Pipeline

This repository is Stage 2 of a three-stage Clinical Decision Support pipeline:

[agente-daktus-content]  →  [agente-daktus-qa]  →  [daktus-conduct-engine]
   Stage 1: Production         Stage 2: Validation       Stage 3: Decision Engine
   briefing → JSON protocol    validates, corrects,      consumes encounter bundle
                               versions, learns          → structured anamnesis
                                                         + management + evidence

The Problem

LLM outputs in clinical workflows are probabilistic — a content pipeline that produces protocol JSON cannot, on its own, guarantee:

  • Structural validity (JSON conforms to the Daktus schema)
  • Logical consistency (conditionals are sound, no orphan branches, no contradictions)
  • Evidence integrity (every claim is traceable to a reference in the playbook)
  • Reproducibility across model versions and providers

This repository is the validation, correction, and learning layer that closes the loop.

Input: Clinical protocol JSON (from Stage 1) + Markdown/PDF playbook

Output: Validation report + prioritized suggestions + corrected, versioned protocol + learning entries


What's Inside

Wave 1 — Clinical Safety Foundations

  • Pydantic schema validation at reconstruction time — zero invalid protocols saved
  • AST-based logic validation for conditional expressions (no fragile regex on structured logic)
  • LLM contract validation — detects model drift across provider/version changes

Wave 2 — Memory & Learning

  • Hard rules engine — automatic blocking of invalid suggestions before they reach the model
  • Reference validator that cross-checks every piece of cited evidence against the playbook
  • Change verifier — post-reconstruction validation of what was actually modified
  • Feedback learner — rejection patterns and validation failures persisted to memory_qa.md, feeding future runs
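
As an illustration of the reference-validation idea, here is a minimal sketch. It assumes a protocol dict with a `nodes` list whose items carry an `id` and a `references` list of strings; those field names, and the plain substring match, are hypothetical and not taken from the Daktus schema or this repository's validator.

```python
import re


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so formatting differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unverified_references(protocol: dict, playbook_text: str) -> list[tuple]:
    """Return (node_id, reference) pairs whose reference text is absent from the playbook.

    The "nodes" / "references" / "id" keys are assumptions made for this sketch.
    """
    playbook = _normalize(playbook_text)
    missing = []
    for node in protocol.get("nodes", []):
        for ref in node.get("references", []):
            if _normalize(ref) not in playbook:
                missing.append((node.get("id"), ref))
    return missing
```

A substring match is deliberately crude; a real validator would probably resolve citations against a structured reference list rather than raw text.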

Wave 3 — Observability & Cost Control

  • Real-time token counter per session
  • Audit reports (_AUDIT.txt) for compliance traceability
  • Cost telemetry — real vs. estimated cost per session
  • Spider-aware reconstruction — the LLM operates with explicit awareness of the Daktus protocol structure

Wave 4.1–4.3 — Agent Intelligence & Feedback Loop

  • Alert rules module with templates and few-shot learning
  • Suggestion validator — filters anti-patterns and duplicates
  • Conditional sanitization for invalid logical expressions emitted by the LLM
  • Robust JSON parser — handles multi-line strings, truncated JSON, retry with exponential backoff
  • Dual learning loop — implementation failures and validation errors both feed memory_qa.md
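
The parsing-and-retry behavior described in this list could look roughly like the sketch below. The function names, the retry limits, and the `call_model` callable are hypothetical; the sketch only uses the standard `json` module and does not attempt to repair truncated output, it simply asks again.

```python
import json
import time


def extract_json_object(text: str) -> dict:
    """Parse the first JSON object in text, ignoring prose or code fences around it."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found")
    # strict=False tolerates raw newlines inside strings (multi-line values);
    # raw_decode stops at the end of the first complete value and ignores the rest.
    obj, _end = json.JSONDecoder(strict=False).raw_decode(text[start:])
    return obj


def call_with_retry(call_model, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call call_model() until its output parses, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return extract_json_object(call_model())
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the backoff testable without real delays.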

Key Design Decisions

LLM-as-judge with deterministic validation underneath. Semantic and clinical judgment goes to the model. Structural integrity, schema conformance, and logic checks are scripted and non-negotiable. Probabilistic judgment, deterministic gates.

Multi-model routing for cost/quality control. Different models for different jobs, swappable by config:

Model                         Use case
google/gemini-2.5-flash       Fast first-pass analysis, low cost
anthropic/claude-sonnet-4.5   High-quality reasoning
x-ai/grok-4.1-fast            Large-playbook ingestion (2M context)
anthropic/claude-opus-4.5     Maximum quality for final passes

AST analysis over regex. Conditional expressions in clinical protocols are program-like. Regex on programs is brittle. The validator parses expressions into ASTs and reasons over structure — robust to whitespace, formatting, and minor LLM drift.
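
To make the AST approach concrete, here is a minimal sketch using Python's stdlib `ast` module. It assumes, purely for illustration, that conditionals use Python-like syntax (`age >= 18 and sex == 'F'`); the actual Daktus expression grammar, the node whitelist, and the `validate_conditional` name are assumptions, not the repository's code.

```python
import ast

# Node types accepted in a conditional; anything else (calls, attribute access,
# subscripts, lambdas, ...) is rejected. The whitelist itself is an assumption.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.In, ast.NotIn, ast.Name, ast.Load, ast.Constant, ast.Tuple, ast.List,
)


def validate_conditional(expr: str, known_vars: set[str]) -> list[str]:
    """Return the problems found in expr; an empty list means it passed."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            problems.append(f"disallowed construct: {type(node).__name__}")
        elif isinstance(node, ast.Name) and node.id not in known_vars:
            problems.append(f"unknown variable: {node.id}")
    return problems
```

Because the check runs on the parsed tree, `age>=18` and `age  >=  18` are treated identically, which is the robustness to formatting that a regex would have to special-case.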

Closed-loop learning persisted to readable memory. memory_qa.md is a human-readable file, not a hidden database. Every rejection or implementation failure becomes a future few-shot example. State is auditable.
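
One way to persist such entries is sketched below. The entry layout (a dated heading followed by two bullets) and the `append_learning_entry` helper are hypothetical; the real format of memory_qa.md is not documented here.

```python
from datetime import date
from pathlib import Path


def append_learning_entry(memory_path, kind, suggestion, reason, today=None):
    """Append one human-readable entry to the memory file and return the text written.

    kind is a free-form label such as "rejection" or "validation_failure"
    (labels chosen for this sketch).
    """
    entry_date = (today or date.today()).isoformat()
    entry = (
        f"\n## {entry_date} | {kind}\n"
        f"- Suggestion: {suggestion}\n"
        f"- Reason: {reason}\n"
    )
    # Append mode creates the file on first use and never rewrites older entries.
    with Path(memory_path).open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry
```

An append-only text file keeps the history diffable in version control, which fits the auditability goal stated above.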

Cost as a first-class metric. Every run reports token usage and dollar cost. LLM pipelines without cost telemetry are not production systems.
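
A per-session cost tracker can be as small as the sketch below. The `SessionCost` class is hypothetical, and the prices in `PRICES_PER_MTOK` are placeholder numbers for a made-up model ID, not real provider pricing.

```python
from dataclasses import dataclass

# USD per million tokens as (input, output). Placeholder values, not real prices.
PRICES_PER_MTOK = {"example/model": (1.00, 4.00)}


@dataclass
class SessionCost:
    """Accumulates token usage for one session and converts it to dollars."""
    model: str
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add the usage reported for one LLM call."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    def dollars(self) -> float:
        """Total cost so far, priced separately for input and output tokens."""
        in_price, out_price = PRICES_PER_MTOK[self.model]
        return (self.input_tokens * in_price + self.output_tokens * out_price) / 1_000_000
```

Calling `record` after every LLM response and printing `dollars()` at the end of a run is enough to report usage and cost per session.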


Quick Start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Configure OpenRouter key
echo "OPENROUTER_API_KEY=sk-or-v1-your-key" > .env

# 3. Run
python run_agent.py

Get an OpenRouter key: https://openrouter.ai/keys


Repository Structure

/
├── run_agent.py            # Entry point
├── src/agent/
│   ├── analysis/           # Expanded analysis
│   ├── applicator/         # Protocol reconstruction
│   ├── feedback/           # Learning loop
│   ├── cost_control/       # Token and cost tracking
│   ├── cli/                # Interactive CLI
│   └── core/               # LLM client, logger, loaders
├── models_json/            # JSON protocols and playbooks
├── reports/                # Generated reports
├── memory_qa.md            # Learning memory
└── docs/                   # Documentation, dev history, roadmap

Performance

Metric                     Value
Suggestions per analysis   20–50
Typical latency            30–90s
Cost per analysis          $0.00–$0.50
Success rate               >95%
Playbook verifiability     >95%

Tech Stack

Python · Pydantic · AST module (stdlib) · OpenRouter (multi-provider routing: Anthropic, Google, xAI) · Structured outputs (JSON) · memory_qa.md (human-readable learning store)


About

Built and maintained by Daniel Martins at Daktus Health Tech.

Background: engineering (EFOMM — systems modeling, automation) + medicine (UFJF, final-year). I build AI systems for high-stakes domains where the cost of error is real. Probabilistic where probabilistic adds value; deterministic where it doesn't.
