
🔁 Agent CI


Agent CI is the continuous improvement layer for AI agents and LLM applications.


Overview

Agent CI runs alongside your agent or application in production. It observes decisions, scores outcomes, and applies targeted improvements, so the system learns and performs better over time.

Instead of manually rewriting prompts or fine-tuning models after failures, Agent CI creates a feedback and optimization loop that continuously adjusts reasoning, planning, and tool usage behavior.

Deploy once. The agent keeps improving.

Open in Colab
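In pseudocode terms, the loop amounts to observe, score, improve, repeat. The sketch below is purely illustrative; none of these class or method names are the real Agent CI API:

```python
# Illustrative sketch of the observe -> score -> improve cycle described
# above. None of these class or method names are the real Agent CI API.
def improvement_loop(agent, scorer, optimizer, tasks):
    for task in tasks:
        trace = agent.run(task)            # observe: capture decisions and tool calls
        score = scorer.evaluate(trace)     # score: grade the outcome
        if score < scorer.threshold:
            patch = optimizer.propose(trace, score)  # e.g. a prompt or policy tweak
            agent.apply(patch)             # improve: adjust reasoning / tool usage
```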

🔬 Research: FRED

Pegasi Shield’s hallucination module is powered by FRED — Financial Retrieval‑Enhanced Detection & Editing. The method was peer‑reviewed and accepted to the ICML 2025 Workshop. Code, evaluation harness and demo notebooks are in fred/.


Open ICML Streamlit Demo

🔧 Key capabilities

| Area | What Shield provides |
| --- | --- |
| Prompt security | Detects and blocks prompt injections, role hijacking, and system‑override attempts. |
| Output sanitisation | Removes personal data, hate speech, defamation, and other policy violations. |
| Hallucination controls | Scores and rewrites ungrounded text using a 4B‑parameter model, with performance on par with o3. |
| Observability | Emits structured traces and metrics (OpenTelemetry) for dashboards and alerts. |
| Deployment | Pure‑Python middleware, Docker image, or Helm chart for Kubernetes / VPC installs. |
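Because Shield emits standard OpenTelemetry traces, you can route them to whatever backend you already use. A minimal sketch with the official opentelemetry-sdk, assuming Shield records spans through the globally registered tracer provider:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Register a global tracer provider. Swap ConsoleSpanExporter for an OTLP
# exporter to ship spans to your dashboards and alerting backend.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
```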

⚡ Quick start

*Coming July 25th

```bash
pip install pegasi-shield
```

```python
from pegasi_shield import Shield
from openai import OpenAI

client = OpenAI()
shield = Shield()  # uses the default policy

messages = [{"role": "user", "content": "Tell me about OpenAI o3"}]
response = shield.chat_completion(
    lambda: client.chat.completions.create(model="gpt-4.1-mini", messages=messages)
)

print(response.choices[0].message.content)
```

`Shield.chat_completion` accepts a callable that runs your normal LLM request. Shield returns the same response object, or raises `ShieldError` if the call is blocked.
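Since a blocked request surfaces as an exception rather than a degraded response, a typical call site wraps it in try/except. A sketch continuing the snippet above; the `ShieldError` import path is an assumption:

```python
from openai import OpenAI
from pegasi_shield import Shield, ShieldError  # ShieldError import path is assumed

client = OpenAI()
shield = Shield()
messages = [{"role": "user", "content": "Tell me about OpenAI o3"}]

try:
    response = shield.chat_completion(
        lambda: client.chat.completions.create(model="gpt-4.1-mini", messages=messages)
    )
    print(response.choices[0].message.content)
except ShieldError as exc:
    # The call was blocked by the prompt firewall or an output policy.
    print(f"Blocked by Shield: {exc}")
```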


📚 How it works

1. Prompt firewall — lightweight rules (regex, AST, ML) followed by an optional LLM check.
2. LLM request — forwards the original or patched prompt to your provider.
3. Output pipeline — heuristics → vector similarity checks → policy LLM, plus an optional “Hallucination Lens” rewrite when the factuality score is below threshold.
4. Trace — emits a JSON event with the allow/block/edit decision and risk scores.

All stages are configurable via YAML or Python.
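To make step 4 concrete, a trace event might carry fields like the following. This shape is inferred from the description above, not a documented schema:

```python
# Hypothetical trace event; field names are illustrative, not a documented schema.
trace_event = {
    "decision": "edit",               # one of: allow / block / edit
    "risk_scores": {
        "prompt_injection": 0.02,
        "pii": 0.00,
        "factuality": 0.41,           # below threshold, so a rewrite was applied
    },
    "stage": "output_pipeline",
}
```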


Roadmap

- v0.5 launch (July 18th)
- LiveKit Agent Tutorial
- LangGraph Agent Tutorial
- Fine‑grained policy language
- Streaming output inspection
- JavaScript/TypeScript SDK

Contributing

Issues and pull requests are welcome. See CONTRIBUTING.md for details.


License

Apache 2.0

