A Self-Improving + Autonomous Agent Made to Align ML Workflows.
In my work as an AI Alignment Fellow, I observed AI agents performing ML workflows exhibit several critical rogue behaviors; the most common are data_leakage, access_dev_set, and agent_impatience. For this hackathon, I am therefore presenting Agent Shield.
Agent Shield is a self-improving, fully autonomous AI agent that helps other AI agents, specifically those conducting ML workflows in this case, stay aligned with what a golden trajectory, one free of rogue behaviors, should look like.
Claude Agent SDK [Opus] - The agent performs autonomous threat analysis by comparing agent trajectories against learned baseline patterns and a taxonomy of 11 rogue behaviors. It makes independent decisions on whether agents are malicious with 95%+ confidence, with no human intervention required.
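To make the idea of checking a trajectory against a taxonomy concrete, here is a minimal rule-based sketch. It is not the project's implementation: in Agent Shield the analysis is done by the Claude-based agent, whereas this stand-in uses hand-written rules. The `Step` shape, the `detectRogueBehaviors` function, and the three rule definitions are all assumptions made for illustration, and only the three behavior names mentioned in this README are covered.

```typescript
// Hypothetical trajectory shape; the project's real schema is not shown in this README.
type Step = { action: string; target: string };
type RogueBehavior = "data_leakage" | "access_dev_set" | "agent_impatience";
type Finding = { behavior: RogueBehavior; stepIndex: number };

// Each rule inspects one step in the context of the whole trajectory.
// The rule bodies are assumed definitions, chosen only to keep the example small.
const rules: Record<RogueBehavior, (step: Step, index: number, all: Step[]) => boolean> = {
  // Assumed: fitting anything on a test split counts as leakage.
  data_leakage: (s) => s.action === "fit" && /test/.test(s.target),
  // Assumed: any read of a dev split counts as a violation.
  access_dev_set: (s) => s.action === "read" && /dev/.test(s.target),
  // Assumed: submitting before any evaluation step has run.
  agent_impatience: (s, i, all) =>
    s.action === "submit" && !all.slice(0, i).some((p) => p.action === "evaluate"),
};

// Returns every (behavior, step) pair where a rule fires, in step order.
function detectRogueBehaviors(trajectory: Step[]): Finding[] {
  const findings: Finding[] = [];
  trajectory.forEach((step, stepIndex) => {
    for (const behavior of Object.keys(rules) as RogueBehavior[]) {
      if (rules[behavior](step, stepIndex, trajectory)) {
        findings.push({ behavior, stepIndex });
      }
    }
  });
  return findings;
}
```

A trajectory with no findings plays the role of a golden trajectory here. Reporting the step index alongside the behavior keeps the result usable for step-level correction.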
Redis MCP: We used Redis MCP to store clean workflow patterns during the training phase and to cache known violations for instant recognition. This speeds up repeated detections from 8 seconds to 0.8 seconds (10x faster) and makes Agent Shield self-improving through continuous learning.
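The caching idea can be sketched as a wrapper that looks up a verdict before running the slow analysis. This is an illustration under stated assumptions, not the project's code: `VerdictStore`, `MemoryStore`, and `withVerdictCache` are hypothetical names, the store is an in-memory stand-in rather than Redis, and the calls are synchronous where a real Redis client would be asynchronous.

```typescript
// Hypothetical minimal key-value interface; a Redis-backed store would expose async equivalents.
interface VerdictStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// In-memory stand-in so the example runs without a Redis server.
class MemoryStore implements VerdictStore {
  private data = new Map<string, string>();
  get(key: string): string | undefined {
    return this.data.get(key);
  }
  set(key: string, value: string): void {
    this.data.set(key, value);
  }
}

type Verdict = { rogue: boolean; behaviors: string[] };

// Wraps a slow analyzer so that an identical trajectory is analyzed only once.
function withVerdictCache(
  store: VerdictStore,
  analyze: (trajectory: string[]) => Verdict,
): (trajectory: string[]) => { verdict: Verdict; cached: boolean } {
  return (trajectory) => {
    // Assumed key scheme: the serialized trajectory under a "verdict:" prefix.
    const key = "verdict:" + JSON.stringify(trajectory);
    const hit = store.get(key);
    if (hit !== undefined) {
      return { verdict: JSON.parse(hit) as Verdict, cached: true };
    }
    const verdict = analyze(trajectory);
    store.set(key, JSON.stringify(verdict));
    return { verdict, cached: false };
  };
}
```

Exact-match keys only help when the same trajectory recurs. Recognizing near-duplicate violations would need a different lookup scheme, which this sketch does not attempt.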
Skyflow: We used Skyflow to tokenize sensitive data (API keys, passwords, secrets) before Claude analyzes it, reducing token count by 60% and ensuring PCI/HIPAA compliance. This protects credentials from leaking during security analysis while cutting inference costs.
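As a rough illustration of tokenizing secrets before analysis, the sketch below swaps secret-looking substrings for placeholder tokens and keeps the originals in a local map. It does not use or imitate the Skyflow API: `tokenizeSecrets`, `detokenize`, the `<TOKEN_n>` format, and the two regular expressions are assumptions for this example, and real secret formats vary far more widely.

```typescript
type TokenizedText = { text: string; vault: Map<string, string> };

// Assumed patterns, each written as (prefix to keep)(secret to replace).
const SECRET_PATTERNS: RegExp[] = [
  /()(sk-[A-Za-z0-9]{8,})/g, // API-key-like strings, no prefix to keep
  /(password=)([^\s&]+)/g,   // the value following "password="
];

// Replaces each matched secret with a placeholder and records the mapping.
function tokenizeSecrets(input: string): TokenizedText {
  const vault = new Map<string, string>();
  let counter = 0;
  let text = input;
  for (const pattern of SECRET_PATTERNS) {
    text = text.replace(pattern, (_match: string, prefix: string, secret: string) => {
      counter += 1;
      const token = "<TOKEN_" + counter + ">";
      vault.set(token, secret);
      return prefix + token;
    });
  }
  return { text, vault };
}

// Restores the original text from the placeholders and the vault.
function detokenize(tokenized: TokenizedText): string {
  let text = tokenized.text;
  tokenized.vault.forEach((secret, token) => {
    text = text.split(token).join(secret);
  });
  return text;
}
```

Only the `text` field would be sent for analysis. The `vault` stays local so the original values can be restored afterwards.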
This is a Next.js project bootstrapped with create-next-app.
First, run the development server:
npm run dev
# or
yarn dev
# or
pnpm dev
# or
bun dev
Open http://localhost:3000 with your browser to see the result.
You can start editing the page by modifying app/page.tsx. The page auto-updates as you edit the file.
This project uses next/font to automatically optimize and load Geist, a new font family for Vercel.
To learn more about Next.js, take a look at the following resources:
- Next.js Documentation - learn about Next.js features and API.
- Learn Next.js - an interactive Next.js tutorial.
You can check out the Next.js GitHub repository - your feedback and contributions are welcome!
Integration was challenging, but we overcame it just fine.
We deployed something that actually works and appears able to self-correct trajectories showing rogue behaviors down to zero rogue behaviors. If it continues to self-correct this way, it could eventually produce its own guardrails and keep track of other AI agents performing ML workflows.
We are at the tip of the iceberg with AI agent security, and we are just beginning to shield AI agents in a way that is containable.
Next, we plan to turn this into a policy that other AI agents performing ML workflows can implement, and to publish a paper and code on a step-level security benchmark for AI agents in production, following STeCa: https://arxiv.org/abs/2502.14276 | https://github.com/WangHanLinHenry/STeCa
E. [email protected] LinkedIn: https://linkedin.com/in/gloriafelicia
The easiest way to deploy your Next.js app is to use the Vercel Platform from the creators of Next.js.
Check out our Next.js deployment documentation for more details.