AgentShield

A Self-Improving + Autonomous Agent Made to Align ML Workflows.

Inspiration

In my work as an AI Alignment Fellow, I observed AI agents performing ML workflows exhibit several critical rogue behaviors. The most common are data_leakage, access_dev_set, and agent_impatience. For this hackathon, I am presenting AgentShield to address them.

What is AgentShield?

AgentShield is a self-improving, fully autonomous AI agent that helps other AI agents, specifically those conducting ML workflows, stay aligned with golden trajectories, that is, trajectories free of rogue behaviors.

Tech Stack:

Claude Agent SDK [Opus] - The agent performs autonomous threat analysis by comparing agent trajectories against learned baseline patterns and a taxonomy of 11 rogue behaviors. It decides independently whether an agent is malicious, with 95%+ confidence and no human intervention required.
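To make the comparison step concrete, here is a minimal sketch of checking a trajectory against a baseline and a rule taxonomy. It is not the project's implementation: the `Step` shape, the `rules` table, and `analyzeTrajectory` are hypothetical names, the rule conditions are invented for illustration, and only the three behaviors named in this README are included. In AgentShield the judgment is made by Claude, not by hand-written rules.

```typescript
// Hypothetical trajectory step: what the agent did and to which dataset split.
type Step = { action: string; target: string };

// Only the three behaviors named in this README; the full taxonomy has 11.
type RogueBehavior = "data_leakage" | "access_dev_set" | "agent_impatience";

interface Finding {
  stepIndex: number;
  behavior: RogueBehavior;
}

// Illustrative rules only (assumptions, not the project's actual detectors).
const rules: Record<RogueBehavior, (step: Step) => boolean> = {
  data_leakage: (s) => s.action === "fit_preprocessor" && s.target !== "train",
  access_dev_set: (s) => s.action === "read" && s.target === "dev",
  agent_impatience: (s) => s.action === "stop_training_early",
};

// Steps that match a learned clean pattern are skipped; the rest are
// checked against every rule in the taxonomy.
function analyzeTrajectory(trajectory: Step[], baseline: Set<string>): Finding[] {
  const findings: Finding[] = [];
  trajectory.forEach((step, stepIndex) => {
    const signature = `${step.action}:${step.target}`;
    if (baseline.has(signature)) return;
    for (const behavior of Object.keys(rules) as RogueBehavior[]) {
      if (rules[behavior](step)) findings.push({ stepIndex, behavior });
    }
  });
  return findings;
}
```

A trajectory whose steps all appear in the baseline yields an empty findings list; any other step is reported with its index so the offending step can be located.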

Redis MCP: We used Redis MCP to store clean workflow patterns during the training phase and to cache known violations for instant recognition. This speeds up repeated detections from 8 seconds to 0.8 seconds (10x faster) and makes AgentShield self-improving through continuous learning.
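The caching idea can be sketched as a cache-aside lookup: check the store for a known verdict first, and run the slow analysis only on a miss. The sketch below uses an in-memory `Map` as a stand-in for Redis so that it runs on its own. `VerdictStore`, `InMemoryStore`, and `cachedVerdict` are hypothetical names, and a real Redis client would be asynchronous.

```typescript
type Verdict = { rogue: boolean; behaviors: string[] };

// Minimal store interface (assumption); Redis would sit behind this.
interface VerdictStore {
  get(key: string): Verdict | undefined;
  set(key: string, verdict: Verdict): void;
}

// In-memory stand-in for Redis, used only to keep the sketch self-contained.
class InMemoryStore implements VerdictStore {
  private readonly entries = new Map<string, Verdict>();

  get(key: string): Verdict | undefined {
    return this.entries.get(key);
  }

  set(key: string, verdict: Verdict): void {
    this.entries.set(key, verdict);
  }
}

// Cache-aside: return a stored verdict if one exists, otherwise analyze
// the trajectory and remember the result for the next identical trajectory.
function cachedVerdict(
  trajectory: string[],
  store: VerdictStore,
  analyze: (trajectory: string[]) => Verdict,
): Verdict {
  const key = JSON.stringify(trajectory);
  const hit = store.get(key);
  if (hit !== undefined) return hit;
  const verdict = analyze(trajectory);
  store.set(key, verdict);
  return verdict;
}
```

With this shape, the expensive `analyze` call runs once per distinct trajectory; repeated detections are served from the store.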

Skyflow: We used Skyflow to tokenize sensitive data (API keys, passwords, secrets) before Claude analysis, reducing token count by 60% and supporting PCI/HIPAA compliance. This protects credentials from leaking during security analysis while cutting inference costs.
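As a rough illustration of tokenizing before analysis, the sketch below replaces secret values with placeholder tokens and keeps the originals in a local map. It does not call Skyflow: the `SECRET_PATTERN` regex, the `<TOKEN_n>` format, and the `tokenizeSecrets` function are assumptions made for this example, and a real vault would store the values outside the process.

```typescript
interface TokenizedText {
  text: string;               // safe to send for analysis
  vault: Map<string, string>; // token -> original secret (local stand-in for a vault)
}

// Hypothetical pattern: key=value pairs whose key looks like a credential.
const SECRET_PATTERN = /\b(api_key|password|secret)=(\S+)/gi;

function tokenizeSecrets(input: string): TokenizedText {
  const vault = new Map<string, string>();
  let counter = 0;
  const text = input.replace(
    SECRET_PATTERN,
    (_match: string, key: string, value: string) => {
      counter += 1;
      const token = `<TOKEN_${counter}>`;
      vault.set(token, value);
      // Keep the key name so the analysis still sees what kind of secret it was.
      return `${key}=${token}`;
    },
  );
  return { text, vault };
}
```

Only the returned `text` would be passed on for analysis; the `vault` stays behind so the original values can be restored afterwards if needed.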

This is a Next.js project bootstrapped with create-next-app.

Getting Started

First, run the development server:

npm run dev
# or
yarn dev
# or
pnpm dev
# or
bun dev

Open http://localhost:3000 with your browser to see the result.

You can start editing the page by modifying app/page.tsx. The page auto-updates as you edit the file.

This project uses next/font to automatically optimize and load Geist, a new font family for Vercel.

Learn More

To learn more about Next.js, take a look at the following resources:

Next.js Documentation - learn about Next.js features and API.
Learn Next.js - an interactive Next.js tutorial.

You can check out the Next.js GitHub repository - your feedback and contributions are welcome!

Deploy on Vercel

Challenges While Building:

Integration was challenging, but we overcame it just fine.

Accomplishments that we're proud of

We deployed something that actually works and appears able to self-correct trajectories that show rogue behaviors until none remain. If it continues to self-correct this way, it could eventually produce its own guardrails and keep track of other AI agents performing ML workflows.

Learning Takeaway:

We are at the tip of the iceberg with AI agent security, and we are just beginning to shield AI agents in a way that is containable.

What's next for Agent Shield

To turn this into a policy that other ML-performing AI agents can implement, and to publish a paper and code on a step-level security benchmark for AI agents in production, following STeCa: https://arxiv.org/abs/2502.14276 | https://github.com/WangHanLinHenry/STeCa

Contact:

E. [email protected] LinkedIn: https://linkedin.com/in/gloriafelicia

The easiest way to deploy your Next.js app is to use the Vercel Platform from the creators of Next.js.

Check out our Next.js deployment documentation for more details.
