What is a confused-deputy attack against an AI agent?

When a managed AI agent holds an OAuth token to a SaaS API, the token does not carry which user the agent is currently acting for. The agent can be tricked, via prompt injection, a poisoned document, or simple instruction-following, into acting beyond that user's authority. The SaaS API has no way to tell. Proxilion makes the principal cryptographically explicit on every call.

Is Proxilion a SaaS or a paid product?

No. Proxilion is MIT-licensed, self-hosted, with no telemetry, no phone-home, and no paid tier. The repository is the product.

How does Proxilion differ from an LLM gateway or prompt firewall?

LLM gateways and prompt firewalls inspect text. Proxilion enforces cryptographic capability chains in the OAuth and HTTP path. It refuses to issue authority the user does not have, by construction, not by heuristic.

Which agents and SaaS APIs are supported?

Anthropic's managed Claude and OpenAI hosted assistants on the agent side. Google Drive, Gmail, and Google Calendar on the SaaS side at launch. The adapter model is open.

What is a PCA and the Trust Plane?

A PCA, or Proof of Causal Authority, is a signed cryptographic statement that a specific principal is authorized to perform a specific set of operations. The first PCA in a chain is rooted at the human user at the identity provider. Every downstream action mints a successor PCA, which can only have equal or fewer permissions than its predecessor. The Trust Plane is the service that signs and validates these PCAs, enforcing the PIC protocol's three invariants: Provenance, Identity, and Continuity.

MIT-licensed. Self-hosted. Zero telemetry.

Your managed AI agent has too much authority. Proxilion fixes that.

Name: Proxilion
Author: Clay Good

Your organization is wiring managed AI agents into Google Workspace, Salesforce, Jira, Notion, and a dozen other systems that hold PHI, PII, source code, customer data, and trade secrets. You are giving those agents skills, memories, and multi-step workflows. The honest truth is that any of those agents can quietly do the wrong thing, on the wrong data, for the wrong user, and the only signal you would get is a forensic audit log from the agent vendor after the damage is already done.

Proxilion is a self-hosted reverse proxy that sits in the OAuth path between the agent and every SaaS API it touches. It refuses any action the human user did not personally authorize, strips prompt injection out of fetched documents before the agent ever sees them, gates external sends, streams every action to your SOC in real time, and gives you a killswitch that revokes an agent's authority in one request cycle. Prevention by construction, not detection after the fact.

View on GitHub

Free forever MIT licensed Built in Rust No phone-home

$ git clone https://github.com/clay-good/proxilion

$ cd proxilion && docker compose up -d --wait

$ bash scripts/smoke-pic.sh # 60 seconds to first PCA_0

Thank you, standing on the shoulders of giants

The cryptographic primitive underneath is PIC (Provenance, Identity, Continuity).

The math that lets Proxilion say "this exact action was authorized by this exact human" is not something we invented. It comes from PIC, an authority protocol built around three formal invariants: Provenance (every action traces back to an immutable origin), Identity (the origin identity cannot mutate across hops), and Continuity (authority can only shrink, never broaden, at each step). We use the open-source PIC reference implementation as our signing and verification primitive. Credit and respect to Nicola Gallo for designing and publishing it.

Everything else, the OAuth interception layer, the SaaS adapters, the read filtering, the write gating, the real-time action stream, the killswitch, the YAML policy engine, the dashboard, the SIEM plumbing, the Rust reverse proxy, the threat model, the deployment story, and the entire applied thesis that the OAuth boundary is the one preventative chokepoint for governing managed AI agents, is original Proxilion work. PIC is the cryptographic primitive. Proxilion is the deployable system that turns that primitive into something a security team can use against the actual confused-deputy attacks they are facing today.

Read about PIC Nicola Gallo on GitHub

Three things every other tool cannot give you

Not a prompt firewall. Not an LLM gateway. Not an MCP middleware. A cryptographic enforcement plane in the OAuth path.

P₀

Every action traces back to a real human

Before the agent does anything, Proxilion authenticates the actual human user at your identity provider (Okta, Azure AD, or Google Workspace) and mints a signed statement of who they are and what they can do. The agent cannot invent authority for itself, it can only spend what that human was given, and the chain of custody is mathematically provable forever.

⊑

Permissions can only get smaller, never larger

Once Alice grants her agent read access to one specific document, that agent cannot upgrade itself partway through a task to write access on every document in the tenant. Each step in the chain is allowed to do equal or less than the step before it, and Proxilion refuses to sign anything else. There are no heuristics to game and no prompt-engineering bypasses.

∎

Audit logs an auditor can actually trust

Every single agent action is logged with a cryptographic signature you can verify offline, years later, on your own machine. Hand the logs to a SOC 2, ISO 27001, HIPAA, or FedRAMP auditor and they can confirm authenticity themselves, without taking your word for it, without trusting the agent vendor, and without trusting Proxilion either.

What Proxilion actually does for your org

Cryptographic capability chains alone do not stop a managed agent from acting on the wrong data. Proxilion turns the math into a deployable enforcement layer that real security teams can install this afternoon.

OAuth interception, in the path

Proxilion intercepts the OAuth flow between the agent platform and your SaaS providers before consent, swaps in a Proxilion-issued bearer token, and stays in path for every subsequent request. The agent vendor sees a normal upstream; you see, sign, and gate every call.

Read-filtering for prompt injection

Documents returned from Drive, Gmail, Notion, and other upstreams are scanned before they reach the agent. Known prompt-injection patterns (delimiter confusion, hidden Unicode, base64-encoded instructions, "ignore prior instructions") are stripped or quarantined. The agent literally cannot read the poison.

Write-gating with human-in-the-loop

Outgoing emails to external domains, mass deletes, calendar invites to attackers, file shares outside the org are blocked unless a real human explicitly approves through Slack or a ticket. Configurable per sender, per domain, per action class.

Real-time action stream and killswitch

Every agent action streams to a live operator dashboard and your SIEM the moment it happens. If something goes wrong, one click revokes every capability tied to that agent or user. The agent's next request returns 403 within one request cycle, no matter how many tokens it cached.

Policy engine, written in YAML

A fast, compiled match-expression engine lets your security team write rules like "this agent can read engineering docs but never finance" or "no external email sends without justification" in YAML, with hot-reload. No DSL to learn, no rules engine vendor to negotiate with.

SaaS adapters you can extend

Google Drive, Gmail, and Calendar adapters ship at launch, each one understanding the upstream's request shape so policy can reason about specific files, recipients, and events, not just URLs. The adapter pattern is open; add Salesforce, Jira, Notion, or your internal API in a few hundred lines of Rust.

Sits between the agent and your SaaS APIs

Proxilion intercepts the OAuth flow before it hits Google, Microsoft, or any other upstream. Then it stays in path, minting and verifying a fresh PCA per action.

capability flow response path Proxilion enforcement

The invariant, in plain English. The agent shows up holding a token. Proxilion checks who that token actually belongs to (a specific human user, say Alice), and what Alice was originally allowed to do. If the action the agent is trying to take is something Alice was not allowed to do, Proxilion refuses the request and never forwards it to Google or any other upstream. The refused attempt is then signed, logged forever, and shipped straight to your SIEM so your security team sees exactly what was tried, by which agent, on whose behalf.

The risks of managed AI agents, and what Proxilion does about each one

Every row is a real, exploitable failure mode in production agent deployments today. Every detection and block is implemented, not aspirational.

Risk	Detect How Proxilion sees it	Block How Proxilion stops it
Confused deputy Agent acts beyond the user's authority because the OAuth token has no principal binding.	Every request the agent makes carries a Proxilion-issued token that names the exact human user it is acting for, with that user's identity cryptographically signed. If the user identity is missing or tampered with, the request fails immediately.	Proxilion checks the requested action against what that named human user is actually allowed to do. If the action is outside the user's authority, it is refused before ever reaching Google, Gmail, or any other upstream. This is a deterministic permission check on signed claims, not a guess from a language model.
Privilege escalation across hops Agent chains tool calls and broadens scope mid-chain. `read` becomes `write`.	Each successor PCA's ops are diffed against its predecessor; any non-subset op is a continuity violation.	Proxilion refuses to sign any capability that grants more authority than the step before it. Each handoff in the chain can only have equal or fewer permissions than the previous one, enforced cryptographically at every hop, so the agent simply cannot escalate even if its prompt tells it to.
Prompt injection via fetched documents Poisoned Drive doc instructs the agent to exfiltrate other files or email outside the org.	Response bodies from Drive, Gmail, and other upstreams are scanned for known injection patterns (delimiter confusion, "ignore prior instructions", base64-encoded directives, hidden Unicode).	Matched regions are stripped or quarantined before the response is returned to the agent. Configurable per-route in `policy.yaml`; default-deny for untrusted-source documents.
Unscoped OAuth tokens Agent holds `drive.readonly` for the whole tenant when it only needs one doc.	Proxilion sits in the OAuth flow and records exactly which scopes the user consented to. Then, for every individual action the agent takes, Proxilion records the specific file, email, or calendar event being touched.	Even though the OAuth token may technically grant access to the whole tenant, Proxilion narrows each action to just the one resource the agent is supposed to touch right now. Any attempt to reach a different file or mailbox in the same scope is refused.
Data exfiltration via external recipients Agent is talked into emailing sensitive content to attacker-controlled domains.	`messages.send` calls are parsed; recipient domains are extracted and matched against your allow-list.	If a recipient is outside your organization, Proxilion refuses to send the email until a real human (not the agent) explicitly approves it through a separate channel like Slack or a ticket. Fully configurable per sender and per domain, so internal traffic stays frictionless and only the risky sends get gated.
Bulk or mass-mutation abuse Compromised agent deletes thousands of files or sends mass email.	Proxilion counts how many actions each user's agent session is taking and at what rate. When a single user's agent suddenly tries to delete a thousand files in a minute, the counter trips before damage is done.	One-click killswitch that instantly revokes every active capability tied to that user or agent, network-wide, within a single request cycle. The agent's next call returns 403, no matter how many tokens it is holding.
Replay and token reuse Captured bearer token is reused beyond the intended action.	Every action gets a unique one-time identifier and a short expiration window. If a stolen token is replayed, Proxilion sees the same identifier come through twice.	The replayed request is rejected at the Proxilion edge before it ever reaches Google or any upstream. No duplicate side-effect, no data leaked, and the attempt is logged with the source IP.
Untraceable agent actions for compliance "Which user was the agent acting for when it touched this PHI?" No one can answer.	Every upstream call is logged with the full PCA chain, COSE-signed, append-only.	This one is not a block, it is an evidence guarantee. Every agent action is logged with a cryptographic signature that anyone can verify years later, offline, without trusting Proxilion or any vendor. SOC 2, HIPAA, and ISO 27001 audit evidence drops out of the proxy automatically. When an auditor asks "which user was the agent acting for," you can prove it.
Vendor and supply-chain trust assumption You are trusting the agent vendor's claims about what their model did.	The CAT signing key is customer-held. Proxilion has no way to forge PCAs on your behalf. Neither does Anthropic, OpenAI, or anyone else.	Trust is rooted at your own identity provider (Okta, Azure AD, Google Workspace) and a signing key only you hold. You no longer have to take Anthropic, OpenAI, or any agent vendor at their word about what their model did. The proof is in your hands, on your infrastructure.
Insider misuse via agent Legitimate user uses the agent as a laundering layer to do things their direct credentials cannot.	PCA_0 is bound to the user's IdP ops; the agent inherits nothing the user did not already have.	The agent inherits the exact same permissions as the human using it, nothing more. If the user cannot read HR records when logged into Google Workspace directly, asking an AI agent to do it for them fails the same way. The agent is not a permissions loophole.

The Skill Overreach problem

An agent "trained with skills" for the whole org is, in effect, a super-user. Proxilion is the only thing that forces it back into the Human User box.

The agent platforms now ship skills. You train one agent for the whole organization, attach it to Drive, Gmail, Salesforce, Jira, Notion, and a couple of internal APIs, and hand it out to every employee. That single agent now holds the union of every permission any of its users have. You have, in effect, deployed a super-user. The OAuth scope says drive.readonly for the tenant. The skill says "summarize anything the user asks about." The runtime has no idea whether the human on the other end is an intern, a finance lead, or the CEO.

That is the Skill Overreach problem. A skill is authority defined at the agent level. A user is authority defined at the human level. The gap between them is exactly where confused-deputy attacks, prompt-injection exfiltration, and insider laundering live.

Proxilion is the only thing in the stack that forces the skilled agent back into the Human User box. Every call the agent makes is bound to a PCA chain rooted at the specific human it is acting for at that moment. The intern's request to "summarize Q3 financials" fails the same way it would if the intern opened Drive directly. The CEO's request succeeds. The skill stays the same; the authority is no longer the skill's, it is the user's. Prevention by construction, even when the skill itself is overpowered.

Stop hoping your agent behaves. Prove it cannot misbehave.

Self-host in an afternoon. Free, forever. No sales call. No license keys. No data leaves your network.

Made with by Clay Good Get Proxilion on GitHub