[RFC ?] Hash-Bound Human Approval for MCP Mutations: The Plan-Challenge-Execute Pattern #1150

mirusser · 2026-05-14T01:07:04Z


Hi!

I've been building a 'production-ready' gateway for AI-assisted Kubernetes operations (Kubernetes-MCP-Guard) and I'd like to raise a security concern and propose a pattern that I think deserves standardization across MCP infrastructure servers.

The Problem: Consent Fatigue and TOCTOU-Style Gaps in Simple Approval Flows

Current approaches to human approval for AI-assisted mutations often fall into one of three patterns, each with its own gap:

Boolean prompts ("Do you want me to restart nginx-prod? yes/no"): the human approves based on a description the AI wrote, not the actual payload. A compromised or prompt-injected AI can describe one thing and send another, which creates a Time-of-Check to Time-of-Use (TOCTOU)-style gap.
UI takeover (for example, AWS Nova Act): designed for browser UI automation. When applied to structured API mutations, the payload the human approved may not be cryptographically bound to the payload sent in the actual API call. It is a useful contrast, but not a mutation approval protocol.
Stateful workflow engines (for example, Oracle Integration Cloud HITL, BPEL): these can address TOCTOU-style issues when designed with immutable plan storage and hash binding, but they require adopting a heavyweight workflow platform. That is not always a good fit for teams running open-source Kubernetes tooling.

The existing --read-only flags and RBAC in kubernetes-mcp-server are excellent safeguards, but they do not solve the case where an AI-assisted workflow legitimately has write access, a human has approved a specific operation, and then the payload drifts before execution.

The Proposed Solution: Plans, Challenges, and Hashes

In Kubernetes-MCP-Guard I've implemented a pattern I'm calling Plan-Challenge-Execute. Here's how it works:

1. Plan Generation, Not Execution

The AI calls a planning/request tool such as request_apply_manifest or request_restart_deployment.

Instead of mutating the cluster, the server:

Runs a server-side dry-run against the real Kubernetes API
Runs the manifest through a policy validator that rejects privileged containers, hostPath, hostNetwork, dangerous capabilities, and similar risky patterns
Computes a diff against the live cluster state
Writes a pending plan JSON document with the dry-run result, policy findings, diff, requester identity, and proposed payload
Returns a plan ID and a summary

No cluster mutation happens during this phase.
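
As a rough illustration of the plan phase (not taken from the Kubernetes-MCP-Guard codebase), the steps above could be sketched as follows. Everything here is an assumption made for the example: the function name `create_pending_plan`, the JSON field names, the `blocking` flag on findings, and the three callables that stand in for the real dry-run, policy validator, and diff.

```python
import json
import uuid
from pathlib import Path
from typing import Callable


def create_pending_plan(
    manifest: dict,
    requester_sub: str,
    plan_dir: Path,
    dry_run: Callable[[dict], dict],                # hypothetical: server-side dry-run
    validate_policy: Callable[[dict], list[dict]],  # hypothetical: returns findings
    compute_diff: Callable[[dict], str],            # hypothetical: diff vs. live state
) -> dict:
    """Write a pending plan document and return its ID; never mutates the cluster."""
    findings = validate_policy(manifest)
    blocking = [f for f in findings if f.get("blocking")]
    if blocking:
        # Assumption: blocking findings reject the request before a plan exists.
        raise PermissionError(f"manifest rejected by policy: {blocking}")

    plan_id = uuid.uuid4().hex  # server-generated, so safe to use in a file name
    plan = {
        "planId": plan_id,
        "requesterSub": requester_sub,
        "payload": manifest,
        "dryRun": dry_run(manifest),
        "policyFindings": findings,
        "diff": compute_diff(manifest),
    }
    plan_dir.mkdir(parents=True, exist_ok=True)
    path = plan_dir / f"{plan_id}.json"
    # The file's bytes are what a later digest would cover, so write it exactly once.
    path.write_text(json.dumps(plan, sort_keys=True, indent=2), encoding="utf-8")

    kind = manifest.get("kind", "?")
    name = manifest.get("metadata", {}).get("name", "?")
    return {"planId": plan_id, "summary": f"{kind}/{name}: pending approval"}
```

The point of the sketch is that the only side effect is a file write on the gateway; the proposed payload is captured verbatim so that later phases can refer to it by ID rather than trusting the client to resend it.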

2. Out-of-Band Approval

The AI/client then requests approval for the plan.

In my implementation, the same apply_approved_plan(planId) tool is stateful: before approval it returns an approval URL; after approval it executes the approved plan. For a portable contract, this could also be modeled as a separate create_approval_challenge(planId) phase.

The gateway:

Creates an ApprovalChallenge record with a SHA-256 digest of the pending plan file
Sets a 15-minute TTL by default
Binds the challenge to the requesting subject
Returns an approval URL to the AI client
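
The challenge-creation steps above could look roughly like this. It is a minimal sketch under stated assumptions: the `ApprovalChallenge` field names, the `status` values, and the `/approvals/<id>` URL path are all hypothetical, and only the SHA-256 digest, the 15-minute default TTL, and the subject binding come from the description above.

```python
import hashlib
import secrets
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

DEFAULT_TTL_SECONDS = 15 * 60  # 15-minute default TTL


@dataclass
class ApprovalChallenge:
    challenge_id: str
    plan_id: str
    plan_sha256: str    # digest of the pending plan file at challenge time
    requester_sub: str  # subject the challenge is bound to
    expires_at: float   # Unix timestamp
    status: str = "pending"  # hypothetical states: pending, approved, rejected, consumed


def create_challenge(
    plan_path: Path,
    requester_sub: str,
    base_url: str,
    now: Optional[float] = None,
    ttl: int = DEFAULT_TTL_SECONDS,
) -> tuple[ApprovalChallenge, str]:
    """Snapshot the plan file's digest and return the challenge plus its approval URL."""
    now = time.time() if now is None else now
    digest = hashlib.sha256(plan_path.read_bytes()).hexdigest()
    challenge = ApprovalChallenge(
        challenge_id=secrets.token_urlsafe(32),  # unguessable identifier
        plan_id=plan_path.stem,
        plan_sha256=digest,
        requester_sub=requester_sub,
        expires_at=now + ttl,
    )
    # Hypothetical URL layout; the real gateway may use a different route.
    approval_url = f"{base_url}/approvals/{challenge.challenge_id}"
    return challenge, approval_url
```

Hashing the raw file bytes, rather than a re-serialized object, avoids any dependence on JSON key ordering or whitespace when the digest is recomputed later.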

The human must open the URL in a browser session authenticated through the gateway's own out-of-band OAuth flow, independent of the AI client's session.

The browser renders the exact plan being approved:

The proposed Kubernetes operation
The diff against live cluster state
The server-side dry-run result
Policy validation findings
Requester and target resource information

The human then clicks Approve or Reject.

3. Hash-Bound Execution Gate

The AI/client calls the execution tool again after approval.

Before mutating Kubernetes, the server:

Recomputes the SHA-256 digest of the pending plan file
Verifies that the digest matches the digest captured in the approval challenge
Verifies that the approval challenge is still valid, unexpired, and single-use
Verifies that the approving OAuth sub matches the requesting sub in same-subject mode
Re-runs the server-side dry-run so stale approvals are refused if cluster state has changed
Only then mutates Kubernetes

If the pending plan was tampered with between challenge creation and execution, the digest comparison fails and the operation is refused with an error such as approval_hash_mismatch.

The AI/client cannot cause a different stored payload to be executed without invalidating the approval.
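
The gate checks listed above could be sketched as a single function. This is an illustration under assumptions, not the gateway's actual code: the `Challenge` shape and every error code other than `approval_hash_mismatch` are hypothetical, `dry_run` is modeled as a callable that returns `True` when the re-run still succeeds, and `apply` stands in for the real Kubernetes mutation.

```python
import hashlib
import hmac
import json
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Optional


class ApprovalError(Exception):
    """Raised with a machine-readable code when a gate check fails."""


@dataclass
class Challenge:
    plan_sha256: str
    requester_sub: str
    approver_sub: Optional[str]
    expires_at: float  # Unix timestamp
    status: str        # hypothetical states: pending, approved, consumed


def execute_approved_plan(
    plan_path: Path,
    challenge: Challenge,
    dry_run: Callable[[dict], bool],  # hypothetical: True if the dry-run still passes
    apply: Callable[[dict], Any],     # hypothetical: performs the real mutation
    now: Optional[float] = None,
) -> Any:
    now = time.time() if now is None else now
    raw = plan_path.read_bytes()

    # 1. Digest of the stored plan must equal the digest captured at challenge time.
    actual = hashlib.sha256(raw).hexdigest()
    if not hmac.compare_digest(actual, challenge.plan_sha256):
        raise ApprovalError("approval_hash_mismatch")
    # 2. Challenge must be approved (single-use: a consumed challenge fails here).
    if challenge.status != "approved":
        raise ApprovalError("challenge_not_approved")
    # 3. Challenge must be unexpired.
    if now >= challenge.expires_at:
        raise ApprovalError("challenge_expired")
    # 4. Same-subject mode: approver must be the requester.
    if challenge.approver_sub != challenge.requester_sub:
        raise ApprovalError("approver_subject_mismatch")
    # 5. Re-run the dry-run so a stale approval is refused.
    payload = json.loads(raw)["payload"]
    if not dry_run(payload):
        raise ApprovalError("dry_run_failed")

    # Consume before mutating so a crash mid-apply cannot leave a reusable approval.
    challenge.status = "consumed"
    return apply(payload)
```

Note that the payload handed to `apply` is parsed from the same bytes that were hashed, so the mutation cannot be built from anything other than the content the digest covers. In a real server the status change would also need to be atomic across concurrent requests, which this in-memory sketch does not attempt.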

Key Security Properties

| Property | Mechanism |
| --- | --- |
| Payload integrity | SHA-256 digest of the pending plan, compared before execution |
| Out-of-band identity | Approver's OAuth `sub` must match requester's `sub` in same-subject mode |
| Time-bounded approval | ApprovalChallenge TTL is 15 minutes by default; expired challenges are refused |
| Replay prevention | Challenge is single-use; approved challenges are consumed |
| Double-spend prevention | `applied/<planId>.json` marker blocks a second execution |
| Pre-apply drift detection | Server-side dry-run is re-run immediately before apply; stale approval is refused |
| Audit trail | Every state transition is written to `audit.jsonl` with typed payloads |
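
To make the last two file-based properties concrete, here is one way an `applied/<planId>.json` marker and an `audit.jsonl` append could work. This is a sketch, not the gateway's implementation: the function names, the record fields, and the use of exclusive file creation are assumptions, and the audit payload here is free-form keyword arguments rather than typed payloads.

```python
import json
import os
import time
from pathlib import Path


def claim_execution(applied_dir: Path, plan_id: str) -> bool:
    """Atomically create applied/<planId>.json; return False if it already exists."""
    applied_dir.mkdir(parents=True, exist_ok=True)
    marker = applied_dir / f"{plan_id}.json"
    try:
        # O_CREAT | O_EXCL makes creation fail if the file exists,
        # so only one caller can ever claim a given plan.
        fd = os.open(marker, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"planId": plan_id, "appliedAt": time.time()}, f)
    return True


def append_audit(audit_path: Path, event: str, **payload) -> None:
    """Append one JSON object per line to the audit log."""
    record = {"ts": time.time(), "event": event, **payload}
    with audit_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
```

A caller would invoke `claim_execution` before applying and refuse to proceed when it returns `False`, then record the outcome with `append_audit`.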

Proof: SafetyE2E Coverage Against Real Keycloak + Kubernetes Paths

These properties are not just documented — they are covered by targeted SafetyE2E regression tests that run against a real Keycloak instance via Testcontainers and a real Kubernetes cluster:

| Safety property | Test |
| --- | --- |
| TOCTOU block through digest mismatch | `PlanHashMismatchTests` |
| Plan file tampered between challenge and click | `ModifiedPendingPlanTests` |
| Wrong-user approval refused across endpoint, service, browser, and real-JWT paths | `WrongUserApprovalTests.ApproveChallengeEndpoint_ByDifferentSubject_IsRefused` |
| Expired challenge refused | `ExpiredApprovalTests` |
| Already-applied plan refused | `AlreadyAppliedPlanTests` |
| Pre-apply dry-run failure blocks execution | `DryRunFailureTests` |
| Dangerous manifest blocked by policy | `DangerousManifestTests` |
| Full happy path: browser approval to audit trail | `FullApprovalFlowTests.RestartDeployment_ApprovedThroughBrowser_AppliesExactPlanAndAudits` |
| RBAC: read-only service account cannot apply | `RbacMatrixTests` |

The main branch includes SafetyE2E coverage for digest mismatch, modified pending plans, expired approvals, wrong-user approvals, already-applied plans, dangerous manifests, dry-run failure, and RBAC boundaries.

The suite exercises real gateway, MCP, and Kubernetes paths and uses real Keycloak JWTs for MCP bearer authentication. Browser approval OAuth is simulated at the callback/backchannel boundary in tests, with separate service-level coverage for real-JWT wrong-user rejection.

A GitHub Actions workflow (safety-e2e.yml) runs the suite against an ephemeral KinD cluster on demand.

Reference implementation and tests:

Repository: https://github.com/mirusser/Kubernetes-MCP-Guard
SafetyE2E tests: https://github.com/mirusser/Kubernetes-MCP-Guard/tree/main/tests/InfraGate.Safety.E2E.Tests
On-demand SafetyE2E workflow: https://github.com/mirusser/Kubernetes-MCP-Guard/blob/main/.github/workflows/safety-e2e.yml

Why This Needs a Portable Standard

MCP already has useful primitives:

Tool annotations such as readOnlyHint and destructiveHint
Human-in-the-loop guidance
Elicitation / URL mode for out-of-band sensitive interactions

Those are good foundations.

What MCP does not currently standardize is a portable mutation-approval profile:

A planId
An immutable digest of the proposed mutation
An approval challenge
Same-user or policy-defined identity binding
TTL
Single-use semantics
Drift checking
Execute-after-approval behavior

Servers must implement these details themselves today.

Your repository is one of the most visible Kubernetes/OpenShift MCP implementations, so it seems like a good place to incubate this pattern. Without a shared profile, infrastructure MCP servers may diverge:

MCP clients will not know whether to expect a boolean needs_approval flag, a URL, a digest, or nothing
Security auditors will have no shared language for verifying human-in-the-loop mutation safety
Teams may rely on ad-hoc prompts or untrusted destructiveHint annotations for real cluster mutations

I'm proposing an optional MCP mutation-approval profile that defines a minimal contract for structured mutation approval:

A plan phase that returns a plan ID and digest, not a mutation
A challenge phase that creates a time-bounded, identity-bound approval ticket
An execute phase that verifies both the digest and the challenge before mutating

This does not require adoption of my implementation. It requires agreement on the interface contract so that MCP clients and security reviewers can reason about approval state portably.
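
To give a feel for what such a contract might pin down, here is a purely hypothetical sketch of the three phase results and the plan lifecycle. None of these type names, field names, or states are part of MCP or of my implementation today; they are placeholders for discussion.

```python
from dataclasses import dataclass
from enum import Enum


class PlanState(str, Enum):
    PENDING = "pending"        # plan stored, no challenge yet
    CHALLENGED = "challenged"  # approval ticket issued
    APPROVED = "approved"
    REJECTED = "rejected"
    EXPIRED = "expired"
    APPLIED = "applied"


# Hypothetical lifecycle: terminal states have no outgoing transitions.
ALLOWED_TRANSITIONS = {
    PlanState.PENDING: {PlanState.CHALLENGED},
    PlanState.CHALLENGED: {PlanState.APPROVED, PlanState.REJECTED, PlanState.EXPIRED},
    PlanState.APPROVED: {PlanState.APPLIED, PlanState.EXPIRED},
    PlanState.REJECTED: set(),
    PlanState.EXPIRED: set(),
    PlanState.APPLIED: set(),
}


@dataclass(frozen=True)
class PlanResult:
    plan_id: str
    digest: str   # for example "sha256:<hex>"
    summary: str


@dataclass(frozen=True)
class ChallengeResult:
    plan_id: str
    digest: str
    approval_url: str
    expires_at: str  # timestamp string, for example RFC 3339


@dataclass(frozen=True)
class ExecuteResult:
    plan_id: str
    digest: str
    state: PlanState


def can_transition(current: PlanState, target: PlanState) -> bool:
    """Return True if the hypothetical lifecycle allows current -> target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

Carrying the digest in every phase result would let a client or auditor confirm that the plan, the challenge, and the execution all refer to the same content, without needing access to server internals.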

Threat Model

This profile treats the mutation server or gateway as the trusted policy enforcement point.

The goal is to prevent the model, MCP client, or intermediate workflow from substituting, replaying, or executing a different payload after human review.

This profile is intended to protect against:

Prompt-injected or compromised AI/client behavior
Misleading natural-language summaries
Payload substitution between planning, approval, and execution
Replay of already-approved plans
Wrong-user approval
Stale approvals after cluster state changes
Accidental double execution

This profile is not intended to prove correctness of a malicious server implementation. A malicious server could still render one plan and execute another. That is outside the trust boundary of this pattern.

The Ask

Is this TOCTOU-style concern recognized by the team? Have you considered it in your roadmap?
Would you be open to collaborating on this thread to formalize a plan → challenge → execute tool contract, either for this repository or as an MCP mutation-approval profile?
If yes, I'm happy to draft a more formal spec document for review.

For reference, the architecture rationale is documented here:

https://github.com/mirusser/Kubernetes-MCP-Guard/blob/main/docs/why-separated-plan-from-challenge.md

Short version:

https://github.com/mirusser/Kubernetes-MCP-Guard#the-three-security-gates

Thanks for the great work on this project.

— @mirusser

mirusser · 2026-05-14T01:33:38Z


Small clarification after looking more closely at the current repo state:

I see that kubernetes-mcp-server already has confirmation_rules and elicitation support, including tool-level and kube-level confirmation gates. That is definitely related to the problem space I’m describing.

The distinction I’m trying to explore here is narrower: not “should dangerous actions ask for confirmation?”, but “can the confirmation be bound to the exact mutation payload that will later be executed?”

In other words, I see the existing confirmation rules as a strong foundation. The gap I’m proposing to discuss is whether high-risk mutation tools should optionally support a plan/challenge/execute lifecycle where:

the server first creates a concrete mutation plan
the approval is bound to a digest of that plan
execution verifies the same digest, approval state, TTL, requester/approver binding, and current dry-run result before mutating

So this is meant as a complement to the existing elicitation/confirmation system, not a claim that the repo has no HITL support today.
