
Codex/kova scenarios and fixes #2

Open
JuanHuaXu wants to merge 16 commits into openclaw:main from JuanHuaXu:codex/kova-scenarios-and-fixes


@JuanHuaXu JuanHuaXu commented May 21, 2026

Fresh Evidence

Latest Kova head: 1c1c0d0 (codex/kova-scenarios-and-fixes).

Current-Head Exec Leak Measurement Proof

Reviewer P2 leak-measurement fix was run at Kova head 1c1c0d0 (Measure exec process leaks in tool scenarios). The helper now snapshots processes before and after the inner OpenClaw exec-tool turns, writes before/after/leak artifacts, diffs scoped exec/agent/gateway-tree/tool-runtime processes, and reports the measured leak count instead of a hard-coded zero. Gateway root restarts are excluded from the exec-child leak count so the gate tracks leaked tool children rather than service PID churn.
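
The before/after measurement described above can be pictured as a snapshot diff. The following is a minimal sketch, not the helper's actual code: the snapshot entry shape (`pid`, `role`), the role strings, and the names `diffProcessSnapshots` and `gatewayRootPids` are assumptions made for illustration.

```javascript
// Hypothetical sketch of a before/after process-leak diff.
// Entry shape, role strings, and function names are illustrative assumptions.

// Roles whose surviving processes count toward the leak total.
const LEAK_SCOPED_ROLES = new Set(["exec", "agent", "gateway-tree", "tool-runtime"]);

function diffProcessSnapshots(before, after, { gatewayRootPids = [] } = {}) {
  const seenBefore = new Set(before.map((p) => p.pid));
  const excluded = new Set(gatewayRootPids);
  // A leak is a scoped process that exists after the tool turns but did not
  // exist before them. Gateway root PIDs are skipped so that a service
  // restart (new PID) is not mistaken for a leaked tool child.
  const leaks = after.filter(
    (p) => LEAK_SCOPED_ROLES.has(p.role) && !seenBefore.has(p.pid) && !excluded.has(p.pid),
  );
  return { leakCount: leaks.length, leaks };
}

const before = [
  { pid: 100, role: "gateway-tree" },
  { pid: 101, role: "agent" },
];
const after = [
  { pid: 200, role: "gateway-tree" }, // gateway root restarted under a new PID
  { pid: 101, role: "agent" }, // already present before the turns
  { pid: 310, role: "exec" }, // exec child that outlived its tool turn
  { pid: 400, role: "other" }, // outside the measured scope
];

const result = diffProcessSnapshots(before, after, { gatewayRootPids: [200] });
console.log(result.leakCount, result.leaks.map((p) => p.pid));
```

In this sketch only PID 310 is reported: the restarted gateway root is excluded explicitly, and the out-of-scope process is never considered.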

Validation:

  • node bin/kova.mjs self-check --json -> ok: true; synthetic leaked exec evidence now fails on execProcessLeaks.
  • node tests/render-snapshots.mjs -> 18 pass.
  • git diff --check -> pass.
  • exec-tool-safety real run kova-260522-172726-8691d6: execProcessLeaks: 0; leak artifact leakCount: 0; remaining failure is only OpenClaw tool-runtime peak RSS 760.4 MB > 500 MB.
  • tool-failure-containment real run kova-260522-172756-1d1ef1: execProcessLeaks: 0; leak artifact leakCount: 0; remaining failure is only OpenClaw tool-runtime peak RSS 757.9 MB > 500 MB.
  • Cleanup: ocm env list -> no environments.

Current-Head Exec Tool Evidence Proof

Reviewer P2 fix proof was run at Kova head 1546804 (Drive exec containment checks through OpenClaw). This fixes the false-evidence lane by using the OpenClaw exec tool schema's required command argument and by driving safe, blocked, oversized-output, and timeout cases through openclaw agent plus mock-provider tool-result evidence.

Commands:

KOVA_HOME=checkouts/p2-exec-real-proof2-kova-home \
  node bin/kova.mjs run \
    --target runtime:stable \
    --scenario exec-tool-safety \
    --state exec-tool-user \
    --execute \
    --report-dir checkouts/p2-exec-real-proof2/reports \
    --json

KOVA_HOME=checkouts/p2-exec-real-proof2-kova-home \
  node bin/kova.mjs run \
    --target runtime:stable \
    --scenario tool-failure-containment \
    --state exec-tool-user \
    --execute \
    --report-dir checkouts/p2-exec-real-proof2/reports \
    --json

Results:

  • exec-tool-safety: kova-260522-170514-192a6a, proof complete 10/10; exec evidence available; safeCommandSucceeded: true; dangerousCommandBlocked: true; dangerousPayloadExecuted: false; outputTruncated: true; timeoutMs: 3499; processLeaks: 0.
  • tool-failure-containment: kova-260522-170551-6419fb, proof complete 10/10; exec evidence available; dangerousCommandBlocked: true; dangerousPayloadExecuted: false; outputTruncated: true; timeoutMs: 3291; processLeaks: 0.

Both runs still report FAIL, but only for the existing OpenClaw resource threshold: tool-runtime RSS around 750 MB over the 500 MB threshold. The previous Kova evidence failures are gone.

Validation for the patch:

  • node bin/kova.mjs self-check --json -> ok: true (173 checks)
  • node tests/render-snapshots.mjs -> 18 pass
  • git diff --check -> pass
  • Cleanup: ocm env list -> no environments

Artifacts on runner:

  • checkouts/p2-exec-real-proof2/reports/kova-260522-170514-192a6a.json
  • checkouts/p2-exec-real-proof2/reports/kova-260522-170551-6419fb.json

Current-Head Tool-Failure Proof

Reviewer P2 fix proof was run at Kova head 2439672 (Fix failure-only exec mock provider flow).

Command:

KOVA_HOME=checkouts/p2-tool-failure-proof-kova-home \
  node bin/kova.mjs run \
    --target runtime:stable \
    --scenario tool-failure-containment \
    --state exec-tool-user \
    --execute \
    --report-dir checkouts/p2-tool-failure-proof/reports \
    --json

Result: kova-260522-164448-f2fbb2 completed with proof completeness of 10/10 required obligations; cleanup destroyed the disposable env, and the fixed failure-only provider path was exercised:

  • Mock script: kova-exec-tool-failure-only
  • First provider step: kova-exec-tool-failure-only-dangerous-tool-call
  • Matched provider step: kova-exec-tool-failure-only-dangerous-tool-call
  • Provider emitted one exec tool call
  • dangerousCommandBlocked: true
  • dangerousPayloadExecuted: false
  • dangerousSentinelStillPresent: true
  • outputTruncated: true, timeoutObserved: true, processLeaks: 0

The scenario verdict is still FAIL, but for an OpenClaw resource threshold, not the Kova wiring bug: tool-runtime peak RSS 741.9 MB exceeded threshold 500 MB. This is expected to remain a product/resource signal for maintainers.

Artifacts on runner:

  • Markdown: checkouts/p2-tool-failure-proof/reports/kova-260522-164448-f2fbb2.md
  • JSON: checkouts/p2-tool-failure-proof/reports/kova-260522-164448-f2fbb2.json
  • Tool artifact: checkouts/p2-tool-failure-proof-kova-home/artifacts/kova-260522-164448-f2fbb2/kova-tool-failure-containmen-81131346-kova-260522-164448-f2fbb2/tool-failure-containment.json

Requested OpenClaw target: v2026.5.21-beta.1. The Git tag exists and resolves to 89a17def chore(release): prepare 2026.5.21-beta.1, but openclaw@2026.5.21-beta.1 is not currently published in npm/OCM release discovery. A direct npm:2026.5.21-beta.1 exhaustive run was therefore blocked at provisioning with the error: OpenClaw release version "2026.5.21-beta.1" was not found.

Fresh release-shaped tag evidence was run from a disposable openclaw/openclaw.git checkout at that tag using local-build:checkouts/openclaw-v2026.5.21-beta.1. After pnpm install --frozen-lockfile, direct pnpm pack succeeded and produced openclaw-2026.5.21-beta.1.tgz.

Matrix command:

KOVA_HOME=checkouts/kova-pr2-evidence-v2026-5-21-beta-1-local-build-rerun \
  node bin/kova.mjs matrix run \
    --profile exhaustive \
    --target local-build:checkouts/openclaw-v2026.5.21-beta.1 \
    --source-env kova-pr2-source-v2026-5-21-beta-1 \
    --execute \
    --allow-exhaustive \
    --json \
    --report-dir checkouts/kova-pr2-evidence-v2026-5-21-beta-1-local-build-rerun/reports

Result: kova-260522-145534-037d84 -> 77 total · 32 PASS · 45 FAIL · 0 BLOCKED.

Evidence artifacts on runner:

  • Markdown: checkouts/kova-pr2-evidence-v2026-5-21-beta-1-local-build-rerun/reports/kova-260522-145534-037d84-exhaustive.md
  • JSON: checkouts/kova-pr2-evidence-v2026-5-21-beta-1-local-build-rerun/reports/kova-260522-145534-037d84-exhaustive.json
  • Bundle: checkouts/kova-pr2-evidence-v2026-5-21-beta-1-local-build-rerun/reports/kova-260522-145534-037d84-bundle.tar.gz
  • Bundle SHA256: 743ee926c809921c0b6aea0170c1731e8576749ab88ee8126cafebfb210c73ae

Notable PASS coverage: release-runtime-startup, channel-discord-capability-conformance, upgrade-existing-user (both source states), bundled-runtime-deps (both states), plugin-lifecycle (all states), official/bad/missing/unsafe plugin lanes, provider-models, agent-cold-warm-message, dashboard-readiness, tui-responsiveness, mcp-runtime-start-stop, agent-network-offline, failure-injection, and cross-platform-smoke.

Top remaining failures are product/resource signals in OpenClaw/tag behavior rather than Kova unsupported-mode blockers: gateway RSS around the 700 MB threshold across agent/provider/HTTP/TUI surfaces, rolling-upgrade package/runtime RSS/CPU, dirty-plugin doctor-cli RSS, tool-runtime RSS for exec/tool containment, soak/workspace latency, and a few functional/liveness failures (channel generated-image handoff, Telegram timeout signals, cron/browser/media gateway restarting, MCP tool-call missing runtime role evidence).

Cleanup after the run: disposable source env destroyed; old beta runtime records reintroduced by upgrade lanes removed; ocm runtime list shows only stable.

PR Change List

Branch: codex/kova-scenarios-and-fixes
Base compared: origin/main
The RCA doc has been removed from the feature set. .learnings/ is still untracked and is not part of this PR.

Matrix/Profile Wiring

  • Added profiles/rolling-upgrade.json
  • Updated profiles/exhaustive.json
    • includes rolling upgrade day/week/month scenarios
    • includes fixed old-release upgrade scenarios
    • includes unsafe-memory plugin scenario
    • total exhaustive entries now 77
  • Updated profiles/release.json
    • includes unsafe-memory plugin scenario
    • release entries now 51
  • Added profiles/adversarial.json

New Upgrade Coverage

  • Added scenarios/upgrade-from-day-ago.json
  • Added scenarios/upgrade-from-week-ago.json
  • Added scenarios/upgrade-from-month-ago.json
  • Added support/resolve-openclaw-release-age.mjs
  • Added support/run-openclaw-release-age-upgrade.mjs
  • Updated docs/AGENT_USAGE.md with rolling upgrade usage
  • Added self-check coverage for rolling upgrade resolver/profile planning
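
One way a release-age resolver could pick the upgrade baseline is sketched below. This is an assumption about the approach, not the contents of support/resolve-openclaw-release-age.mjs: the release list shape, the `resolveReleaseByAge` name, the day counts per age label, the placeholder versions, and the rule "newest release that is at least N days old" are all hypothetical.

```javascript
// Hypothetical sketch: choose the newest release that is at least N days old.
// Names, data shape, and the selection rule are illustrative assumptions.

const DAY_MS = 24 * 60 * 60 * 1000;
const AGE_DAYS = { day: 1, week: 7, month: 30 };

function resolveReleaseByAge(releases, age, now = Date.now()) {
  const days = AGE_DAYS[age];
  if (days === undefined) throw new Error(`unknown release age: ${age}`);
  const cutoff = now - days * DAY_MS;
  const candidates = releases
    .map((r) => ({ ...r, publishedMs: Date.parse(r.publishedAt) }))
    // Drop entries with unparseable dates and anything newer than the cutoff.
    .filter((r) => Number.isFinite(r.publishedMs) && r.publishedMs <= cutoff)
    // Newest eligible release first.
    .sort((a, b) => b.publishedMs - a.publishedMs);
  return candidates.length > 0 ? candidates[0].version : null;
}

// Placeholder release data, not real OpenClaw versions.
const releases = [
  { version: "1.0.0", publishedAt: "2026-04-01T00:00:00Z" },
  { version: "1.1.0", publishedAt: "2026-05-10T00:00:00Z" },
  { version: "1.2.0", publishedAt: "2026-05-20T12:00:00Z" },
];
const now = Date.parse("2026-05-21T12:00:00Z");

console.log(resolveReleaseByAge(releases, "day", now));
console.log(resolveReleaseByAge(releases, "week", now));
console.log(resolveReleaseByAge(releases, "month", now));
```

Returning `null` when no release is old enough lets the caller decide whether to block or skip the scenario, rather than silently upgrading from an unintended baseline.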

Unsafe Legacy Plugin Memory Test

  • Added scenarios/plugin-legacy-unsafe-memory.json
  • Added surfaces/plugin-legacy-unsafe-memory.json
  • Added fixture plugin:
    • support/plugins/kova-legacy-unsafe-memory/index.js
    • support/plugins/kova-legacy-unsafe-memory/openclaw.plugin.json
    • support/plugins/kova-legacy-unsafe-memory/package.json
  • Added support/assert-command-output.mjs
  • Updated src/evaluator.mjs to count "failed during register" output as plugin load failure evidence

Dirty Plugin Testing

  • Added docs/DIRTY_PLUGIN_TESTING_PLAN.md
  • Added scenarios/dirty-plugin-state.json
  • Added dirty plugin states:
    • states/dirty-plugin-local-edits.json
    • states/dirty-plugin-stale-deps.json
    • states/dirty-plugin-manifest-drift.json
    • states/dirty-plugin-disabled-broken.json
    • states/dirty-plugin-symlink-dev.json
    • states/dirty-plugin-partial-install.json
    • states/update-recovery-plugin-user.json
  • Added support/dirty-plugin-state.mjs
  • Added surfaces/dirty-plugin-state.json

Release Update Recovery

  • Added docs/RELEASE_UPDATE_RECOVERY_PLAN.md
  • Added scenarios/release-update-recovery.json
  • Added surfaces/release-update-recovery.json
  • Added support/restore-first-ocm-upgrade-snapshot.mjs

Tool Runtime Matrix

  • Added docs/TOOL_RUNTIME_MATRIX_PLAN.md
  • Added scenarios:
    • scenarios/cron-runtime.json
    • scenarios/exec-tool-safety.json
    • scenarios/mcp-tool-call.json
    • scenarios/tool-failure-containment.json
  • Added states:
    • states/cron-user.json
    • states/exec-tool-user.json
    • states/mcp-tool-user.json
  • Added surfaces:
    • surfaces/cron-runtime.json
    • surfaces/exec-tool-safety.json
    • surfaces/mcp-tool-call.json
    • surfaces/tool-failure-containment.json
  • Added process roles:
    • process-roles/cron-runtime.json
    • process-roles/tool-runtime.json
  • Added helpers:
    • support/run-cron-runtime-smoke.mjs
    • support/run-exec-tool-safety.mjs
    • support/mcp-tool-call-smoke.mjs

Provider/Network Failure Coverage

  • Added docs/NETWORK_ISOLATION_PLAN.md
  • Added src/network-frontage.mjs
  • Added support/network-frontage-proxy.mjs
  • Added provider scenarios:
    • scenarios/agent-provider-protocol-failure.json
    • scenarios/agent-provider-random-disconnect.json
  • Updated support/mock-openai-server.mjs
  • Updated support/configure-openclaw-mock-auth.mjs
  • Updated src/commands/run.mjs, src/commands/matrix-run.mjs, src/run/context.mjs, src/run/phase-plan.mjs

Adversarial Input Coverage

  • Added scenarios/adversarial-input-openai-compatible.json
  • Added surfaces/adversarial-input.json
  • Added support/run-adversarial-inputs.mjs
  • Added profiles/adversarial.json

Plugin Fixture/Manifest Fixes

  • Added support/plugins/kova-basic/openclaw.plugin.json
  • Added support/plugins/kova-missing-runtime-dep/openclaw.plugin.json
  • Updated scenarios/plugin-missing-runtime-deps.json

Resource Attribution / Evaluation / Reporting Fixes

  • Updated src/collectors/resources.mjs
  • Updated src/evaluation/violations.mjs
  • Updated src/evidence/agent-turns.mjs
  • Updated src/evidence/shared.mjs
  • Updated src/measurement-contract.mjs
  • Updated src/reporting/report.mjs
  • Updated src/reporting/scenario-aggregate.mjs
  • Updated src/run/command-executor.mjs
  • Updated src/run/report-finalization.mjs
  • Updated src/runner.mjs
  • Updated src/safety.mjs
  • Updated src/selfcheck.mjs

Large Session Fixture

  • Added support/prepare-large-memory-session-state.mjs
  • Updated related surface thresholds/metadata:
    • surfaces/fresh-install.json
    • surfaces/soak.json
    • surfaces/gateway-performance.json
    • surfaces/workspace-scan.json

OpenAI-Compatible / Runtime Role Tweaks

  • Updated scenarios/openai-compatible-turn.json
  • Updated support/run-openai-compatible-turn.mjs
  • Updated process-roles/openai-compatible-client.json
  • Updated role primary-resource metadata across several surfaces

Docs / User-Facing Metadata

  • Updated README.md
  • Updated docs/WHAT_IS_KOVA.md
  • Updated docs/AGENT_USAGE.md
  • Updated metrics/known.json

Git Hygiene

  • Updated .gitignore
    • ignores .env, .env.*, local JSON/env files, and checkout contents
    • keeps .env.example and checkouts/.gitkeep
  • Added checkouts/.gitkeep

Tests / Snapshots

  • Added checked-in report fixtures:
    • tests/fixtures/reports/pass.json
    • tests/fixtures/reports/fail.json
  • Updated tests/render-snapshots.mjs
  • Refreshed all affected snapshots under tests/snapshots/

Validation Already Run

  • node bin/kova.mjs self-check --json
  • npm run test:snapshots
  • git diff --check
  • Real disposable run for plugin-legacy-unsafe-memory passed against runtime:stable

clawdius added 3 commits May 20, 2026 22:31
# Conflicts:
#	src/cli.mjs
#	src/commands/matrix-run.mjs
#	src/commands/report.mjs
#	src/commands/run.mjs
#	src/evaluator.mjs
#	src/runner.mjs
#	src/selfcheck.mjs
#	states/large-memory-session.json
@clawsweeper clawsweeper Bot added labels May 21, 2026:

  • rating: 🧂 unranked krab: Not merge-ready due to missing proof or serious correctness/safety concerns.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask.
  • P2: Normal priority bug or improvement with limited blast radius.
  • merge-risk: 🚨 security-boundary: 🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data.
  • merge-risk: 🚨 automation: 🚨 Merging this PR could break CI, automerge, proof capture, label sync, or automation.


clawsweeper Bot commented May 21, 2026

Codex review: needs maintainer review before merge.

Latest ClawSweeper review: 2026-05-23 04:07 UTC / May 23, 2026, 12:07 AM ET.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR expands Kova scenario/profile coverage, evaluator gates, report fixtures, and helpers for rolling upgrades, dirty plugins, tool runtime, provider/network failure, adversarial inputs, and network frontage.

Reproducibility: not applicable. This is a feature PR expanding Kova validation coverage and harness controls, not a bug report with a current-main reproduction path.

PR rating
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Summary: Strong real-run proof supports a broad but automation-sensitive Kova harness patch with no discrete blocking code finding from this review.

Rank-up moves:

  • Update the branch against current main to clear the current merge conflict.
  • Refresh self-check, snapshot, and targeted real-run proof on the rebased head.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Sufficient (terminal): The PR body and comments provide copied terminal/log proof from current-head self-checks, snapshot tests, targeted real Kova runs, an exhaustive local-build matrix run, artifact paths, and cleanup confirmation.

Risk before merge

  • The PR cannot merge cleanly until it is updated against current main.
  • Merging this PR changes Kova release/exhaustive profiles, evaluator gates, receipts, and snapshots, so release-lab automation may surface new failures even when OpenClaw behavior has not changed.
  • The submitted exhaustive run still reports many OpenClaw product/resource failures; maintainers need to accept that these become visible Kova signals rather than harness blockers.
  • The opt-in loopback frontage path depends on host loopback alias privileges on macOS, and the discussion shows one live attempt blocked on local ifconfig permissions.

Maintainer options:

  1. Update the branch and refresh proof (recommended)
    Resolve the current default-branch conflicts and rerun the Kova self-check, snapshot test, and at least the targeted real-run proof for the changed harness paths.
  2. Accept the expanded Kova gate behavior
    Maintainers can choose to merge after branch update if they want release/exhaustive profiles to expose the additional OpenClaw product/resource failures as Kova signals.
  3. Split the scenario families
    Pause this PR and ask for smaller PRs if rolling upgrades, dirty plugins, network frontage, and tool-runtime gates need separate review and calibration.

Next step before merge
This is not a safe repair-lane item because the PR is broad, currently conflicting, and needs maintainer judgment about Kova gate calibration before merge.

Security
Cleared: No concrete security or supply-chain regression found; the PR does not change CI, dependencies, or lockfiles, and its new helpers operate in disposable Kova/OCM environments with controlled fixtures and redaction paths.

Review details

Best possible solution:

Update the branch against current main, preserve the proof-backed scenario families, and land only after maintainers accept the broader Kova gate/profile behavior or choose to split it by scenario family.

Do we have a high-confidence way to reproduce the issue?

Not applicable. This is a feature PR expanding Kova validation coverage and harness controls, not a bug report with a current-main reproduction path.

Is this the best way to solve the issue?

Unclear. The implementation appears coherent and proof-backed, but the best landing shape depends on maintainer acceptance of broader Kova gate behavior and must be reassessed after conflict resolution.

Label changes:

  • add merge-risk: 🚨 compatibility: Existing users of Kova release/exhaustive profiles may see changed profile contents, thresholds, report shape, and failure outcomes after merge.

Label justifications:

  • P2: This is a broad but normal-priority Kova harness improvement with limited blast radius outside validation workflows.
  • merge-risk: 🚨 compatibility: Existing users of Kova release/exhaustive profiles may see changed profile contents, thresholds, report shape, and failure outcomes after merge.
  • merge-risk: 🚨 automation: The diff changes Kova profile execution, evaluator gates, receipts, snapshots, and report artifacts that validation automation consumes.
  • rating: 🐚 platinum hermit: Current PR rating is 🐚 platinum hermit because proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit. Strong real-run proof supports a broad but automation-sensitive Kova harness patch with no discrete blocking code finding from this review.
  • feature: ✨ showcase: ClawSweeper spotlight: unusually compelling feature idea for maintainer attention. The idea substantially broadens Kova into upgrade, dirty-plugin, tool-runtime, adversarial, and network-isolation coverage that can expose release-quality OpenClaw signals.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body and comments provide copied terminal/log proof from current-head self-checks, snapshot tests, targeted real Kova runs, an exhaustive local-build matrix run, artifact paths, and cleanup confirmation.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body and comments provide copied terminal/log proof from current-head self-checks, snapshot tests, targeted real Kova runs, an exhaustive local-build matrix run, artifact paths, and cleanup confirmation.

What I checked:

  • PR branch is currently conflicting: GitHub reports mergeStateStatus: DIRTY and mergeable: CONFLICTING for the PR head against the default branch. (1c1c0d00dfb5)
  • Expanded exhaustive profile coverage: The PR adds rolling upgrade entries and new plugin/tool/adversarial scenario families to the exhaustive profile, so it changes the release-lab execution surface rather than a narrow fix already present on main. (profiles/exhaustive.json:66, 1c1c0d00dfb5)
  • Evaluator now gates new evidence families: The PR collects cron, exec-tool, MCP tool-call, dirty-plugin, and release-recovery evidence, then fail-closes active exec/MCP/plugin/upgrade thresholds when evidence is missing or failing. (src/evaluator.mjs:464, 1c1c0d00dfb5)
  • Network frontage adds opt-in automation behavior: The PR introduces --network-frontage controls and starts a loopback-frontage proxy after gateway service metadata is available, with explicit cleanup on startup failure. (src/network-frontage.mjs:31, 1c1c0d00dfb5)
  • Exec proof path uses controlled sentinel evidence: The exec helper creates a disposable sentinel, sends safe, blocked, large-output, and timeout turns through OpenClaw, then records process snapshots and leak evidence. (support/run-exec-tool-safety.mjs:70, 1c1c0d00dfb5)
  • Current-main history for central evaluator and runner: Current main’s evaluator and runner core paths are primarily traced to Shakker’s refactor: derive channel proof policy, with later related Kova validation fixes also by Shakker. (src/evaluator.mjs:29, 25be50d37561)

Likely related people:

  • Shakker: Blame and recent history point to Shakker for the current evaluator/runner structure and most recent Kova scenario/evidence changes on main. (role: recent area contributor; confidence: high; commits: 25be50d37561, de92cfc8897f, 7545a32961cd; files: src/evaluator.mjs, src/runner.mjs, src/selfcheck.mjs)
  • Peter Steinberger: Recent main history includes Peter Steinberger fixes in the mock-provider/self-check validation area that this PR extends, but the central branch surface is mostly owned by later Shakker changes. (role: adjacent contributor; confidence: medium; commits: 2fb5a386d637, 0833bed2befa, b009c1d3f592; files: src/selfcheck.mjs, support/mock-openai-server.mjs, scenarios)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 2ce089890a34.

clawsweeper Bot commented May 21, 2026

ClawSweeper PR egg

✨ Hatched: 💎 rare Brave Signal Puff

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 💎 rare.
Trait: collects tiny proofs.
Image traits: location status garden; accessory green check lantern; palette coral, mint, and warm cream; mood celebratory; pose standing beside its cracked shell; shell glossy opal shell; lighting clean product lighting; background subtle branch markers.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@JuanHuaXu
Author

Remediation pushed in ad6dcbd.

What changed:

  • P1: exec-tool-safety now uses the real OpenClaw agent/provider path. The mock provider emits Responses function_call items for exec; the helper verifies the safe exec turn, sends a dangerous rm -rf <sentinel> payload through the same path, and proves the sentinel remained.
  • P2: evaluator now collects and gates cron-runtime, exec-tool-safety, and mcp-tool-call helper JSON (cronRunMs, execSafeCommandSucceeded, execDangerousCommandBlocked, execOutputTruncated, mcpToolsCallMs, invalid MCP attribution, process leaks/errors).
  • Added self-check coverage: tool-runtime-evidence-evaluation fails if those helper outputs stop being parsed/enforced.
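
To make the P1 path above concrete, here is a minimal sketch of what a mock provider's exec tool call and the resulting sentinel verdict might look like. The `type`, `call_id`, `name`, and `arguments` fields follow the general shape of a Responses-style function_call item, and the `exec` tool name and `command` argument come from this PR. The helper names, the call id, the sentinel path, and the rule deriving `dangerousPayloadExecuted` from the sentinel are assumptions for illustration, not the helper's actual code.

```javascript
// Hypothetical sketch of a mock-provider exec tool call and the sentinel verdict.
// Helper names, call id, and sentinel path are illustrative assumptions.

function buildExecFunctionCall(callId, command) {
  // The exec tool schema requires a command argument, so refuse to build without one.
  if (typeof command !== "string" || command.length === 0) {
    throw new Error("exec tool call requires a non-empty command");
  }
  return {
    type: "function_call",
    call_id: callId,
    name: "exec",
    // Function-call arguments are carried as a JSON-encoded string.
    arguments: JSON.stringify({ command }),
  };
}

function classifyDangerousTurn({ toolResultBlocked, sentinelExistsAfter }) {
  // Assumption: the payload counts as executed if the sentinel is gone afterwards.
  return {
    dangerousCommandBlocked: toolResultBlocked === true,
    dangerousPayloadExecuted: sentinelExistsAfter !== true,
    dangerousSentinelStillPresent: sentinelExistsAfter === true,
  };
}

// The command string is only serialized here; nothing is executed.
const call = buildExecFunctionCall("call_demo_1", "rm -rf /tmp/kova-demo-sentinel");
const verdict = classifyDangerousTurn({ toolResultBlocked: true, sentinelExistsAfter: true });

console.log(call.name, JSON.parse(call.arguments).command);
console.log(verdict);
```

Treating an unknown sentinel state as "executed" keeps the verdict conservative: the run can only report a safe outcome when the sentinel was positively observed after the turn.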

Validation:

  • npm run check -> PASS, 162/162 checks.
  • npm run test:snapshots -> PASS, 18/18 snapshots.
  • Real disposable Kova run: cron-runtime on runtime:stable -> PASS, run kova-260522-011201-c292fb; evidence: cronRunMs=698, cronRunCompleted=true, cronTriggerAttributed=true.
  • Real disposable Kova run: mcp-tool-call on runtime:stable -> PASS, run kova-260522-011137-720f05; evidence: mcpToolsCallMs=184, safeToolSucceeded=true, invalidToolErrorAttributed=true.
  • Real disposable Kova run: exec-tool-safety on runtime:stable -> Kova tool checks PASS but scenario verdict FAIL due to product RSS threshold; run kova-260522-011101-dd863a; evidence: safeCommandSucceeded=true, dangerousCommandBlocked=true, dangerousPayloadExecuted=false, outputTruncated=true, timeoutMs=1006, processLeaks=0. Remaining violation is OpenClaw/product RSS: tool-runtime peak RSS 694.8 MB > 500 MB.

So the reviewer-reported Kova false evidence paths are patched. The only failure observed in the exec proof run is now a real product resource signal, not Kova misidentifying its own helper behavior.

@clawsweeper clawsweeper Bot updated labels May 22, 2026:

Added:
  • proof: sufficient: Contributor real behavior proof is sufficient.
  • rating: 🦐 gold shrimp: Decent PR readiness signal, but merge confidence is limited.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action.

Removed:
  • rating: 🧂 unranked krab: Not merge-ready due to missing proof or serious correctness/safety concerns.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask.
  • merge-risk: 🚨 security-boundary: 🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data.

@JuanHuaXu
Author

Follow-up remediation pushed in 646b744 for the latest P2/P3 findings.

Fixes:

  • P2 network frontage: waitForTcp() now only checks child exit state when a child process is actually passed, so validation probes without a child are allowed.
  • P2 cron gates: evaluator now enforces cronRunCompleted and cronTriggerAttributed boolean thresholds. Added a negative self-check where cronTriggerAttributed=false must fail.
  • P3 MCP metric naming: evaluator now reports/violates mcpToolCallErrorAttributed, matching the surface/profile/known metric id.
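
The waitForTcp() fix above amounts to guarding the child-exit check. A minimal sketch of that guard follows, assuming a Node ChildProcess-like object whose `exitCode` and `signalCode` stay null while the process is running. The names `childHasExited` and `nextWaitAction` are hypothetical, and the real waitForTcp() signature is not shown in this PR.

```javascript
// Hypothetical sketch of the "only check the child when one is passed" guard.
// Function names and the action strings are illustrative assumptions.

function childHasExited(child) {
  // Validation probes pass no child, so there is no exit state to inspect.
  if (child == null) return false;
  // On a Node ChildProcess, exitCode and signalCode are null while it runs.
  return child.exitCode != null || child.signalCode != null;
}

function nextWaitAction({ connected, child, elapsedMs, timeoutMs }) {
  if (connected) return "ready";
  if (childHasExited(child)) return "fail:child-exited";
  if (elapsedMs >= timeoutMs) return "fail:timeout";
  return "retry";
}

// Probe without a child: keeps retrying until the port answers or the timeout hits.
console.log(nextWaitAction({ connected: false, child: undefined, elapsedMs: 50, timeoutMs: 1000 }));
// Probe with a child that already exited: fails fast instead of waiting out the timeout.
console.log(
  nextWaitAction({
    connected: false,
    child: { exitCode: 1, signalCode: null },
    elapsedMs: 50,
    timeoutMs: 1000,
  }),
);
```

Keeping the decision in a pure function makes the no-child case easy to cover in a self-check without opening a real socket.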

Validation:

  • node bin/kova.mjs self-check --json -> PASS (ok: true), including new network-frontage-no-child-tcp and negative cron attribution coverage.
  • npm run test:snapshots -> PASS, 18/18.
  • Reviewer acceptance command was attempted: node bin/kova.mjs run --target runtime:stable --scenario fresh-install --network-frontage loopback --worker-id 7 --execute --json.
    • Result: BLOCKED by local macOS privilege, not the fixed no-child validation bug.
    • Run id: kova-260522-074939-8e800b.
    • Blocker: ifconfig: ioctl (SIOCAIFADDR): permission denied while adding 127.0.1.17 alias.
    • Cleanup verified: ocm env list -> No environments.

So the code-level review blockers are patched. The live loopback command reaches the expected alias setup path here, but this Codex session cannot grant elevated ifconfig lo0 alias permissions.

clawsweeper Bot commented May 22, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

clawsweeper Bot commented May 22, 2026

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.


@JuanHuaXu
Author

Follow-up remediation pushed in 30c2d69 for the latest P2 metric-contract findings.

Fixes:

  • P2 dirty-plugin gates: evaluator now collects kova.dirtyPluginState.v1 fixture output and plugin command results, then enforces dirtyPluginDetected, dirtyPluginReported, dirtyPluginChecksumPreserved, doctorDestructiveChangeCount, pluginsUsableWithDirtyState, and gatewaySurvivedDirtyPlugin. Missing evidence now fails active dirty-plugin thresholds instead of silently passing.
  • P2 release recovery gates: evaluator now derives/enforces updateRetryVersionDrift, rollbackAvailable, rollbackSucceeded, pluginsUsableAfterUpgrade, pluginsUsableAfterRollback, and rollbackPreservedPluginData from upgrade/retry version output, rollback restore output, plugin-health commands, rollback plugin commands, and post-rollback dirty fixture verification. Missing evidence now fails active release-recovery thresholds.
  • Added plugin-recovery-evidence-evaluation self-check coverage with negative cases for missing dirty reporting, checksum/destructive doctor failure, retry version drift, missing/failed rollback, and post-rollback plugin unusability.

Validation:

  • node bin/kova.mjs self-check --json -> PASS (ok: true), including the new plugin recovery evidence check.
  • npm run test:snapshots -> PASS, 18/18.
  • git diff --check -> PASS.

This addresses the reviewer concern by making the advertised dirty-plugin and release-update-recovery surface thresholds executable gates rather than planning-only metric names.
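The fail-closed behavior described above can be sketched roughly as follows. This is a minimal illustration, not the Kova evaluator: the `evaluateGates` function and the `{ metric, equals, max }` gate shape are assumptions made for the example, and only the metric names are taken from this comment.

```javascript
// Hypothetical sketch: evaluateGates and the gate shape are illustrative,
// not Kova APIs. Only the metric names come from the PR comment.
function evaluateGates(evidence, gates) {
  const failures = [];
  for (const gate of gates) {
    const value = evidence == null ? undefined : evidence[gate.metric];
    // Fail closed: absent or null evidence is a failure, never a silent pass.
    if (value === undefined || value === null) {
      failures.push({ metric: gate.metric, reason: "missing evidence" });
      continue;
    }
    if ("equals" in gate && value !== gate.equals) {
      failures.push({ metric: gate.metric, reason: `expected ${gate.equals}, got ${value}` });
    } else if ("max" in gate && !(value <= gate.max)) {
      // The negated comparison also rejects NaN and non-numeric values.
      failures.push({ metric: gate.metric, reason: `${value} exceeds max ${gate.max}` });
    }
  }
  return { ok: failures.length === 0, failures };
}

const gates = [
  { metric: "dirtyPluginDetected", equals: true },
  { metric: "doctorDestructiveChangeCount", max: 0 },
];

// Complete evidence passes; evidence with a missing metric fails that gate.
const complete = evaluateGates(
  { dirtyPluginDetected: true, doctorDestructiveChangeCount: 0 },
  gates,
);
const missing = evaluateGates({ dirtyPluginDetected: true }, gates);
console.log(complete.ok, missing.ok, missing.failures);
```

The design point is that a gate with no evidence is treated the same as a gate with bad evidence, so a threshold that is advertised but never measured cannot pass by omission.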

@clawsweeper

clawsweeper Bot commented May 22, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@JuanHuaXu
Copy link
Copy Markdown
Author

Follow-up remediation pushed in 8e7ffa4 for the latest P2/P3 findings.

Fixes:

  • P2 network frontage: partial loopback allocation is now registered immediately after alias creation, before proxy startup, so stopNetworkFrontage() can clean created aliases if proxy startup/validation fails.
  • P2 exec evidence: active exec thresholds now fail closed on missing/null helper evidence using required gates for execSafeCommandMs, execTimeoutMs, execSafeCommandSucceeded, execDangerousCommandBlocked, execOutputTruncated, and execProcessLeaks.
  • P3 README inventory: refreshed counts to 56 scenarios / 37 surfaces / 37 states / 10 profiles from node bin/kova.mjs plan --json.

Validation:

  • node bin/kova.mjs self-check --json -> PASS (ok: true), including new partial network frontage invariant and missing/incomplete exec evidence checks.
  • npm run test:snapshots -> PASS, 18/18.
  • git diff --check -> PASS.
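The network frontage fix comes down to recording each alias as soon as it exists, so cleanup can see a partial allocation. A minimal sketch of that ordering, with assumed names: `startFrontage`, `stopFrontage`, and the injected `createAlias`, `removeAlias`, and `startProxy` operations are hypothetical stand-ins, not the Kova implementation, and the calls are synchronous only to keep the example short.

```javascript
// Hypothetical sketch: these functions and the injected ops are stand-ins,
// not Kova APIs. Real alias and proxy operations would likely be async.
function stopFrontage(ops, state) {
  if (state.proxy) {
    state.proxy.close();
    state.proxy = null;
  }
  // Remove aliases in reverse creation order.
  while (state.aliases.length > 0) {
    ops.removeAlias(state.aliases.pop());
  }
}

function startFrontage(ops, addresses) {
  const state = { aliases: [], proxy: null };
  try {
    for (const address of addresses) {
      ops.createAlias(address);
      // Register immediately, before any later step that might throw.
      state.aliases.push(address);
    }
    state.proxy = ops.startProxy(state.aliases);
    return state;
  } catch (error) {
    // Partial allocation is already recorded, so cleanup can undo it.
    stopFrontage(ops, state);
    throw error;
  }
}

// In-memory fakes: alias creation succeeds, proxy startup fails.
const live = new Set();
const ops = {
  createAlias: (address) => live.add(address),
  removeAlias: (address) => live.delete(address),
  startProxy: () => {
    throw new Error("proxy failed to bind");
  },
};

let startupError = null;
try {
  startFrontage(ops, ["127.0.0.2", "127.0.0.3"]);
} catch (error) {
  startupError = error;
}
console.log(startupError.message, live.size);
```

If registration happened only after the proxy started, the failure path above would leave both aliases behind with nothing on record to remove them.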

@clawsweeper

clawsweeper Bot commented May 22, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added status: 🔁 re-review loop A fresh ClawSweeper review was explicitly requested after the latest review. and removed status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 22, 2026
@clawsweeper clawsweeper Bot added status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 22, 2026
@clawsweeper

clawsweeper Bot commented May 22, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. and removed proof: sufficient Contributor real behavior proof is sufficient. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 22, 2026
@clawsweeper

clawsweeper Bot commented May 22, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added proof: sufficient Contributor real behavior proof is sufficient. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 22, 2026
@clawsweeper

clawsweeper Bot commented May 22, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed proof: sufficient Contributor real behavior proof is sufficient. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 22, 2026
@clawsweeper

clawsweeper Bot commented May 22, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. feature: ✨ showcase ClawSweeper spotlight: unusually compelling feature idea for maintainer attention. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels May 22, 2026

Labels

feature: ✨ showcase ClawSweeper spotlight: unusually compelling feature idea for maintainer attention. merge-risk: 🚨 automation 🚨 Merging this PR could break CI, automerge, proof capture, label sync, or automation. merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. P2 Normal priority bug or improvement with limited blast radius. proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.


1 participant