Codex/kova scenarios and fixes #2
# Conflicts:
# src/cli.mjs
# src/commands/matrix-run.mjs
# src/commands/report.mjs
# src/commands/run.mjs
# src/evaluator.mjs
# src/runner.mjs
# src/selfcheck.mjs
# states/large-memory-session.json
Codex review: needs maintainer review before merge.

Latest ClawSweeper review: 2026-05-23 04:07 UTC / May 23, 2026, 12:07 AM ET.

Workflow note: Future ClawSweeper reviews update this same comment in place.

Summary

Reproducibility: not applicable. This is a feature PR expanding Kova validation coverage and harness controls, not a bug report with a current-main reproduction path.

PR rating

Rank-up moves:

What the crustacean ranks mean

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof

Risk before merge

Maintainer options:

Next step before merge

Security

Review details

Best possible solution: Update the branch against current main, preserve the proof-backed scenario families, and land only after maintainers accept the broader Kova gate/profile behavior or choose to split it by scenario family.

Do we have a high-confidence way to reproduce the issue? Not applicable. This is a feature PR expanding Kova validation coverage and harness controls, not a bug report with a current-main reproduction path.

Is this the best way to solve the issue? Unclear. The implementation appears coherent and proof-backed, but the best landing shape depends on maintainer acceptance of broader Kova gate behavior and must be reassessed after conflict resolution.

Label changes:

Label justifications:

What I checked:

Likely related people:

Codex review notes: model gpt-5.5, reasoning high; reviewed against 2ce089890a34.
Remediation pushed in

What changed:

Validation:

So the reviewer-reported Kova false evidence paths are patched. The only failure observed in the exec proof run is now a real product resource signal, not Kova misidentifying its own helper behavior.
Follow-up remediation pushed in

Fixes:

Validation:

So the code-level review blockers are patched. The live loopback command reaches the expected alias setup path here, but this Codex session cannot grant elevated
I asked ClawSweeper to review this item again.

Command router queued. I will update this comment with the next step. Re-review progress:
Follow-up remediation pushed in

Fixes:

Validation:

This addresses the reviewer concern by making the advertised dirty-plugin and release-update-recovery surface thresholds executable gates rather than planning-only metric names.
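As a rough illustration of the difference between a planning-only metric name and an executable gate, the sketch below evaluates measured values against named limits and fails when a limit is exceeded or when no measurement exists. The function name `evaluateThresholds`, the metric keys, and the data shapes are hypothetical and are not taken from Kova's evaluator. The sample numbers mirror the tool-runtime RSS signal reported in this PR.

```javascript
// Hypothetical sketch: turn named thresholds into pass/fail gates.
// Function name, metric keys, and shapes are assumptions, not Kova's API.
function evaluateThresholds(metrics, thresholds) {
  const violations = [];
  for (const [name, limit] of Object.entries(thresholds)) {
    const value = metrics[name];
    if (value === undefined) {
      // Assumption: a threshold without a measurement counts as a failure,
      // so a gate cannot pass just because nothing was measured.
      violations.push({ name, kind: "missing", limit });
    } else if (value > limit) {
      violations.push({ name, kind: "exceeded", value, limit });
    }
  }
  return { ok: violations.length === 0, violations };
}

// Example values modeled on the evidence in this PR:
// peak RSS of 760.4 MB against a 500 MB limit, and zero leaked processes.
const result = evaluateThresholds(
  { toolRuntimePeakRssMb: 760.4, execProcessLeaks: 0 },
  { toolRuntimePeakRssMb: 500, execProcessLeaks: 0 },
);
console.log(JSON.stringify(result, null, 2));
```

With these inputs the gate fails on the RSS limit only. The leak threshold passes because the measured count does not exceed zero.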
I asked ClawSweeper to review this item again. Re-review progress:
Follow-up remediation pushed in

Fixes:

Validation:
Fresh Evidence
Latest Kova head: `1c1c0d0` (`codex/kova-scenarios-and-fixes`).

Current-Head Exec Leak Measurement Proof

Reviewer P2 leak-measurement fix was run at Kova head `1c1c0d0` (`Measure exec process leaks in tool scenarios`). The helper now snapshots processes before and after the inner OpenClaw exec-tool turns, writes before/after/leak artifacts, diffs scoped exec/agent/gateway-tree/tool-runtime processes, and reports the measured leak count instead of a hard-coded zero. Gateway root restarts are excluded from the exec-child leak count so the gate tracks leaked tool children rather than service PID churn.

Validation:

- `node bin/kova.mjs self-check --json` -> `ok: true`; synthetic leaked exec evidence now fails on `execProcessLeaks`.
- `node tests/render-snapshots.mjs` -> 18 pass.
- `git diff --check` -> pass.
- `exec-tool-safety` real run `kova-260522-172726-8691d6`: `execProcessLeaks: 0`; leak artifact `leakCount: 0`; remaining failure is only OpenClaw `tool-runtime peak RSS 760.4 MB > 500 MB`.
- `tool-failure-containment` real run `kova-260522-172756-1d1ef1`: `execProcessLeaks: 0`; leak artifact `leakCount: 0`; remaining failure is only OpenClaw `tool-runtime peak RSS 757.9 MB > 500 MB`.
- `ocm env list` -> no environments.

Current-Head Exec Tool Evidence Proof
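The process-leak counts in these proofs come from the before/after snapshot diff described for head `1c1c0d0`. A minimal sketch of that kind of diff follows, assuming each snapshot is a list of `{ pid, ppid, role }` records. The snapshot shape, the role names, and the `diffProcessLeaks` function are hypothetical illustrations, not the helper's actual artifact format or code.

```javascript
// Hypothetical sketch of a before/after process-snapshot diff.
// The record shape and role names are assumptions for illustration only.
const SCOPED_ROLES = new Set(["exec", "agent", "gateway-tree", "tool-runtime"]);

function diffProcessLeaks(before, after, gatewayRootPids = new Set()) {
  const knownPids = new Set(before.map((p) => p.pid));
  const leaked = after.filter(
    (p) =>
      SCOPED_ROLES.has(p.role) && // only processes in the scoped roles
      !knownPids.has(p.pid) && // present after the turns, absent before
      !gatewayRootPids.has(p.pid), // a restarted gateway root is PID churn, not a leak
  );
  return { leakCount: leaked.length, leaked };
}

const before = [
  { pid: 100, ppid: 1, role: "gateway-tree" },
  { pid: 110, ppid: 100, role: "agent" },
];
const after = [
  { pid: 200, ppid: 1, role: "gateway-tree" }, // gateway root restarted with a new PID
  { pid: 110, ppid: 100, role: "agent" }, // unchanged
  { pid: 310, ppid: 110, role: "exec" }, // exec child still alive after the turn
  { pid: 999, ppid: 1, role: "other" }, // unrelated process, out of scope
];

const report = diffProcessLeaks(before, after, new Set([200]));
console.log(report.leakCount, report.leaked.map((p) => p.pid));
```

In this example only PID 310 counts as a leak. The restarted gateway root and the out-of-scope process are both ignored, which is the distinction the gate relies on.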
Reviewer P2 fix proof was run at Kova head `1546804` (`Drive exec containment checks through OpenClaw`). This fixes the false-evidence lane by using the OpenClaw exec tool schema's required `command` argument and by driving safe, blocked, oversized-output, and timeout cases through `openclaw agent` plus mock-provider tool-result evidence.

Commands:

    KOVA_HOME=checkouts/p2-exec-real-proof2-kova-home \
    node bin/kova.mjs run \
      --target runtime:stable \
      --scenario exec-tool-safety \
      --state exec-tool-user \
      --execute \
      --report-dir checkouts/p2-exec-real-proof2/reports \
      --json

    KOVA_HOME=checkouts/p2-exec-real-proof2-kova-home \
    node bin/kova.mjs run \
      --target runtime:stable \
      --scenario tool-failure-containment \
      --state exec-tool-user \
      --execute \
      --report-dir checkouts/p2-exec-real-proof2/reports \
      --json

Results:

- `exec-tool-safety`: `kova-260522-170514-192a6a`, proof complete `10/10`; exec evidence available; `safeCommandSucceeded: true`; `dangerousCommandBlocked: true`; `dangerousPayloadExecuted: false`; `outputTruncated: true`; `timeoutMs: 3499`; `processLeaks: 0`.
- `tool-failure-containment`: `kova-260522-170551-6419fb`, proof complete `10/10`; exec evidence available; `dangerousCommandBlocked: true`; `dangerousPayloadExecuted: false`; `outputTruncated: true`; `timeoutMs: 3291`; `processLeaks: 0`.

Both runs still report `FAIL`, but only for the existing OpenClaw resource threshold: `tool-runtime` RSS around 750 MB over the 500 MB threshold. The previous Kova evidence failures are gone.

Validation for the patch:

- `node bin/kova.mjs self-check --json` -> `ok: true` (173 checks)
- `node tests/render-snapshots.mjs` -> 18 pass
- `git diff --check` -> pass
- `ocm env list` -> no environments

Artifacts on runner:

- `checkouts/p2-exec-real-proof2/reports/kova-260522-170514-192a6a.json`
- `checkouts/p2-exec-real-proof2/reports/kova-260522-170551-6419fb.json`

Current-Head Tool-Failure Proof
Reviewer P2 fix proof was run at Kova head `2439672` (`Fix failure-only exec mock provider flow`).

Command:

    KOVA_HOME=checkouts/p2-tool-failure-proof-kova-home \
    node bin/kova.mjs run \
      --target runtime:stable \
      --scenario tool-failure-containment \
      --state exec-tool-user \
      --execute \
      --report-dir checkouts/p2-tool-failure-proof/reports \
      --json

Result:

`kova-260522-164448-f2fbb2` completed with proof completeness `10/10` required obligations, cleanup destroyed the disposable env, and the fixed failure-only provider path was exercised:

- `kova-exec-tool-failure-only`
- `kova-exec-tool-failure-only-dangerous-tool-call`
- `kova-exec-tool-failure-only-dangerous-tool-call` `exec` tool call
- `dangerousCommandBlocked: true`
- `dangerousPayloadExecuted: false`
- `dangerousSentinelStillPresent: true`
- `outputTruncated: true`, `timeoutObserved: true`, `processLeaks: 0`

The scenario verdict is still `FAIL`, but for an OpenClaw resource threshold, not the Kova wiring bug: `tool-runtime peak RSS 741.9 MB exceeded threshold 500 MB`. This is expected to remain a product/resource signal for maintainers.

Artifacts on runner:

- `checkouts/p2-tool-failure-proof/reports/kova-260522-164448-f2fbb2.md`
- `checkouts/p2-tool-failure-proof/reports/kova-260522-164448-f2fbb2.json`
- `checkouts/p2-tool-failure-proof-kova-home/artifacts/kova-260522-164448-f2fbb2/kova-tool-failure-containmen-81131346-kova-260522-164448-f2fbb2/tool-failure-containment.json`

Requested OpenClaw target: `v2026.5.21-beta.1`. The Git tag exists and resolves to `89a17def chore(release): prepare 2026.5.21-beta.1`, but `openclaw@2026.5.21-beta.1` is not currently published in npm/OCM release discovery. A direct `npm:2026.5.21-beta.1` exhaustive run therefore blocked at provisioning with `OpenClaw release version "2026.5.21-beta.1" was not found`.

Fresh release-shaped tag evidence was run from a disposable `openclaw/openclaw.git` checkout at that tag using `local-build:checkouts/openclaw-v2026.5.21-beta.1`. After `pnpm install --frozen-lockfile`, direct `pnpm pack` succeeded and produced `openclaw-2026.5.21-beta.1.tgz`.

Matrix command:

    KOVA_HOME=checkouts/kova-pr2-evidence-v2026-5-21-beta-1-local-build-rerun \
    node bin/kova.mjs matrix run \
      --profile exhaustive \
      --target local-build:checkouts/openclaw-v2026.5.21-beta.1 \
      --source-env kova-pr2-source-v2026-5-21-beta-1 \
      --execute \
      --allow-exhaustive \
      --json \
      --report-dir checkouts/kova-pr2-evidence-v2026-5-21-beta-1-local-build-rerun/reports

Result: `kova-260522-145534-037d84` -> `77 total · 32 PASS · 45 FAIL · 0 BLOCKED`.

Evidence artifacts on runner:

- `checkouts/kova-pr2-evidence-v2026-5-21-beta-1-local-build-rerun/reports/kova-260522-145534-037d84-exhaustive.md`
- `checkouts/kova-pr2-evidence-v2026-5-21-beta-1-local-build-rerun/reports/kova-260522-145534-037d84-exhaustive.json`
- `checkouts/kova-pr2-evidence-v2026-5-21-beta-1-local-build-rerun/reports/kova-260522-145534-037d84-bundle.tar.gz`
- `743ee926c809921c0b6aea0170c1731e8576749ab88ee8126cafebfb210c73ae`

Notable PASS coverage: `release-runtime-startup`, `channel-discord-capability-conformance`, `upgrade-existing-user` both source states, `bundled-runtime-deps` both states, `plugin-lifecycle` all states, official/bad/missing/unsafe plugin lanes, `provider-models`, `agent-cold-warm-message`, `dashboard-readiness`, `tui-responsiveness`, `mcp-runtime-start-stop`, `agent-network-offline`, `failure-injection`, and `cross-platform-smoke`.

Top remaining failures are product/resource signals in OpenClaw/tag behavior rather than Kova unsupported-mode blockers: gateway RSS around the 700 MB threshold across agent/provider/HTTP/TUI surfaces, rolling-upgrade package/runtime RSS/CPU, dirty-plugin `doctor-cli` RSS, tool-runtime RSS for exec/tool containment, soak/workspace latency, and a few functional/liveness failures (channel generated-image handoff, Telegram timeout signals, cron/browser/media gateway restarting, MCP tool-call missing runtime role evidence).

Cleanup after the run: disposable source env destroyed; old beta runtime records reintroduced by upgrade lanes removed; `ocm runtime list` shows only `stable`.

PR Change List
Branch: `codex/kova-scenarios-and-fixes`

Base compared: `origin/main`

RCA doc is removed from the feature set.

`.learnings/` is still untracked and not part of PR.

Matrix/Profile Wiring
- `profiles/rolling-upgrade.json`
- `profiles/exhaustive.json`: `77`
- `profiles/release.json`: `51`
- `profiles/adversarial.json`

New Upgrade Coverage

- `scenarios/upgrade-from-day-ago.json`
- `scenarios/upgrade-from-week-ago.json`
- `scenarios/upgrade-from-month-ago.json`
- `support/resolve-openclaw-release-age.mjs`
- `support/run-openclaw-release-age-upgrade.mjs`
- `docs/AGENT_USAGE.md` with rolling upgrade usage

Unsafe Legacy Plugin Memory Test

- `scenarios/plugin-legacy-unsafe-memory.json`
- `surfaces/plugin-legacy-unsafe-memory.json`
- `support/plugins/kova-legacy-unsafe-memory/index.js`
- `support/plugins/kova-legacy-unsafe-memory/openclaw.plugin.json`
- `support/plugins/kova-legacy-unsafe-memory/package.json`
- `support/assert-command-output.mjs`
- `src/evaluator.mjs` to count `failed during register` as plugin load failure evidence

Dirty Plugin Testing

- `docs/DIRTY_PLUGIN_TESTING_PLAN.md`
- `scenarios/dirty-plugin-state.json`
- `states/dirty-plugin-local-edits.json`
- `states/dirty-plugin-stale-deps.json`
- `states/dirty-plugin-manifest-drift.json`
- `states/dirty-plugin-disabled-broken.json`
- `states/dirty-plugin-symlink-dev.json`
- `states/dirty-plugin-partial-install.json`
- `states/update-recovery-plugin-user.json`
- `support/dirty-plugin-state.mjs`
- `surfaces/dirty-plugin-state.json`

Release Update Recovery

- `docs/RELEASE_UPDATE_RECOVERY_PLAN.md`
- `scenarios/release-update-recovery.json`
- `surfaces/release-update-recovery.json`
- `support/restore-first-ocm-upgrade-snapshot.mjs`

Tool Runtime Matrix

- `docs/TOOL_RUNTIME_MATRIX_PLAN.md`
- `scenarios/cron-runtime.json`
- `scenarios/exec-tool-safety.json`
- `scenarios/mcp-tool-call.json`
- `scenarios/tool-failure-containment.json`
- `states/cron-user.json`
- `states/exec-tool-user.json`
- `states/mcp-tool-user.json`
- `surfaces/cron-runtime.json`
- `surfaces/exec-tool-safety.json`
- `surfaces/mcp-tool-call.json`
- `surfaces/tool-failure-containment.json`
- `process-roles/cron-runtime.json`
- `process-roles/tool-runtime.json`
- `support/run-cron-runtime-smoke.mjs`
- `support/run-exec-tool-safety.mjs`
- `support/mcp-tool-call-smoke.mjs`

Provider/Network Failure Coverage
- `docs/NETWORK_ISOLATION_PLAN.md`
- `src/network-frontage.mjs`
- `support/network-frontage-proxy.mjs`
- `scenarios/agent-provider-protocol-failure.json`
- `scenarios/agent-provider-random-disconnect.json`
- `support/mock-openai-server.mjs`
- `support/configure-openclaw-mock-auth.mjs`
- `src/commands/run.mjs`, `src/commands/matrix-run.mjs`, `src/run/context.mjs`, `src/run/phase-plan.mjs`

Adversarial Input Coverage

- `scenarios/adversarial-input-openai-compatible.json`
- `surfaces/adversarial-input.json`
- `support/run-adversarial-inputs.mjs`
- `profiles/adversarial.json`

Plugin Fixture/Manifest Fixes

- `support/plugins/kova-basic/openclaw.plugin.json`
- `support/plugins/kova-missing-runtime-dep/openclaw.plugin.json`
- `scenarios/plugin-missing-runtime-deps.json`

Resource Attribution / Evaluation / Reporting Fixes

- `src/collectors/resources.mjs`
- `src/evaluation/violations.mjs`
- `src/evidence/agent-turns.mjs`
- `src/evidence/shared.mjs`
- `src/measurement-contract.mjs`
- `src/reporting/report.mjs`
- `src/reporting/scenario-aggregate.mjs`
- `src/run/command-executor.mjs`
- `src/run/report-finalization.mjs`
- `src/runner.mjs`
- `src/safety.mjs`
- `src/selfcheck.mjs`

Large Session Fixture

- `support/prepare-large-memory-session-state.mjs`
- `surfaces/fresh-install.json`
- `surfaces/soak.json`
- `surfaces/gateway-performance.json`
- `surfaces/workspace-scan.json`

OpenAI-Compatible / Runtime Role Tweaks

- `scenarios/openai-compatible-turn.json`
- `support/run-openai-compatible-turn.mjs`
- `process-roles/openai-compatible-client.json`

Docs / User-Facing Metadata

- `README.md`
- `docs/WHAT_IS_KOVA.md`
- `docs/AGENT_USAGE.md`
- `metrics/known.json`

Git Hygiene

- `.gitignore`
- `.env`, `.env.*`, local JSON/env files, and checkout contents
- `.env.example` and `checkouts/.gitkeep`
- `checkouts/.gitkeep`

Tests / Snapshots

- `tests/fixtures/reports/pass.json`
- `tests/fixtures/reports/fail.json`
- `tests/render-snapshots.mjs`
- `tests/snapshots/`

Validation Already Run

- `node bin/kova.mjs self-check --json`
- `npm run test:snapshots`
- `git diff --check`
- `plugin-legacy-unsafe-memory` passed against `runtime:stable`