[audit-workflows] Daily Workflow Audit — 2026-05-22: Codex 3rd-day fail + 9x smoke error surge #34115

2026-05-22T21:49:24Z

github-actions[bot]
Bot May 22, 2026

Daily Workflow Audit — 2026-05-22

Audited 78 runs from the last 24 hours (6.1h total duration, $21.75 spend, 42.8M tokens). The success rate was 78.2%, a slight rebound from yesterday's 75.6% but still well below the 85–92% baseline from earlier in the week. Two issues dominate today's findings: the Codex engine has now failed 100% of its runs for three consecutive days with no fix landed, and errors jumped 9x (63 vs 7 yesterday), concentrated in smoke runs that all started within a single 5-minute window.

Summary

Metric	Value	vs Yesterday
Runs	78	+37
Success rate	78.2%	+2.6pp
Failures (error_count > 0)	14	+10
Cost	$21.75	+$1.79
Tokens	42.8M	+6.9M
Errors	63	+56
Firewall blocks	514/2817 (18%)	similar

Engine mix: copilot 42, claude 11, codex 3, null 22 (smoke-test harness runs without engine attribution).

Critical Issues

🔴 Codex engine: 100% failure for 3rd consecutive day

All 3 codex runs failed: AI Moderator (×2, runs §26312070679, §26312080745) and Daily Cache Strategy Analyzer (§26306461300).
Failure sequence: codex-cli POSTs model=gpt-5.5 to api-proxy:10000/responses; the proxy returns HTTP 400 invalid_request_error; codex-harness then exhausts all 3 retries and exits with exitCode=1.
The api-proxy /v1/models advertises gpt-5.5 as available — so the failure is likely a request-body / wire-format mismatch on the /responses endpoint, not a missing model.
Yesterday's recommendation to pin to a known-good model has not yet shipped.
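The retry behaviour described above can be illustrated with a small sketch. An HTTP 400 signals a problem with the request itself, so resending the same body is unlikely to help. The status set and the `call_with_retries` helper below are assumptions made for illustration; they are not the actual codex-harness retry policy.

```python
from typing import Callable

# Statuses commonly treated as transient. This set is an assumption for
# illustration, not the real codex-harness configuration.
RETRYABLE_STATUSES = {408, 429, 500, 502, 503, 504}


def call_with_retries(send: Callable[[], int], max_attempts: int = 3) -> tuple[int, int]:
    """Call send() until it returns a non-retryable status or attempts run out.

    Returns (final_status, attempts_used).
    """
    attempts = 0
    status = 0
    while attempts < max_attempts:
        attempts += 1
        status = send()
        # Success codes and client errors such as 400 are not in the set,
        # so the loop stops on the first such response.
        if status not in RETRYABLE_STATUSES:
            break
    return status, attempts


# A proxy that always answers 400 is hit once instead of three times.
print(call_with_retries(lambda: 400))
# A proxy that always answers 503 still gets the full retry budget.
print(call_with_retries(lambda: 503))
```

Under this policy a persistent 400 would fail fast and surface the request-shape problem on the first attempt.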

🔴 Smoke test error surge — 9x increase

63 errors today vs 7 yesterday, concentrated in 14 smoke runs that all started within a 5-minute window (~17:59 UTC). This strongly suggests a single environment change or regression at the smoke-batch trigger.
Worst offenders: Smoke Claude 6 errors (§26303796120), Smoke Gemini 5 (§26303796069), Smoke Copilot ARM64 5 (§26303796192), Smoke Agent variants 3 each, Smoke Project 3, Smoke CI 3.
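One way to confirm that erroring runs cluster in time is a sliding window over their start times. The sketch below is a minimal version of that idea; the timestamps are made up for illustration and are not the real run start times, and `largest_cluster` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def largest_cluster(starts: list[datetime], window: timedelta) -> list[datetime]:
    """Return the largest group of start times that fits inside one window."""
    ordered = sorted(starts)
    best: list[datetime] = []
    left = 0
    for right, ts in enumerate(ordered):
        # Shrink the window from the left until it spans at most `window`.
        while ts - ordered[left] > window:
            left += 1
        if right - left + 1 > len(best):
            best = ordered[left:right + 1]
    return best


# Hypothetical start times: five runs near 17:59 UTC plus two unrelated runs.
day = datetime(2026, 5, 22)
starts = [
    day.replace(hour=9, minute=15),
    day.replace(hour=12, minute=40),
    day.replace(hour=17, minute=57),
    day.replace(hour=17, minute=58),
    day.replace(hour=17, minute=59),
    day.replace(hour=18, minute=0),
    day.replace(hour=18, minute=1),
]

cluster = largest_cluster(starts, timedelta(minutes=5))
print(len(cluster), cluster[0].time(), cluster[-1].time())
```

If most erroring runs land in one cluster, the commit or image change just before the cluster's first start time is the natural place to look.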

Trend Charts

Success rate declined for three consecutive audit cycles (92.8% → 88.9% → 85.9% → 75.6%) before today's slight rebound to 78.2%, while raw failure counts climbed from 6 to 14. Today's rebound is driven by sheer run volume rather than fewer failures: the absolute failure count is at a weekly high.
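The quoted success-rate series can be checked mechanically. The sketch below computes the day-over-day change in percentage points and the longest run of consecutive declines; the helper names are illustrative only.

```python
# Success rates (%) for the five audit cycles quoted in this report.
rates = [92.8, 88.9, 85.9, 75.6, 78.2]


def deltas(series: list[float]) -> list[float]:
    """Change between consecutive values, rounded to one decimal place."""
    return [round(b - a, 1) for a, b in zip(series, series[1:])]


def longest_decline_streak(series: list[float]) -> int:
    """Length of the longest run of consecutive negative changes."""
    best = run = 0
    for d in deltas(series):
        run = run + 1 if d < 0 else 0
        best = max(best, run)
    return best


print(deltas(rates))                  # [-3.9, -3.0, -10.3, 2.6]
print(longest_decline_streak(rates))  # 3
```

The final delta of +2.6 matches the "+2.6pp" shown in the Summary table.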

Total cost rose $1.79 day over day even though the Daily Safe Output Tool Optimizer recovered ($8.54 → $2.67): Daily Code Metrics jumped from $3.94 to $6.46 (+64%) and the self-triggered Failure Investigator burned $4.88. Token usage rose 19%, about twice the roughly 9% rise in cost.

Cost Breakdown — Top 5 Workflows

Workflow	Cost	Tokens	Duration	Engine
Daily Code Metrics and Trend Tracking Agent	$6.46	30.8M	19.6m	claude
[aw] Failure Investigator (6h)	$4.88	23.7M	21.6m	claude
Daily Safe Output Tool Optimizer	$2.67	9.7M	11.9m	claude
Copilot Agent PR Analysis	$1.65	6.5M	7.2m	claude
Daily Team Evolution Insights	$1.45	5.2M	6.9m	claude

Top 5 cost a combined $17.11 = 79% of daily spend, all on the Claude engine.
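The combined figure and share can be reproduced from the table above. This is a small arithmetic check using `Decimal` to avoid floating-point noise in currency sums; the dictionary simply restates the table rows.

```python
from decimal import Decimal

# Per-workflow cost in USD, copied from the Top 5 table.
top5 = {
    "Daily Code Metrics and Trend Tracking Agent": Decimal("6.46"),
    "[aw] Failure Investigator (6h)": Decimal("4.88"),
    "Daily Safe Output Tool Optimizer": Decimal("2.67"),
    "Copilot Agent PR Analysis": Decimal("1.65"),
    "Daily Team Evolution Insights": Decimal("1.45"),
}
daily_total = Decimal("21.75")

combined = sum(top5.values(), Decimal("0"))
share_pct = combined / daily_total * 100

print(combined)          # 17.11
print(round(share_pct))  # 79
```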

Network Friction — Top Blockers

Workflow	Blocked	Total	%
Chaos PR Bundle Fuzzer	54	187	29%
Contribution Check	34	115	30%
Daily Project Performance Summary	30	93	32%
Copilot PR Prompt Pattern Analysis	19	53	36%
PR Sous Chef	19	61	31%

Copilot PR Prompt Pattern Analysis has the highest block percentage and was flagged by observability as a "network friction hotspot."
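The block percentages follow directly from the Blocked and Total columns. The sketch below recomputes them and ranks workflows by block rate rather than by raw block count, which is why a workflow with only 19 blocks tops the list.

```python
# (workflow, blocked requests, total requests), copied from the table above.
rows = [
    ("Chaos PR Bundle Fuzzer", 54, 187),
    ("Contribution Check", 34, 115),
    ("Daily Project Performance Summary", 30, 93),
    ("Copilot PR Prompt Pattern Analysis", 19, 53),
    ("PR Sous Chef", 19, 61),
]

# Rank by block rate (blocked / total), highest first.
ranked = sorted(rows, key=lambda r: r[1] / r[2], reverse=True)

for name, blocked, total in ranked:
    print(f"{name}: {blocked}/{total} = {blocked / total:.0%}")
```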

Execution Drift — Linter Miner

Observability flagged Linter Miner as the top drift signal today:

3 runs, turn counts ranging 0 to 89 (avg 29.7)
One run consumed 18.3 minutes (§26305648141)
Two errored at ~3.5 minutes each (§26305185997, §26303847379)

This eclipses yesterday's PR Sous Chef drift (0 to 18 turns), which suggests the instability is spreading to a new workflow.
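The summary figures for Linter Miner pin down the per-run turn counts more tightly than they might appear to. The arithmetic below assumes turn counts are whole numbers and that the reported average is rounded to one decimal place; neither is stated in the report.

```python
# Reported figures for Linter Miner: 3 runs, min 0 turns, max 89, mean 29.7.
n_runs, low, high, mean = 3, 0, 89, 29.7

# With integer turn counts, the total must be the whole number nearest mean * n.
total_turns = round(mean * n_runs)

# Whatever is left after removing the minimum and maximum is the third run.
middle = total_turns - low - high

print(total_turns)         # 89
print(middle)              # 0
print(high / total_turns)  # 1.0
```

Under those assumptions a single run accounts for every turn, which is consistent with one long run and two runs that errored early.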

Workflow Failure Counts (1+ errors)

Workflow	Failed/Total	Notes
AI Moderator	2/2	codex gpt-5.5 issue
Linter Miner	2/3	drift + errors
Matt Pocock Skills Reviewer	1/3
PR Code Quality Reviewer	1/3
Smoke Claude	1/1	6 errors
Smoke Gemini	1/1	5 errors
Smoke Copilot ARM64	1/1	5 errors
Smoke CI	3/11	persistent
Smoke Agent (3 variants)	3/3
Smoke Project, Smoke Update Cross-Repo PR	1/1 each
Agent Container Smoke Test, Code Refiner, Smoke OpenCode, Smoke Call Workflow, Smoke Agent: all/merged, Smoke Create Cross-Repo PR	1/1 each
Changeset Generator, Daily Cache Strategy Analyzer, Daily Secrets Analysis Agent, Daily Testify Uber Super Expert, PR Description Updater, PR Sous Chef (1/6)	1 each	mixed causes

Recommendations

Urgent — Fix codex engine: Pin AI Moderator / Daily Cache Strategy Analyzer / Smoke Codex to a model that actually works against the /responses endpoint (try gpt-5-codex or gpt-5). Verify request body shape matches what the proxy expects. This is now a 3-day-old regression with no progress.
Urgent — Investigate smoke surge: All 14 erroring smoke runs started in the same 5-minute window. Check what shipped to main / the test image between 17:00–18:00 UTC. Start with Smoke Claude run §26303796120 (highest at 6 errors).
Investigate Daily Code Metrics cost growth: $6.46 (+64% day-over-day) at 30.8M tokens / 19.6 minutes / 53 turns. Inspect tool-call sequence for loops or expanded scope.
Stabilize Linter Miner: 0-89 turn variance with 2/3 errors. Add a turn budget or guard against runaway loops.
Audit Copilot PR Prompt Pattern Analysis allowed-domains: 36% block rate (19/53) — either tighten the workflow or extend the allowlist for legitimate calls.
Carry-over: Smoke CI continues to error intermittently (3/11 today); Chaos PR Bundle Fuzzer had 54 blocks (likely by design; verify).
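The turn-budget recommendation for Linter Miner could look roughly like the sketch below. It is a generic guard, not the workflow engine's actual mechanism: `run_with_turn_budget`, `TurnBudgetExceeded`, and the `step` callback are hypothetical names introduced for illustration.

```python
from typing import Callable


class TurnBudgetExceeded(RuntimeError):
    """Raised when an agent loop uses up its turn budget without finishing."""


def run_with_turn_budget(step: Callable[[], bool], max_turns: int) -> int:
    """Call step() until it reports completion or the budget is spent.

    step() returns True when the agent is done. Returns the number of
    turns used, or raises TurnBudgetExceeded.
    """
    for turn in range(1, max_turns + 1):
        if step():
            return turn
    raise TurnBudgetExceeded(f"no result after {max_turns} turns")


# Example: an agent that finishes on its third turn.
remaining = [False, False, True]
print(run_with_turn_budget(lambda: remaining.pop(0), max_turns=10))  # 3

# Example: an agent that never finishes is stopped at the budget.
try:
    run_with_turn_budget(lambda: False, max_turns=5)
except TurnBudgetExceeded as exc:
    print(exc)  # no result after 5 turns
```

Failing loudly at the budget turns a runaway run into an explicit, countable error rather than an 18-minute outlier.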

Resolved / Improved

✅ Daily Safe Output Tool Optimizer cost recovered ($8.54 → $2.67)
✅ Auto-Triage Issues fully recovered (2/2 success today)
✅ PR Sous Chef drift no longer top signal (eclipsed by Linter Miner)

Generated by 🔍 Agentic Workflow Audit Agent · ● 16M · ◷

expires on May 23, 2026, 9:49 PM UTC

2026-05-22T22:30:46Z

github-actions[bot]
Bot May 22, 2026
Author

💥 WHOOSH! 🦸 The Smoke-Test Agent zoomed through this discussion at the speed of CI!

KAPOW! All systems nominal — Claude engine running smooth as a freshly oiled cape!

ZAP! Just leaving my comic-book calling card — the smoke-test bot was here! 🎯💨

— Captain Smoke-Claude, Run §26314423415

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

accounts.google.com
android.clients.google.com
clients2.google.com
contentautofill.googleapis.com
safebrowsingohttpgateway.googleapis.com
www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

💥 [THE END] — Illustrated by Smoke Claude · ● 7.6M · ◷
