[audit-workflows] Daily Workflow Audit — 2026-05-22: Codex 3rd-day fail + 9x smoke error surge #34115
Replies: 1 comment
💥 WHOOSH! 🦸 The Smoke-Test Agent zoomed through this discussion at the speed of CI! KAPOW! All systems nominal, with the Claude engine running smooth as a freshly oiled cape! ZAP! Just leaving my comic-book calling card: the smoke-test bot was here! 🎯💨
Captain Smoke-Claude, Run §26314423415

Warning: Firewall blocked 6 domains
The following domains were blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"
See Network Configuration for more information.
Daily Workflow Audit — 2026-05-22
Audited 78 runs from the last 24 hours (6.1h total duration, $21.75 spend, 42.8M tokens). Success rate 78.2% — a slight rebound from yesterday's 75.6% but still well below the 85–92% baseline from earlier in the week. Two issues dominate today's findings: the Codex engine has now failed 100% for three consecutive days with no fix landed, and smoke test errors jumped 9x (63 vs 7 yesterday) in a single 5-minute window.
Summary
Engine mix: copilot 42, claude 11, codex 3, null 22 (smoke-test harness runs without engine attribution).
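As a quick consistency check on the figures above, the engine mix should add up to the 78 audited runs, and the smoke-error counts (63 today vs 7 yesterday) should give the quoted 9x ratio. A minimal sketch using only numbers from this report; the variable and function names are illustrative and not part of any audit tooling:

```python
# Figures copied from this report; all names here are illustrative only.
engine_mix = {"copilot": 42, "claude": 11, "codex": 3, "null": 22}
total_runs = sum(engine_mix.values())


def share_pct(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Per-engine share of all audited runs.
engine_share = {name: share_pct(n, total_runs) for name, n in engine_mix.items()}

# Smoke test errors: 63 today vs 7 yesterday.
smoke_error_ratio = 63 / 7

print(total_runs)
print(engine_share)
print(smoke_error_ratio)
```

The same check could be rerun each day against the audit's raw counts to catch a summary line that drifts out of sync with its table.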
Critical Issues
🔴 Codex engine: 100% failure for 3rd consecutive day
Affected: AI Moderator (×2, runs §26312070679, §26312080745) and Daily Cache Strategy Analyzer (§26306461300).
On requests with model=gpt-5.5 to api-proxy:10000/responses, the proxy returns HTTP 400 invalid_request_error, and codex-harness exhausts all 3 retries (exitCode=1).
/v1/models advertises gpt-5.5 as available, so the failure is likely a request-body / wire-format mismatch on the /responses endpoint, not a missing model.
🔴 Smoke test error surge: 9x increase
Smoke Claude 6 errors (§26303796120), Smoke Gemini 5 (§26303796069), Smoke Copilot ARM64 5 (§26303796192), Smoke Agent variants 3 each, Smoke Project 3, Smoke CI 3.
Trend Charts
Success rate declined for three consecutive audit cycles (92.8% → 88.9% → 85.9% → 75.6%) before rebounding to 78.2% today, with raw failure counts climbing from 6 to 14. Today's mild rebound is driven by sheer run volume rather than fewer failures: the absolute failure count is at a weekly high.
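The cycle-to-cycle movement in the success-rate series above can be tabulated with a short sketch. The five rates are the ones quoted in this report; `deltas` and `longest_decline_streak` are hypothetical helper names, not part of the audit tooling:

```python
def deltas(rates):
    """Percentage-point change between each pair of consecutive audit cycles."""
    return [round(curr - prev, 1) for prev, curr in zip(rates, rates[1:])]


def longest_decline_streak(rates):
    """Length of the longest run of consecutive cycle-over-cycle declines."""
    longest = current = 0
    for d in deltas(rates):
        current = current + 1 if d < 0 else 0
        longest = max(longest, current)
    return longest


# Success rates (%) for the last five audit cycles, oldest first.
success_rates = [92.8, 88.9, 85.9, 75.6, 78.2]

print(deltas(success_rates))
print(longest_decline_streak(success_rates))
```

Working in percentage points rather than relative change keeps the output directly comparable with the rates as the report states them.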
Cost dipped from yesterday's peak, driven by the recovery of the Daily Safe Output Tool Optimizer ($8.54 → $2.67), but Daily Code Metrics jumped from $3.94 to $6.46 (+64%) and the self-triggered Failure Investigator burned $4.88. Token usage rose 19% while cost held flat.
Cost Breakdown — Top 5 Workflows
Top 5 cost a combined $17.11 = 79% of daily spend, all on the Claude engine.
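The percentages in this section follow from the dollar figures quoted in the report. A minimal sketch of that arithmetic, assuming plain percent-change and share-of-total definitions; `pct_change` and `share_of_total` are illustrative names:

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


def share_of_total(part, total):
    """Part as a percentage of total."""
    return part / total * 100


# Dollar figures quoted in this report.
code_metrics_change = pct_change(3.94, 6.46)   # Daily Code Metrics
optimizer_change = pct_change(8.54, 2.67)      # Daily Safe Output Tool Optimizer
top5_share = share_of_total(17.11, 21.75)      # Top 5 workflows vs daily spend

print(round(code_metrics_change), round(optimizer_change), round(top5_share))
```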
Network Friction — Top Blockers
Copilot PR Prompt Pattern Analysis has the highest block percentage and was flagged by observability as a "network friction hotspot."
Execution Drift — Linter Miner
Observability flagged Linter Miner as the top drift signal today:
This eclipses yesterday's PR Sous Chef drift (0-18 turns), which suggests the instability is propagating to a new workflow.
Workflow Failure Counts (1+ errors)
Recommendations
/responses endpoint (try gpt-5-codex or gpt-5). Verify request body shape matches what the proxy expects. This is now a 3-day-old regression with no progress.
Resolved / Improved