feat(optimize): flag low-worth expensive sessions #247
Merged
Adds detectLowWorthSessions to the optimize pipeline. Flags expensive sessions (>=$2 spend; >=$3 if no edit turns at all) with weak delivery signals -- no edits, repeated retries, or edit work that never landed in one shot -- when no git/gh delivery command is observed in the session's bash history.

Built on top of the existing #246 dedup pattern. Priority order is low-worth -> context-bloat -> outliers; each later detector excludes sessions already named by an earlier one so a single session is never listed in three findings.

Detection model:
- $2 floor; $3 floor when the only signal is "no edit turns".
- 3 retries to trip the retry reason; 2 retries with edit turns and zero one-shot edits to trip the "no one-shot edit turns" reason.
- categoryBreakdown aggregates are preferred when present; falls back to raw turns for older parsed sessions.

Delivery-command regex uses `(?:\s|$|--)` instead of `\b` after commit/push to avoid false positives like `git commit-tree HEAD^{tree}` and `git commit-graph write` while still matching `git commit --amend`. A `--dry-run` lookahead in the same pipeline segment excludes preview commands.

Three impact tiers consistent with detectContextBloat: high at >=10 candidates or >=$50 total candidate spend; low at <=2 candidates AND <$10 total; medium otherwise.

Token-savings estimate replaces the original 0.5 magic ratio with two defensible regimes:
- No-edit sessions: full session token total (the session produced no apparent output to weigh against the spend).
- Sessions with edits but with retries / no one-shot: retry fraction (retries / total turns) of the session token total. Edits may still have been useful; we credit the model with that and only flag the retry overhead.

Fix-text differentiated from the outlier detector's: low-worth focuses on naming a deliverable up front and capping retry attempts; outliers keeps the existing "tighter constraint" framing.
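The delivery-command matching described above can be sketched roughly as follows. This is not the regex from the PR (the diff is not reproduced here); it is a minimal illustration assuming each bash command arrives as a single string. The names `DELIVERY_SEGMENT`, `DRY_RUN_FLAG`, and `isDeliveryCommand` are hypothetical, and the sketch splits the command into pipeline segments and checks `--dry-run` per segment rather than using a lookahead.

```typescript
// Hypothetical sketch: matches git commit / git push / gh pr create / gh pr merge
// at the start of a pipeline segment. `(?:\s|$|--)` replaces `\b` so that
// `commit-tree` and `commit-graph` do not count as deliveries.
const DELIVERY_SEGMENT =
  /^\s*(?:git\s+(?:commit|push)|gh\s+pr\s+(?:create|merge))(?:\s|$|--)/;

// A segment carrying --dry-run is a preview, not a delivery.
const DRY_RUN_FLAG = /(?:^|\s)--dry-run(?:\s|=|$)/;

function isDeliveryCommand(command: string): boolean {
  // Split on ||, &&, |, ; and newlines so each segment is judged on its own.
  return command
    .split(/\|\||&&|[|;\n]/)
    .some(
      (segment) => DELIVERY_SEGMENT.test(segment) && !DRY_RUN_FLAG.test(segment),
    );
}
```

Under these assumptions, `git commit --amend` and `cd repo && git push` count as deliveries, while `git commit-tree HEAD^{tree}`, `git commit-graph write`, and `git push --dry-run origin main` do not. The sketch deliberately ignores forms such as `git -C <path> commit` or commands wrapped in `sudo`; whether the real detector handles those is not stated in this PR text.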
Tests:
- New describe block for detectLowWorthSessions (17 tests covering thresholds, reasons, delivery-command detection including the commit-tree false-positive guard, dry-run handling, three impact tiers, retry-fraction savings model, and full-session no-edit savings).
- One new test for detectContextBloat asserting it honors the excludedSessionIds parameter (not just outliers).
- Real-data run on 22.6K sessions: top-5 lists are now disjoint across all three per-session detectors; headline savings is 17% of spend (vs 10% pre-PR with just context-bloat dedup, vs 54% if we had used the full-session ceiling for low-worth).
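For readers who want the savings model and impact tiers in concrete form, here is a small sketch of the arithmetic as the commit message states it. The `SessionStats` shape and the function names `estimateSavingsTokens` and `impactTier` are hypothetical, and rounding to whole tokens is an assumption; the PR text only gives the thresholds and the two savings regimes.

```typescript
// Hypothetical per-session summary; field names are illustrative only.
interface SessionStats {
  totalTokens: number;
  totalTurns: number;
  editTurns: number;
  retries: number;
}

// Two regimes: no-edit sessions count in full, otherwise only the
// retry fraction (retries / totalTurns, clamped to [0, 1]) is counted.
function estimateSavingsTokens(s: SessionStats): number {
  if (s.editTurns === 0) return s.totalTokens;
  if (s.totalTurns <= 0) return 0; // guard against division by zero
  const retryFraction = Math.min(1, Math.max(0, s.retries / s.totalTurns));
  return Math.round(s.totalTokens * retryFraction); // rounding is an assumption
}

type Impact = "high" | "medium" | "low";

// high at >=10 candidates or >=$50 total; low at <=2 candidates AND <$10; else medium.
function impactTier(candidateCount: number, totalSpendUsd: number): Impact {
  if (candidateCount >= 10 || totalSpendUsd >= 50) return "high";
  if (candidateCount <= 2 && totalSpendUsd < 10) return "low";
  return "medium";
}
```

With these definitions, a no-edit session with 1,000 tokens contributes all 1,000 to the estimate, while a session with edits, 3 retries, and 10 turns contributes 300. Two candidates totalling exactly $10 land in the medium tier, because the low tier requires strictly less than $10.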
Supersedes #241 (cross-fork PR by @ozymandiashh — original intent preserved, this branch was built on current main with the review fixes integrated cleanly into the #246 dedup pattern instead of being layered on top of an older base).
Summary
Adds `detectLowWorthSessions` to `codeburn optimize`. Flags expensive sessions (≥$2 spend; ≥$3 if no edit turns) with weak delivery signals (no edits, repeated retries, or edit work that never landed in one shot) when no `git`/`gh` delivery command is observed in the bash history. Framed as review candidates, not proof of waste.

Detection model
- `categoryBreakdown` aggregates preferred when present, falls back to raw turns
- Delivery commands: `git commit`, `git push`, `gh pr create`, `gh pr merge` (excluding `--dry-run` in the same pipeline segment)

Review fixes integrated on top of #241's commit
- Priority order: low-worth → context-bloat → outliers. `findLowWorthCandidates` and `findContextBloatCandidates` build ID sets ahead of detection; `detectContextBloat` and `detectSessionOutliers` accept an `excludedSessionIds` param and filter accordingly. Real-data top-5 lists are now disjoint across all three findings.
- `commit-tree` regex false positive fixed. Used `(?:\s|$|--)` after `commit|push` instead of `\b`, so `git commit-tree HEAD^{tree}` and `git commit-graph write` are no longer treated as deliveries while `git commit --amend` still is. New tests cover both cases.
- Token-savings estimate: sessions with edits but with retries / no one-shot edits use the retry fraction (`retries / totalTurns × tokens`, clamped to [0,1]). Edits may still have been useful; we credit the model with that and only flag the retry overhead.

Validation
- `npx vitest run`: 38 files, 529 tests pass (was 498 before, +31 new)
- `tsc --noEmit`: pre-existing copilot.ts errors on this base, same as #246 (feat(optimize): detect context-heavy sessions) had. CI doesn't run tsc (only semgrep). Resolved on the user's local feat branch but not yet on origin/main; not introduced by this PR.

Security
No new attack surface. Bash command strings come from the user's own session telemetry; the regex is anchored, and the detector uses no shell, no eval, and no I/O.
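As a closing illustration of the priority-order dedup described under the review fixes, here is a minimal sketch of how earlier detectors can claim session IDs so later ones skip them. It is a simplification, not the code from this PR: `SessionLike`, `Detector`, and `runInPriorityOrder` are hypothetical names, and the real implementation is described as building ID sets ahead of detection and passing them as `excludedSessionIds`.

```typescript
// Hypothetical minimal session shape for illustration.
interface SessionLike {
  id: string;
}

// A detector returns the IDs it flags and is told which IDs are already claimed.
type Detector = (
  sessions: readonly SessionLike[],
  excludedSessionIds: ReadonlySet<string>,
) => string[];

// Runs detectors in priority order; each later detector's results are
// filtered against everything named by an earlier one.
function runInPriorityOrder(
  sessions: readonly SessionLike[],
  detectors: ReadonlyArray<readonly [string, Detector]>,
): Map<string, string[]> {
  const claimed = new Set<string>();
  const findings = new Map<string, string[]>();
  for (const [label, detect] of detectors) {
    // Filter again here in case a detector ignores the exclusion set.
    const ids = detect(sessions, claimed).filter((id) => !claimed.has(id));
    findings.set(label, ids);
    for (const id of ids) claimed.add(id);
  }
  return findings;
}
```

Passing the detectors as `[["low-worth", ...], ["context-bloat", ...], ["outliers", ...]]` reproduces the stated priority order. The defensive second filter is a design choice of this sketch, so that a detector which forgets to honor the exclusion set still cannot produce overlapping findings.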