
iamtoruk · 2026-05-06T07:34:55Z

Supersedes #241 (cross-fork PR by @ozymandiashh). The original intent is preserved; this branch was built on current main, with the review fixes integrated cleanly into the #246 dedup pattern instead of being layered on top of an older base.

Summary

Adds detectLowWorthSessions to codeburn optimize. Flags expensive sessions (≥$2 spend; ≥$3 if no edit turns) with weak delivery signals — no edits, repeated retries, or edit work that never landed in one shot — when no git/gh delivery command is observed in the bash history. Framed as review candidates, not proof of waste.

Detection model

- $2 spend floor; $3 floor when "no edit turns" is the only signal
- 3 retries trip the retry reason; 2 retries with edit turns and zero one-shot edits trip the "no one-shot edit turns" reason
- categoryBreakdown aggregates are preferred when present; otherwise detection falls back to raw turns
- Delivery commands: git commit, git push, gh pr create, gh pr merge (excluding --dry-run in the same pipeline segment)
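For reviewers who want the thresholds in one place, here is a minimal sketch of how the reasons above could combine. The `SessionStats` shape, the reason labels, and the constant names are hypothetical and not taken from the codeburn source; `hasDelivery` stands in for the result of the delivery-command check.

```typescript
// Hypothetical per-session summary; field names are illustrative, not codeburn's real types.
interface SessionStats {
  costUsd: number;          // total spend for the session
  editTurns: number;        // turns that produced an edit
  oneShotEditTurns: number; // edit turns that landed without a retry
  retries: number;          // retry count across the session
  hasDelivery: boolean;     // a git/gh delivery command was observed
}

// Hypothetical reason labels.
type Reason = "no-edit-turns" | "repeated-retries" | "no-one-shot-edit-turns";

const COST_FLOOR_USD = 2;         // base spend floor
const NO_EDIT_ONLY_FLOOR_USD = 3; // floor when "no edit turns" is the only signal
const RETRY_REASON_MIN = 3;       // retries needed for the retry reason
const NO_ONE_SHOT_RETRY_MIN = 2;  // retries needed for the no-one-shot reason

function lowWorthReasons(s: SessionStats): Reason[] {
  // A delivery command or a cheap session ends the check early.
  if (s.hasDelivery || s.costUsd < COST_FLOOR_USD) return [];

  const reasons: Reason[] = [];
  if (s.editTurns === 0) reasons.push("no-edit-turns");
  if (s.retries >= RETRY_REASON_MIN) reasons.push("repeated-retries");
  if (s.editTurns > 0 && s.oneShotEditTurns === 0 && s.retries >= NO_ONE_SHOT_RETRY_MIN) {
    reasons.push("no-one-shot-edit-turns");
  }

  // The higher floor applies only when "no edit turns" is the sole signal.
  const onlyNoEdits = reasons.length === 1 && reasons[0] === "no-edit-turns";
  if (onlyNoEdits && s.costUsd < NO_EDIT_ONLY_FLOOR_USD) return [];
  return reasons;
}
```

Under these assumptions, a $2.50 session with no edits and no retries is not flagged, while the same session with 3 retries is flagged for both reasons.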

Review fixes integrated on top of #241's commit

- Triple-detector dedup (extends feat(optimize): detect context-heavy sessions #246). Priority order: low-worth → context-bloat → outliers. findLowWorthCandidates and findContextBloatCandidates build ID sets ahead of detection; detectContextBloat and detectSessionOutliers accept an excludedSessionIds param and filter accordingly. Real-data top-5 lists are now disjoint across all three findings.
- commit-tree regex false positive fixed. The pattern now uses (?:\s|$|--) after commit|push instead of \b, so git commit-tree HEAD^{tree} and git commit-graph write are no longer treated as deliveries, while git commit --amend still is. New tests cover both cases.
- Three impact tiers, consistent with feat(optimize): detect context-heavy sessions #246: high (≥10 candidates OR ≥$50 total) · low (≤2 candidates AND <$10 total) · medium otherwise. Replaces the original binary tiering.
- Replaced the magic 0.5 token-savings ratio with a two-regime model:
  - No-edit sessions: full session token total (the session produced no apparent output to weigh against the spend).
  - Sessions with edits but with retries / no one-shot edits: the retry fraction of the session token total (retries / totalTurns, clamped to [0,1], × tokens). Edits may still have been useful; we credit the model with that and only flag the retry overhead.
- Fix text differentiated from the outlier detector. Outlier still says "tighter constraint, smaller plan." Low-worth now says "name the deliverable in one sentence; stop after 10 minutes without an edit or 2 failures; no retries past 2 attempts on any single fix."
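The impact tiers and the two-regime savings estimate can be written down compactly. This is a sketch under stated assumptions rather than the merged implementation: the function names, the `Candidate` shape, the rounding, and the zero-turn guard are all hypothetical.

```typescript
type Impact = "high" | "medium" | "low";

// Tier boundaries as listed above: high wins first, then low, else medium.
function impactTier(candidateCount: number, totalCostUsd: number): Impact {
  if (candidateCount >= 10 || totalCostUsd >= 50) return "high";
  if (candidateCount <= 2 && totalCostUsd < 10) return "low";
  return "medium";
}

// Hypothetical candidate shape; not codeburn's real type.
interface Candidate {
  totalTokens: number; // all tokens spent in the session
  totalTurns: number;  // all turns in the session
  editTurns: number;   // turns that produced an edit
  retries: number;     // retry count across the session
}

function estimatedTokenSavings(c: Candidate): number {
  // Regime 1: no edits at all, so the full session total counts.
  if (c.editTurns === 0) return c.totalTokens;
  // Assumption: a session with no recorded turns contributes nothing.
  if (c.totalTurns <= 0) return 0;
  // Regime 2: only the retry fraction counts, clamped to [0, 1].
  const fraction = Math.min(1, Math.max(0, c.retries / c.totalTurns));
  return Math.round(fraction * c.totalTokens);
}
```

For example, a session with edits, 2 retries over 8 turns, and 1,000 tokens would contribute 250 tokens to the estimate, while the same session with no edit turns would contribute all 1,000.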

Validation

- npx vitest run: 38 files, 529 tests pass (498 before, +31 new)
- Real-data run on a 22.6K-session / $4.8K archive:
  - Top-5 lists across low-worth, context-bloat, and outliers are completely disjoint
  - Headline savings: ~17% of spend (10% before the low-worth detector; 54% with the naive full-session ceiling; triple-counted in the original PR)
  - Per-finding totals: low-worth $629, context-heavy $54, outliers $123
- tsc --noEmit: pre-existing copilot.ts errors on this base, the same ones feat(optimize): detect context-heavy sessions #246 had. CI doesn't run tsc (only semgrep). The errors are resolved on the user's local feat branch but not yet on origin/main, and they are not introduced by this PR.

Security

No new attack surface. Bash command strings come from the user's own session telemetry; the regex is anchored, and the detector uses no shell, no eval, and no I/O.
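To make the "pure string matching" point concrete, here is a rough sketch of a delivery-command check that only ever runs a regex over the recorded command text. It borrows the `(?:\s|$|--)` guard from this PR, but the exact pattern, the `isDeliveryCommand` name, and the choice to split on `|`, `||`, `&&`, and `;` (instead of the `--dry-run` lookahead the PR uses) are assumptions made for illustration.

```typescript
// Matches git commit / git push / gh pr create / gh pr merge at a word start.
// The (?:\s|$|--) guard rejects hyphenated plumbing such as `git commit-tree`.
const DELIVERY =
  /(?:^|\s)(?:git\s+(?:commit|push)(?:\s|$|--)|gh\s+pr\s+(?:create|merge)(?:\s|$))/;

// A --dry-run flag anywhere in the same segment marks the command as a preview.
const DRY_RUN = /(?:^|\s)--dry-run(?:\s|=|$)/;

// Assumption: a "pipeline segment" is whatever sits between |, ||, && or ;.
const SEGMENT_SEPARATOR = /\|\||&&|[|;]/;

function isDeliveryCommand(command: string): boolean {
  return command
    .split(SEGMENT_SEPARATOR)
    .some((segment) => DELIVERY.test(segment) && !DRY_RUN.test(segment));
}

console.log(isDeliveryCommand("git commit --amend"));          // true
console.log(isDeliveryCommand("git commit-tree HEAD^{tree}")); // false
console.log(isDeliveryCommand("git push --dry-run origin main")); // false
```

The command string is never executed or interpolated; it is only split and tested, which is the property the security note relies on.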

Adds detectLowWorthSessions to the optimize pipeline. Flags expensive sessions (>=$2 spend; >=$3 if no edit turns at all) with weak delivery signals -- no edits, repeated retries, or edit work that never landed in one shot -- when no git/gh delivery command is observed in the session's bash history.

Built on top of the existing #246 dedup pattern. Priority order is low-worth -> context-bloat -> outliers; each later detector excludes sessions already named by an earlier one so a single session is never listed in three findings.

Detection model:
- $2 floor; $3 floor when the only signal is "no edit turns".
- 3 retries to trip the retry reason; 2 retries with edit turns and zero one-shot edits to trip the "no one-shot edit turns" reason.
- categoryBreakdown aggregates are preferred when present; falls back to raw turns for older parsed sessions.

Delivery-command regex uses (?:\s|$|--) instead of \b after commit/push to avoid false positives like `git commit-tree HEAD^{tree}` and `git commit-graph write` while still matching `git commit --amend`. A `--dry-run` lookahead in the same pipeline segment excludes preview commands.

Three impact tiers consistent with detectContextBloat: high at >=10 candidates or >=$50 total candidate spend; low at <=2 candidates AND <$10 total; medium otherwise.

Token-savings estimate replaces the original 0.5 magic ratio with two defensible regimes:
- No-edit sessions: full session token total (the session produced no apparent output to weigh against the spend).
- Sessions with edits but with retries / no one-shot: retry fraction (retries / total turns) of the session token total. Edits may still have been useful; we credit the model with that and only flag the retry overhead.

Fix-text differentiated from outlier detector's: low-worth focuses on naming a deliverable up front and capping retry attempts; outliers keeps the existing "tighter constraint" framing.

Tests:
- New describe block for detectLowWorthSessions (17 tests covering thresholds, reasons, delivery-command detection including commit-tree false-positive guard, dry-run handling, three impact tiers, retry-fraction savings model, full-session no-edit savings).
- One new test for detectContextBloat asserting it honors the excludedSessionIds parameter (not just outliers).
- Real-data run on 22.6K sessions: top-5 lists are now disjoint across all three per-session detectors, headline savings is 17% of spend (vs 10% pre-PR with just context-bloat dedup, vs 54% if we had used the full-session ceiling for low-worth).
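The priority-order dedup that the excludedSessionIds test exercises can be pictured roughly as follows. This is a simplified sketch, not the merged code: `runInPriorityOrder`, the `Finder` type, and the `withExclusions` helper are hypothetical stand-ins for the real detectors, which are only described (not shown) in this PR.

```typescript
type SessionId = string;

// Hypothetical stand-in for a detector: given the IDs already claimed by
// higher-priority detectors, return the session IDs it wants to report.
type Finder = (excludedSessionIds: ReadonlySet<SessionId>) => SessionId[];

function runInPriorityOrder(finders: Array<[string, Finder]>): Map<string, SessionId[]> {
  const claimed = new Set<SessionId>();
  const findings = new Map<string, SessionId[]>();
  for (const [name, find] of finders) {
    // Filter again here so a finder that ignores the parameter cannot leak duplicates.
    const ids = find(claimed).filter((id) => !claimed.has(id));
    findings.set(name, ids);
    for (const id of ids) claimed.add(id);
  }
  return findings;
}

// Toy finder that honors the exclusion set, like the detectors described above.
const withExclusions = (raw: SessionId[]): Finder =>
  (excluded) => raw.filter((id) => !excluded.has(id));

const findings = runInPriorityOrder([
  ["low-worth", withExclusions(["a", "b"])],
  ["context-bloat", withExclusions(["b", "c"])],
  ["outliers", withExclusions(["a", "c", "d"])],
]);

console.log(findings.get("low-worth"));     // [ 'a', 'b' ]
console.log(findings.get("context-bloat")); // [ 'c' ]
console.log(findings.get("outliers"));      // [ 'd' ]
```

Each session ID ends up in at most one finding, which is the property behind the disjoint top-5 lists reported in the real-data run.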

iamtoruk mentioned this pull request May 6, 2026

feat(optimize): flag low-worth expensive sessions #241

Closed

iamtoruk merged commit 75d4701 into main May 6, 2026
3 checks passed

iamtoruk deleted the feat/worth-it-score branch May 6, 2026 07:35

feat(optimize): flag low-worth expensive sessions #247

iamtoruk merged 1 commit into main from feat/worth-it-score
