
iamtoruk · 2026-05-06T07:10:34Z

Supersedes #242 (cross-fork PR by @ozymandiashh — original commit preserved as the first commit here, follow-up commit applies review fixes).

Summary

Adds a detectContextBloat() finding to codeburn optimize that flags sessions where effective input/cache tokens (cache-discounted via existing pricing constants) are large and disproportionate to output. Suggests starting fresh with a tightened context.

effective input/cache floor: 75K tokens
minimum effective input/output ratio: 25:1
token-savings estimate: context above a healthier 15:1 ratio
preview limited to top 5 by excess; rest summarized as "+N more"
ratio capped at "1000+:1" to avoid noise from zero-output sessions
deterministic ordering for stable previews and tests
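For reviewers who want the thresholds above in one place, here is a minimal sketch of how they could combine. Only the numbers (75K floor, 25:1 minimum, 15:1 healthier ratio, top 5, "1000+:1" cap) come from this PR; the `SessionTokens` shape, the helper names, the inclusive comparisons, and the tie-break on session ID are assumptions for illustration, not the actual `detectContextBloat()` code.

```typescript
// Hypothetical session shape; the real detector works on the parsed ProjectSummary.
interface SessionTokens {
  id: string;
  effectiveInputTokens: number; // input + cache tokens, already cache-discounted
  outputTokens: number;
}

const FLOOR_TOKENS = 75_000; // effective input/cache floor
const MIN_RATIO = 25;        // minimum effective input/output ratio
const HEALTHY_RATIO = 15;    // baseline used for the savings estimate
const RATIO_CAP = 1000;      // display cap
const PREVIEW_LIMIT = 5;

// Zero-output sessions have an unbounded ratio.
function ratioOf(s: SessionTokens): number {
  return s.outputTokens === 0 ? Infinity : s.effectiveInputTokens / s.outputTokens;
}

// Cap the displayed ratio so near-zero-output sessions do not produce noisy numbers.
function formatRatio(ratio: number): string {
  return ratio >= RATIO_CAP ? `${RATIO_CAP}+:1` : `${Math.round(ratio)}:1`;
}

// Token-savings estimate: context above what a 15:1 session would have needed.
function excessTokens(s: SessionTokens): number {
  return Math.max(0, s.effectiveInputTokens - HEALTHY_RATIO * s.outputTokens);
}

function findBloat(sessions: SessionTokens[]) {
  const candidates = sessions
    .filter((s) => s.effectiveInputTokens >= FLOOR_TOKENS && ratioOf(s) >= MIN_RATIO)
    // Sort by excess, then by ID, so previews and tests are deterministic.
    .sort((a, b) => excessTokens(b) - excessTokens(a) || a.id.localeCompare(b.id));
  const preview = candidates.slice(0, PREVIEW_LIMIT);
  return { candidates, preview, more: candidates.length - preview.length };
}
```

A session must pass both the floor and the ratio check, so a large session with proportionate output is never flagged.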

Review fixes on top of #242

Adversarial review against real session data (22.6K sessions / $4.8K spend) found three concrete issues that this commit addresses:

Heavy overlap with detectSessionOutliers. All 5 of the top-5 context-bloat sessions also appeared in the top-5 outlier list. Same sessions, two framings, two "potential savings" lines that the user mentally adds together. Fix: detectSessionOutliers now accepts an optional excludedSessionIds set; scanAndDetect runs context-bloat first, builds the candidate ID set, and passes it through. Real-data outlier count dropped from 96 → 19, and the two findings' top-5 lists are now disjoint. Headline "potential savings" went from a misleading 62% of spend to an honest 10%.
Time-blind growth ratio. "1131x previous session input" was alarming on the surface but was sometimes just an artifact of resuming after a long gap (small test session → real working session weeks later). Fix: new CONTEXT_BLOAT_GROWTH_MAX_GAP_MS = 7 days; growth ratio is suppressed when the predecessor is older than that.
Binary impact tiers. Old logic was >=3 candidates || >=500K total → high, else medium. A 300-session pile-up scored the same as a 3-session minor finding. Fix: three real tiers — high (≥10 candidates or ≥500K total), low (≤2 candidates AND <200K total), medium otherwise.
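The first two fixes can be sketched as below. `CONTEXT_BLOAT_GROWTH_MAX_GAP_MS` and the optional `excludedSessionIds` set are named in this PR; everything else (the session shapes, the `isOutlier` predicate standing in for the real outlier rule, and the function names) is hypothetical.

```typescript
// Constant named in this PR: growth callouts are suppressed past a 7-day gap.
const CONTEXT_BLOAT_GROWTH_MAX_GAP_MS = 7 * 24 * 60 * 60 * 1000;

// Hypothetical shapes for illustration only.
interface TimedSession {
  startMs: number;
  effectiveInputTokens: number;
}
interface CostedSession {
  id: string;
  costUSD: number;
}

// Returns the "Nx previous session input" ratio, or null when there is no usable
// predecessor or the predecessor is more than 7 days older than the current session.
function growthRatio(prev: TimedSession | undefined, cur: TimedSession): number | null {
  if (!prev || prev.effectiveInputTokens <= 0) return null;
  if (cur.startMs - prev.startMs > CONTEXT_BLOAT_GROWTH_MAX_GAP_MS) return null;
  return cur.effectiveInputTokens / prev.effectiveInputTokens;
}

// Outlier pass that skips sessions already claimed by the context-bloat finding.
// `isOutlier` is a stand-in; the real cost-outlier rule is not shown in this PR.
function outliersExcluding(
  sessions: CostedSession[],
  isOutlier: (s: CostedSession) => boolean,
  excludedSessionIds: ReadonlySet<string> = new Set(),
): CostedSession[] {
  return sessions.filter((s) => !excludedSessionIds.has(s.id) && isOutlier(s));
}
```

Running context-bloat first and passing its candidate IDs as the exclusion set is what keeps the two top-5 lists disjoint.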

Plus 7 added tests:

medium-tier boundary
high tier at exactly 10 candidates
1000+:1 cap with non-zero output (previously only zero-output covered)
time-gap suppression (>7 day predecessor → no growth callout)
below-threshold predecessor still anchors growth (matches existing code comment)
detectSessionOutliers skips sessions in the exclusion set
detectSessionOutliers still flags cost outliers not in the exclusion set

One existing test updated: a single 93K-token session (1 candidate, <200K total) is now low impact rather than medium.
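The tier boundaries exercised by these tests can be written as a small pure function. This is a sketch under the thresholds stated above; the function name and signature are illustrative rather than the ones in the codebase.

```typescript
type Impact = "high" | "medium" | "low";

// high: >=10 candidates or >=500K total effective tokens
// low:  <=2 candidates and <200K total effective tokens
// medium: everything else
function impactTier(candidateCount: number, totalEffectiveTokens: number): Impact {
  if (candidateCount >= 10 || totalEffectiveTokens >= 500_000) return "high";
  if (candidateCount <= 2 && totalEffectiveTokens < 200_000) return "low";
  return "medium";
}
```

Under this rule, `impactTier(1, 93_000)` is `"low"`, which is why the existing single-session test changed from medium to low.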

Validation

npx vitest run: 35 files, 498 tests pass (was 491 in #242; +7 new)
Real-data optimize -p 30days — confirmed dedup works, outlier list shrunk from 96 → 19, savings figures no longer double-counted

Security

No new attack surface. Pure read-only data transform over existing parsed ProjectSummary. No I/O, shell, eval, or external input.

- Exclude sessions already flagged by detectContextBloat from the detectSessionOutliers preview. On real data the two findings shared most of their top-5 sessions; the outlier list now focuses on cost anomalies that are not also context-bloated.
- Suppress the "Nx previous session input" growth callout when the previous session is more than 7 days back. Prevents alarming numbers like "1131x growth" that are actually artifacts of resuming a project after a long break, not bad context management.
- Replace the binary high/medium impact tiering with three tiers: high at >=10 candidates or >=500K total effective tokens, low at <=2 candidates and <200K total, medium otherwise. Stops a single small finding from competing visually with a 300-session pile-up.
- Tests added: medium-tier boundary, high tier at 10+ candidates, 1000+:1 cap with non-zero output, time-gap suppression, anchor growth from a below-threshold predecessor, outlier exclusion when a session is in the context-bloat exclusion set.

@ozymandiashh

Adds a low-worth detector to codeburn optimize that flags expensive sessions with weak delivery signals (no edits, repeated retries, or no one-shot edits) when no git/gh delivery command is observed. Priority order is low-worth → context-bloat → outliers; each later detector excludes sessions named by an earlier one so the same session is never listed in three findings. Detection: floor, for no-edit, 3+ retries, regex matches git commit/push and gh pr create/merge but excludes commit-tree/commit-graph and dry-run. Three impact tiers consistent with #246. Token-savings uses full session tokens for no-edit sessions and the retry fraction for edit-with-retry sessions. Supersedes #241 with review fixes. Original implementation by @ozymandiashh.
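As a rough illustration of the delivery-command matching described above, the sketch below accepts `git commit`, `git push`, `gh pr create`, and `gh pr merge`, and rejects `git commit-tree`, `git commit-graph`, and anything carrying `--dry-run`. The regexes and the `isDeliveryCommand` name are assumptions; the pattern actually used in #247 may differ (for example, this sketch does not treat `git push -n` as a dry run).

```typescript
// Hypothetical pattern: the negative lookahead rejects "commit-tree", "commit-graph",
// and any other longer subcommand that merely starts with a delivery verb.
const DELIVERY_RE = /\b(?:git\s+(?:commit|push)|gh\s+pr\s+(?:create|merge))(?![\w-])/;
const DRY_RUN_RE = /--dry-run\b/;

// True when the shell command looks like real delivery work.
function isDeliveryCommand(command: string): boolean {
  return DELIVERY_RE.test(command) && !DRY_RUN_RE.test(command);
}
```

Neither regex uses the `g` flag, so `test()` stays stateless across calls.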

ozymandiashh and others added 2 commits May 6, 2026 03:12

feat(optimize): detect context-heavy sessions

6a4436d

iamtoruk mentioned this pull request May 6, 2026

feat(optimize): detect context-heavy sessions #242

Closed

iamtoruk merged commit f92d57d into main May 6, 2026
3 checks passed

iamtoruk deleted the feat/context-bloat-analyzer branch May 6, 2026 07:11

This was referenced May 6, 2026

feat(optimize): flag low-worth expensive sessions #247

Merged

feat(optimize): flag low-worth expensive sessions #241

Closed

feat(optimize): detect context-heavy sessions #246

iamtoruk merged 2 commits into main from feat/context-bloat-analyzer
