Quiet routine pricing warnings + menubar recovery from stuck-loading (#266)

iamtoruk · web-flow · commit 8208cf8ff5e0 · 2026-05-08T20:33:48.000-07:00
* Quiet routine pricing warnings + menubar recovery from stuck-loading

CLI:

- Default `codeburn` invocation no longer prints "no pricing data for model"
  warnings on every run. Three lines of stderr before the dashboard even
  draws made the tool look broken on first launch. The warning now requires
  --verbose; a suppressed pricing miss still yields $0 cost (correct for
  unmapped models).
- Local-model heuristic skips the warning entirely for Ollama tags
  (`qwen3.6:35b-a3b-bf16`), GGUF/quantized fingerprints, and similar names
  that will never have public pricing. The "update codeburn" hint was
  actively misleading there.
- When the warning does fire (with --verbose), it points users at
  `codeburn model-alias <model> <known-model>` as the actual escape hatch
  alongside the package update suggestion.
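
The gating order in the bullets above can be sketched as a pure function.
This is an illustration only: `warningDecision` and its `Decision` labels are
hypothetical names, and verbosity is passed in as a plain boolean (an
assumption made to keep the example free of process state). The local-model
patterns are the ones this change adds in src/models.ts.

```typescript
// Sketch only: `warningDecision` and `Decision` are hypothetical names, and
// `verbose` is a plain parameter here to keep the example pure.
type Decision = 'warn' | 'skip:empty' | 'skip:seen' | 'skip:local' | 'skip:quiet'

function looksLikeLocalModel(modelName: string): boolean {
  // Ollama-style `name:tag` identifiers.
  if (modelName.includes(':') && !modelName.startsWith('http')) return true
  // Trailing quantization / precision suffixes such as -q4_0, -bf16, -gguf.
  return /[-_](q[2-8](_[a-z0-9]+)?|bf16|fp16|gguf|f16|f32)$/i.test(modelName)
}

function warningDecision(modelName: string, verbose: boolean, seenModels: Set<string>): Decision {
  if (!modelName || modelName === '<synthetic>') return 'skip:empty'
  if (seenModels.has(modelName)) return 'skip:seen'
  if (looksLikeLocalModel(modelName)) return 'skip:local'
  if (!verbose) return 'skip:quiet'
  seenModels.add(modelName) // warn at most once per model name
  return 'warn'
}

const seenModels = new Set<string>()
const decisions: Decision[] = [
  warningDecision('qwen3.6:35b-a3b-bf16', true, seenModels),   // local: has a `:tag`
  warningDecision('mistral-7b-q4_0', true, seenModels),        // local: quantized suffix
  warningDecision('some-new-hosted-model', false, seenModels), // quiet: not verbose
  warningDecision('some-new-hosted-model', true, seenModels),  // warns once
  warningDecision('some-new-hosted-model', true, seenModels),  // already warned
]
```

One property of the suffix pattern as written: it accepts a single `_segment`
after the quant level, so in this sketch a name ending in `-q4_k_m` with no
`:` is not classified as local.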

Menubar:

- Replace the perpetual "Loading…" spinner with a FetchErrorOverlay when
  the per-key fetch fails and the cache is empty. The user sees the error
  and a Retry button instead of an infinite hang.
- Add diagnostic breadcrumbs (NSLog, invisible to normal users — Console.app
  / `log stream --process CodeBurnMenubar` only) for the four states that
  produce a stuck loading overlay:
    - subprocess timeout after 45s
    - fetch result dropped due to Task cancellation (rapid tab switch)
    - fetch result dropped due to mid-fetch calendar rollover
    - retry attempt where the last successful fetch is >2 min stale
- Track lastSuccessByKey separately from cache freshness so the staleness
  diagnostic survives day-rollover cache wipes.
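
Why the timestamps live apart from the cache can be shown with a small
TypeScript analogue of the Swift bookkeeping. Everything here is a sketch
under assumptions: `FetchBook` and its method names are hypothetical, keys are
plain strings, and the clock is passed in explicitly so the behaviour is
deterministic. The 120-second threshold matches the ">2 min" rule above.

```typescript
// Sketch only: a TypeScript analogue of the Swift bookkeeping. All names are
// hypothetical, and time is passed in explicitly instead of read from a clock.
const STALE_THRESHOLD_SECONDS = 120

class FetchBook {
  private cache = new Map<string, unknown>()
  private lastSuccessMs = new Map<string, number>()

  recordSuccess(key: string, payload: unknown, nowMs: number): void {
    this.cache.set(key, payload)
    this.lastSuccessMs.set(key, nowMs)
  }

  // Day rollover clears cached payloads but deliberately keeps the timestamps.
  wipeCache(): void {
    this.cache.clear()
  }

  hasCachedData(key: string): boolean {
    return this.cache.has(key)
  }

  // Infinity means "never fetched", which is the cold path rather than a bug.
  staleSeconds(key: string, nowMs: number): number {
    const last = this.lastSuccessMs.get(key)
    return last === undefined ? Infinity : (nowMs - last) / 1000
  }

  // Log only when there was a prior success and it is older than the threshold.
  shouldLogStale(key: string, nowMs: number): boolean {
    const seconds = this.staleSeconds(key, nowMs)
    return Number.isFinite(seconds) && seconds > STALE_THRESHOLD_SECONDS
  }
}
```

If the timestamps were stored inside the cache entries instead, a cache wipe
would make a long-stuck key look identical to a fresh install.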

* Stop flashing the compare-view loading screen on background refresh

When the 30s CLI tick updated `projects` while the user was reading the
model comparison results, the projects-watching effect always fired
setLoadTrigger, which flipped phase to 'loading' and re-ran the slow
scanSelfCorrections walk over every provider's session directory. The
user lost their scroll position and saw a loading flash mid-read.

Recompute the comparison rows in place when:
- the user is already on the results phase, AND
- both picked models still exist in the new aggregate.

Skip the corrections rescan on these in-place refreshes — corrections
drift slowly enough that holding the previous value until the user
re-enters compare is acceptable, and the rescan is the slow part of the
load. Initial selection and post-selection load still run the full
pipeline.
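
The branching described above can be restated as two pure functions. This is
a sketch, not the component code: `refreshAction`, `carryCorrections`, and the
`calls` field on `ModelStats` are hypothetical and exist only for
illustration. The phase names and the `selfCorrections` field come from the
change itself.

```typescript
// Sketch only: `refreshAction` and `carryCorrections` are hypothetical names
// that restate the effect's branching as pure functions.
type Phase = 'select' | 'loading' | 'results'
type RefreshAction = 'none' | 'back-to-select' | 'recompute-in-place' | 'full-load'

function refreshAction(
  phase: Phase,
  picked: [string, string] | null,
  modelNames: string[],
): RefreshAction {
  if (!picked) return 'none'
  const has = (wanted: string) => modelNames.some(n => n === wanted)
  // A picked model vanished from the new aggregate: send the user back to pick again.
  if (!has(picked[0]) || !has(picked[1])) return 'back-to-select'
  // Already reading results: update rows without a loading flash or rescan.
  return phase === 'results' ? 'recompute-in-place' : 'full-load'
}

// `calls` is an illustrative field; the real stats shape is not shown here.
interface ModelStats { model: string; calls: number; selfCorrections: number }

// Fresh stats win, except for the slow-to-compute corrections count, which is
// held over from the previous selection (0 if there was none).
function carryCorrections(fresh: ModelStats, previous: ModelStats | null): ModelStats {
  return { ...fresh, selfCorrections: previous?.selfCorrections ?? 0 }
}
```

Keeping the decision separate from the state setters makes the one case that
changed in this commit (results phase with both models still present) easy to
check in isolation.
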
diff --git a/mac/Sources/CodeBurnMenubar/AppStore.swift b/mac/Sources/CodeBurnMenubar/AppStore.swift
@@ -46,6 +46,17 @@ final class AppStore {
     private var cache: [PayloadCacheKey: CachedPayload] = [:]
     private var cacheDate: String = ""
     private var switchTask: Task<Void, Never>?
+    /// Tracks the last successful fetch timestamp per key for stuck-loading
+    /// diagnostics. NOT used for cache-freshness logic — `CachedPayload.fetchedAt`
+    /// is authoritative there. This map persists across cache wipes (day
+    /// rollover, etc.) so we can distinguish "fresh install, never fetched"
+    /// from "cache was wiped 10 minutes ago and we still haven't refilled".
+    private var lastSuccessByKey: [PayloadCacheKey: Date] = [:]
+
+    private func staleSecondsForKey(_ key: PayloadCacheKey) -> TimeInterval {
+        guard let last = lastSuccessByKey[key] else { return .infinity }
+        return Date().timeIntervalSince(last)
+    }
 
     private var currentKey: PayloadCacheKey {
         PayloadCacheKey(period: selectedPeriod, provider: selectedProvider)
@@ -148,19 +159,41 @@ final class AppStore {
         if didShowLoading {
             loadingCount += 1
         }
+        // Diagnostic anchor: if this key has been empty for a long time (the
+        // popover would currently be showing "Loading..."), log how stale the
+        // miss is so the next time a user reports a stuck-loading bug we have
+        // a concrete data point — "no successful fetch for (today, claude)
+        // in 14 minutes" beats squinting at unified-log noise. We deliberately
+        // skip the first-attempt case (no prior success ever, finite check
+        // below filters .infinity) — that's just the cold path, not a bug.
+        let staleSeconds = staleSecondsForKey(key)
+        if staleSeconds.isFinite, staleSeconds > 120 {
+            NSLog("CodeBurn: refresh attempt for stale key \(key.period.rawValue)/\(key.provider.rawValue) — last success was \(Int(staleSeconds))s ago")
+        }
         defer {
             inFlightKeys.remove(key)
             if didShowLoading { loadingCount = max(loadingCount - 1, 0) }
         }
         do {
             let fresh = try await DataClient.fetch(period: key.period, provider: key.provider, includeOptimize: includeOptimize)
-            guard !Task.isCancelled else { return }
+            if Task.isCancelled {
+                // Distinguish cancellation (user switched tabs mid-fetch) from
+                // the silent-no-result path. Without this log, a cancelled
+                // fetch leaves cache empty + lastError nil and the user sees
+                // perpetual loading with nothing in the diagnostics.
+                NSLog("CodeBurn: fetch for \(key.period.rawValue)/\(key.provider.rawValue) cancelled before result was applied")
+                return
+            }
             // Day-rollover race guard: if the calendar date changed during the
             // fetch, this payload was computed against yesterday's date and
             // would pollute today's freshly-cleared cache. Drop it; the next
             // tick will refetch with today's data.
-            if cacheDate != cacheDateAtStart { return }
+            if cacheDate != cacheDateAtStart {
+                NSLog("CodeBurn: dropping fetch result for \(key.period.rawValue)/\(key.provider.rawValue) — calendar rolled mid-fetch")
+                return
+            }
             cache[key] = CachedPayload(payload: fresh, fetchedAt: Date())
+            lastSuccessByKey[key] = Date()
             lastError = nil
         } catch {
             if Task.isCancelled { return }
@@ -171,6 +204,7 @@ final class AppStore {
                     guard !Task.isCancelled else { return }
                     if cacheDate != cacheDateAtStart { return }
                     cache[key] = CachedPayload(payload: fallback, fetchedAt: Date())
+                    lastSuccessByKey[key] = Date()
                     lastError = nil
                     return
                 } catch {
diff --git a/mac/Sources/CodeBurnMenubar/Data/DataClient.swift b/mac/Sources/CodeBurnMenubar/Data/DataClient.swift
@@ -62,9 +62,16 @@ struct DataClient {
         }
 
         // Wall-clock timeout: if the CLI hangs (parser stuck, disk stall), kill it.
+        // Log when this fires so a recurring stuck-popover state has an actual
+        // diagnostic — historically users saw "Loading..." forever with no signal
+        // about what failed; the only way to debug was to read process state at
+        // the wrong time. The log line names the subcommand so we can correlate
+        // with a specific period/provider combination.
         let timeoutTask = Task.detached(priority: .utility) {
             try? await Task.sleep(nanoseconds: spawnTimeoutSeconds * 1_000_000_000)
             if process.isRunning {
+                NSLog("CodeBurn: CLI subprocess timed out after %llus for %@ — terminating",
+                      spawnTimeoutSeconds, subcommand.joined(separator: " "))
                 process.terminate()
             }
         }
diff --git a/mac/Sources/CodeBurnMenubar/Views/MenuBarContent.swift b/mac/Sources/CodeBurnMenubar/Views/MenuBarContent.swift
@@ -43,15 +43,21 @@ struct MenuBarContent: View {
 
                 // Overlay fires only on cold cache for the current key. This
                 // avoids the 1-frame `$0.00` flash on first-time period/provider
-                // switches (the body would otherwise render the empty payload
-                // for the runloop tick before the overlay slides in). With the
-                // cache no longer being wiped on every wake/manual-refresh,
-                // hasCachedData==false now means "we have never fetched this
-                // key before in this session", which is the right time to
-                // cover the popover.
+                // switches. When the fetch fails (CLI subprocess timeout, parse
+                // error, etc.), surface a retry card instead of leaving the
+                // user stuck on a perpetual "Loading..." spinner.
                 if !store.hasCachedData {
-                    BurnLoadingOverlay(periodLabel: store.selectedPeriod.rawValue)
+                    if let err = store.lastError, !store.isLoading {
+                        FetchErrorOverlay(
+                            error: err,
+                            periodLabel: store.selectedPeriod.rawValue,
+                            retry: { Task { await store.refresh(includeOptimize: false, force: true, showLoading: true) } }
+                        )
                         .transition(.opacity)
+                    } else {
+                        BurnLoadingOverlay(periodLabel: store.selectedPeriod.rawValue)
+                            .transition(.opacity)
+                    }
                 }
             }
             .frame(height: 520)
@@ -126,6 +132,49 @@ private struct EmptyProviderState: View {
     }
 }
 
+/// Shown when a fetch failed and the cache is still empty for this key. The
+/// user previously sat on the "Loading…" spinner forever — the popover had
+/// no path to recover beyond the next 30s tick (which would just re-fail).
+/// Now they see what broke and can retry directly.
+private struct FetchErrorOverlay: View {
+    let error: String
+    let periodLabel: String
+    let retry: () -> Void
+
+    var body: some View {
+        ZStack {
+            Rectangle().fill(.ultraThinMaterial)
+            VStack(spacing: 12) {
+                Image(systemName: "exclamationmark.triangle.fill")
+                    .font(.system(size: 28))
+                    .foregroundStyle(Theme.brandAccent)
+                Text("Couldn't load \(periodLabel)")
+                    .font(.system(size: 12.5, weight: .semibold))
+                    .foregroundStyle(.primary)
+                Text(displayError)
+                    .font(.system(size: 10.5))
+                    .foregroundStyle(.secondary)
+                    .multilineTextAlignment(.center)
+                    .frame(maxWidth: 280)
+                    .lineLimit(3)
+                Button("Retry", action: retry)
+                    .buttonStyle(.borderedProminent)
+                    .tint(Theme.brandAccent)
+                    .controlSize(.small)
+            }
+            .padding(.horizontal, 20)
+        }
+    }
+
+    /// Trim surrounding whitespace and cap the message at 240 characters so a
+    /// long error description stays within the three-line label.
+    private var displayError: String {
+        let trimmed = error.trimmingCharacters(in: .whitespacesAndNewlines)
+        if trimmed.count <= 240 { return trimmed }
+        return String(trimmed.prefix(240)) + "…"
+    }
+}
+
 /// Translucent overlay that blurs whatever's behind it (the previous tab/period content)
 /// and centers an animated burning flame -- the brand mark filling up bottom-to-top in
 /// yellow→orange→red, looping.
diff --git a/src/compare.tsx b/src/compare.tsx
@@ -331,16 +331,40 @@ export function CompareView({ projects, onBack }: CompareViewProps) {
     const newModels = aggregateModelStats(projects)
     setModels(newModels)
 
-    if (pickedNames) {
-      const hasA = newModels.some(m => m.model === pickedNames[0])
-      const hasB = newModels.some(m => m.model === pickedNames[1])
-      if (hasA && hasB) {
-        setLoadTrigger(t => t + 1)
-      } else {
-        setPickedNames(null)
-        setPhase('select')
-      }
+    if (!pickedNames) return
+    const hasA = newModels.some(m => m.model === pickedNames[0])
+    const hasB = newModels.some(m => m.model === pickedNames[1])
+    if (!hasA || !hasB) {
+      setPickedNames(null)
+      setPhase('select')
+      return
+    }
+
+    // When the periodic CLI refresh updates `projects` while the user is
+    // reading the results page, recompute the comparison rows IN PLACE rather
+    // than flipping to a loading screen. Previously every 30s tick bounced the
+    // user to a loading flash and reset their scroll position; the slow part
+    // (scanSelfCorrections, which walks every provider's session dir) is
+    // skipped on these refreshes — corrections drift slowly enough that
+    // staying with the existing values until the user re-enters compare from
+    // scratch is fine.
+    if (phase === 'results') {
+      const a = newModels.find(m => m.model === pickedNames[0])
+      const b = newModels.find(m => m.model === pickedNames[1])
+      if (!a || !b) return
+      const aCopy = { ...a, selfCorrections: selectedA?.selfCorrections ?? 0 }
+      const bCopy = { ...b, selfCorrections: selectedB?.selfCorrections ?? 0 }
+      setSelectedA(aCopy)
+      setSelectedB(bCopy)
+      setRows(computeComparison(aCopy, bCopy))
+      setCategories(computeCategoryComparison(projects, a.model, b.model))
+      setStyle(computeWorkingStyle(projects, a.model, b.model))
+      return
     }
+
+    // Initial load (or returning from select after picking) — full pipeline,
+    // including scanSelfCorrections.
+    setLoadTrigger(t => t + 1)
   }, [projects])
 
   useEffect(() => {
diff --git a/src/models.ts b/src/models.ts
@@ -235,6 +235,36 @@ export function getModelCosts(model: string): ModelCosts | null {
 // session that used it, hiding real spend until the user noticed.
 const warnedUnknownModels = new Set<string>()
 
+/// Heuristic for "this looks like a local model that will never be in LiteLLM's
+/// pricing JSON". We suppress the unknown-model warning for these because the
+/// "update codeburn" advice can't help — local Ollama models, llama.cpp tags,
+/// LM Studio loads, etc. are billed locally and don't have public pricing.
+/// Users still get $0 in cost reports for them (correct — local inference is
+/// effectively free); the warning was just noise.
+function looksLikeLocalModel(name: string): boolean {
+  // Ollama and LM Studio tags include `:tag` (e.g. qwen3.6:35b-a3b-bf16).
+  if (name.includes(':') && !name.startsWith('http')) return true
+  // GGUF / quantized fingerprints commonly seen in local inference.
+  if (/[-_](q[2-8](_[a-z0-9]+)?|bf16|fp16|gguf|f16|f32)$/i.test(name)) return true
+  return false
+}
+
+function shouldWarnAboutUnknownModel(name: string): boolean {
+  if (!name || name === '<synthetic>') return false
+  if (warnedUnknownModels.has(name)) return false
+  // Suppress for local/quantized models — the "update codeburn" hint is
+  // actively misleading there. Users who need cost visibility for local
+  // inference can still set an alias via `codeburn model-alias`.
+  if (looksLikeLocalModel(name)) return false
+  // The warning fired on every CLI invocation (including the default
+  // dashboard) which made first launches look broken — three "no pricing
+  // data" lines greet a user before the dashboard even draws. Now opt-in
+  // via --verbose. The unknown model still costs $0 in reports; users who
+  // suspect missing models run `codeburn --verbose` to see the list.
+  if (process.env['CODEBURN_VERBOSE'] !== '1') return false
+  return true
+}
+
 export function calculateCost(
   model: string,
   inputTokens: number,
@@ -246,19 +276,16 @@ export function calculateCost(
 ): number {
   const costs = getModelCosts(model)
   if (!costs) {
-    // Skip the synthetic placeholder and the auto-router pseudo-models that
-    // intentionally have no direct pricing entry; calculateCost callers
-    // resolve those through aliasing first, so an unknown here is genuinely
-    // an unmapped real model.
-    if (model && model !== '<synthetic>' && !warnedUnknownModels.has(model)) {
+    if (shouldWarnAboutUnknownModel(model)) {
       warnedUnknownModels.add(model)
       // Strip control characters and cap length: model names come from JSONL
       // payloads written by external tools, so a hostile or corrupt file
       // could embed terminal escape sequences here.
       const safeName = model.replace(/[\x00-\x1F\x7F-\x9F]/g, '?').slice(0, 200)
+      const aliasHint = `Map it with: codeburn model-alias "${safeName}" <known-model>`
       process.stderr.write(
         `codeburn: no pricing data for model "${safeName}" — costs for this model will show $0. ` +
-        `Update with: npx codeburn@latest, or report at https://github.com/getagentseal/codeburn/issues.\n`
+        `${aliasHint}, or update with: npx codeburn@latest.\n`
       )
     }
     return 0
