[ty] Cache `is_never_satisfied` results #26261
Typing conformance results

No changes detected ✅

Current numbers: The percentage of diagnostics emitted that were expected errors held steady at 94.37%. The percentage of expected errors that received a diagnostic held steady at 89.00%. The number of fully passing files held steady at 94/134.

Memory usage report

Memory usage unchanged ✅

Merging this PR will improve performance by 4.15%

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | DateType | 281.4 ms | 270.1 ms | +4.15% |

Comparing charlie/codex-cache-is-never-satisfied (fab1df6) with main (fc97f12)
Footnotes
1. 64 benchmarks were skipped, so the baseline results were used instead.
Summary
`is_never_satisfied` reconstructs root path assignments and traverses the same constraint BDD whenever callers ask the same builder about the same interior root. This adds a builder-local cache keyed by the root `NodeId`, allowing repeated top-level queries to reuse the completed boolean result.

Only completed top-level calls read or populate the cache. Recursive calls remain uncached because their result depends on the current `PathAssignments`, while terminal roots continue to return directly. The cache is local to each query builder, including builder views over compacted owned constraint sets, so cache state does not become part of owned storage.

Performance
On the pinned Pydantic `ty_walltime` workload, a census found that 70.1% of top-level checks repeated a root in the same builder, and those repeats accounted for 48.0% of recursive visits. In an adjacent copied-binary comparison with 90 checks per variant, the candidate measured 2.962 s/check versus 3.014 s/check for control, a 1.725% improvement ((3.014 − 2.962) / 3.014).

This is directionally favorable but below the campaign's predeclared 2% and twice-noise acceptance thresholds, so the PR remains a draft for broader performance evaluation rather than a claimed win.
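
For readers who have not opened the diff, the caching rule described in the Summary can be illustrated with a minimal, self-contained sketch. This is not the code in this PR: `Builder`, `Node`, `add_node`, `never_satisfied_cache`, `recursive_visits`, and the simplified `NodeId` and `PathAssignments` definitions are hypothetical stand-ins chosen for illustration, and the traversal is a toy BDD walk rather than ty's constraint logic. It only shows which calls touch the cache.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a BDD node identifier (not ty's real type).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum NodeId {
    AlwaysTrue,
    AlwaysFalse,
    Interior(u32),
}

/// Hypothetical interior node: a decision variable with two children.
struct Node {
    var: u32,
    if_true: NodeId,
    if_false: NodeId,
}

/// Hypothetical path state: the variable assignments made so far on this path.
#[derive(Default)]
struct PathAssignments {
    assigned: Vec<(u32, bool)>,
}

struct Builder {
    nodes: Vec<Node>,
    /// Builder-local cache: lives and dies with this builder, not with the stored nodes.
    never_satisfied_cache: HashMap<NodeId, bool>,
    /// Counts recursive visits so that the effect of the cache is observable.
    recursive_visits: usize,
}

impl Builder {
    fn new() -> Self {
        Builder { nodes: Vec::new(), never_satisfied_cache: HashMap::new(), recursive_visits: 0 }
    }

    fn add_node(&mut self, var: u32, if_true: NodeId, if_false: NodeId) -> NodeId {
        self.nodes.push(Node { var, if_true, if_false });
        NodeId::Interior((self.nodes.len() - 1) as u32)
    }

    /// Top-level query: the only place that reads or populates the cache.
    fn is_never_satisfied(&mut self, root: NodeId) -> bool {
        // Terminal roots return directly and never touch the cache.
        match root {
            NodeId::AlwaysTrue => return false,
            NodeId::AlwaysFalse => return true,
            NodeId::Interior(_) => {}
        }
        if let Some(&cached) = self.never_satisfied_cache.get(&root) {
            return cached;
        }
        let mut path = PathAssignments::default();
        let result = self.is_never_satisfied_inner(root, &mut path);
        // Only a completed top-level result is stored.
        self.never_satisfied_cache.insert(root, result);
        result
    }

    /// Recursive walk: uncached, because the answer depends on `path`.
    fn is_never_satisfied_inner(&mut self, node: NodeId, path: &mut PathAssignments) -> bool {
        self.recursive_visits += 1;
        let index = match node {
            NodeId::AlwaysTrue => return false,
            NodeId::AlwaysFalse => return true,
            NodeId::Interior(i) => i as usize,
        };
        let (var, if_true, if_false) = {
            let n = &self.nodes[index];
            (n.var, n.if_true, n.if_false)
        };
        // If the path already fixed this variable, only one branch is reachable.
        let forced = path.assigned.iter().find(|(v, _)| *v == var).map(|&(_, value)| value);
        if let Some(value) = forced {
            let child = if value { if_true } else { if_false };
            return self.is_never_satisfied_inner(child, path);
        }
        // Otherwise the node is never satisfied only if both branches are.
        let mut never = true;
        for (value, child) in [(true, if_true), (false, if_false)] {
            path.assigned.push((var, value));
            let branch_never = self.is_never_satisfied_inner(child, path);
            path.assigned.pop();
            never &= branch_never;
        }
        never
    }
}
```

In this sketch, asking the same builder about the same interior root twice leaves `recursive_visits` unchanged on the second call, which is the kind of repeat the census above measured. Keeping the map on the builder rather than on the node storage mirrors the stated goal that cache state does not become part of owned storage.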