[ty] Improve flow snapshot performance #26012
Typing conformance results: No changes detected ✅
Current numbers: The percentage of diagnostics emitted that were expected errors held steady at 94.37%. The percentage of expected errors that received a diagnostic held steady at 89.00%. The number of fully passing files held steady at 94/134.
Memory usage report: significant changes listed for flake8, trio, sphinx, and prefect.
Merging this PR will improve performance by 6.22%.

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | DateType | 248.4 ms | 215.9 ms | +15.07% |
| ⚡ | Simulation | anyio | 1.2 s | 1.1 s | +9.09% |
| ⚡ | Simulation | attrs | 560.7 ms | 526.4 ms | +6.5% |
| ⚡ | Simulation | ty_micro[many_tuple_assignments] | 72.2 ms | 68.4 ms | +5.47% |
| ⚡ | Simulation | ty_micro[complex_constrained_attributes_2] | 74.8 ms | 70.9 ms | +5.44% |
| ⚡ | Simulation | ty_micro[complex_constrained_attributes_1] | 75.2 ms | 71.5 ms | +5.17% |
| ⚡ | Simulation | ty_micro[very_large_tuple] | 77.1 ms | 73.3 ms | +5.12% |
| ⚡ | Simulation | ty_micro[complex_constrained_attributes_3] | 80.5 ms | 76.6 ms | +5.09% |
| ⚡ | Simulation | ty_micro[gradual_vararg_call] | 75.6 ms | 72.2 ms | +4.74% |
| ⚡ | Simulation | ty_micro[recursive_typed_dict_union_contextual_inference] | 86.7 ms | 82.9 ms | +4.57% |
| ⚡ | Simulation | ty_micro[many_string_assignments] | 84 ms | 80.5 ms | +4.42% |
| ⚡ | Simulation | ty_micro[pydantic_core_schema_dict] | 87.6 ms | 83.9 ms | +4.41% |
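The Efficiency column appears to report a speed-up (BASE / HEAD - 1) rather than a reduction in runtime. Recomputing from the rounded timings gives about +15.05% for DateType and +9.09% for anyio, close to the listed values, whereas (BASE - HEAD) / BASE would give only about 13.08% for DateType. This is an inference from the numbers in the table, not CodSpeed's documented formula. The two helper functions below are hypothetical and exist only to show the arithmetic.

```rust
/// Relative speed-up implied by two timings, in percent: BASE / HEAD - 1.
/// (Inferred reading of the "Efficiency" column; not CodSpeed's published definition.)
fn speedup_pct(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}

/// Relative reduction in runtime, in percent: (BASE - HEAD) / BASE.
/// Shown for contrast; it does not match the listed values.
fn time_saved_pct(base: f64, head: f64) -> f64 {
    (base - head) / base * 100.0
}
```

Small mismatches in the second decimal place are expected, because the displayed timings are themselves rounded.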
Comparing charlie/codex-lazy-flow-state-snapshots (e9f4e17) with main (bcae1b7)
Footnote: 60 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
Force-pushed from 1302874 to a1ccd7b.
dhruvmanila left a comment:
This is great, thank you!
It also helped me revise how we do control-flow analysis :)
```rust
debug_assert_ne!(
    current, event.parent,
    "pending reachability must be an ancestor"
);
```
Is this an invariant that must always hold? If it were ever broken in a release build, where `debug_assert_ne!` is disabled by default, the loop would never terminate: `current = event.parent` would leave `current` unchanged. And since `unapplied` keeps growing on every iteration, it could eventually run out of memory.
Should we handle this case here, either by breaking out of the loop or by using `assert_ne!` instead?
Good call, this is a required invariant; changing to `assert_ne!`.
```rust
debug_assert_ne!(
    current, event.parent,
    "pending reachability must be an ancestor"
);
```
Same comment as above: is this an invariant that must hold, and how should we handle it if it breaks?
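Both threads turn on the same question: why would a broken ancestor check loop forever instead of stopping? The minimal Rust sketch below models the walk over an append-only, parent-linked log of reachability constraints. It assumes the log starts with a root entry that links to itself, which is an illustrative choice and not necessarily how ty represents it. Under that assumption the walk can only stall at the root, and it reaches the root exactly when the place's last-applied id is not an ancestor of the target. `PendingReachabilityId`, `current`, `event.parent`, and `unapplied` echo names from the PR, but every definition here is a hypothetical, simplified stand-in.

```rust
// Hypothetical sketch: `PendingReachabilityId` borrows its name from this PR,
// but this definition, `PendingEvent`, and `PendingLog` are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PendingReachabilityId(usize);

/// One scope-wide reachability constraint, linked to the entry it extends.
struct PendingEvent {
    parent: PendingReachabilityId,
    constraint: &'static str,
}

/// Append-only, parent-linked log. Entry 0 is a root sentinel that links to itself.
struct PendingLog {
    events: Vec<PendingEvent>,
}

impl PendingLog {
    const ROOT: PendingReachabilityId = PendingReachabilityId(0);

    fn new() -> Self {
        let root = PendingEvent { parent: Self::ROOT, constraint: "<root>" };
        PendingLog { events: vec![root] }
    }

    /// Records `constraint` on top of an existing entry and returns the new id.
    fn push(&mut self, parent: PendingReachabilityId, constraint: &'static str) -> PendingReachabilityId {
        self.events.push(PendingEvent { parent, constraint });
        PendingReachabilityId(self.events.len() - 1)
    }

    /// Returns the constraints after `applied` up to and including `target`,
    /// oldest first. Panics if `applied` is not an ancestor of `target`.
    fn unapplied(&self, applied: PendingReachabilityId, target: PendingReachabilityId) -> Vec<&'static str> {
        let mut unapplied = Vec::new();
        let mut current = target;
        while current != applied {
            let event = &self.events[current.0];
            // Only the self-linked root has `parent == current`. Reaching it means
            // `applied` was never on the path; without this check the loop would
            // spin forever while `unapplied` grows. `assert_ne!` also runs in release.
            assert_ne!(current, event.parent, "pending reachability must be an ancestor");
            unapplied.push(event.constraint);
            current = event.parent;
        }
        unapplied.reverse();
        unapplied
    }
}
```

Because `push` links each new entry to one that already exists, every non-root entry's parent has a smaller index, so a well-formed walk makes strict progress toward the root. Using `assert_ne!` instead of `debug_assert_ne!` keeps that guarantee in release builds at the cost of one comparison per step.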
```rust
fn materialize_ref<'a>(
    &self,
    pending: &'a mut PendingPlaceState,
    target: PendingReachabilityId,
    reachability_constraints: &mut ReachabilityConstraintsBuilder,
) -> &'a PlaceState {
```
Could we document when a caller should use `materialize_ref` instead of `materialize`? Since the parameters are the same and the difference shows up only in the return type, it seems important to spell it out so that callers keep benefiting from the shared `Rc`.
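The trade-off the reviewer describes can be illustrated with a heavily simplified, hypothetical pair of methods. This sketch drops the `target` and builder parameters, models pending constraints as a plain `Vec`, and assumes, purely for illustration, that the non-`ref` variant returns an owned `PlaceState`. The real signature of `materialize` is not shown in this PR excerpt.

```rust
use std::rc::Rc;

// Hypothetical, simplified stand-ins; not ty's real types or signatures.
#[derive(Clone, Debug, PartialEq)]
struct PlaceState {
    constraints: Vec<&'static str>,
}

struct PendingPlaceState {
    /// Possibly shared with other flow snapshots.
    state: Rc<PlaceState>,
    /// Constraints recorded but not yet applied to `state`.
    pending: Vec<&'static str>,
}

impl PendingPlaceState {
    /// Applies pending constraints, then lends out the state.
    /// `Rc::make_mut` copies only when there is work to do and another snapshot
    /// still holds the same `Rc`; with nothing pending, the `Rc` stays shared.
    fn materialize_ref(&mut self) -> &PlaceState {
        if !self.pending.is_empty() {
            let state = Rc::make_mut(&mut self.state);
            state.constraints.extend(self.pending.drain(..));
        }
        &self.state
    }

    /// Owning variant: always pays for a full clone of the place state.
    fn materialize(&mut self) -> PlaceState {
        self.materialize_ref().clone()
    }
}
```

In this sketch, a caller that only reads should prefer `materialize_ref`: when nothing is pending it neither clones nor unshares the state, whereas `materialize` always copies.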
Force-pushed from 23c6d01 to e9f4e17.
Summary
When we build a semantic index, we currently snapshot flow state at control-flow branches by eagerly cloning the bindings and declarations of every symbol and member. We also apply scope-wide reachability constraints to every live place immediately, even though most places are still unchanged when the branches merge.
We now represent snapshots with copy-on-write shared place states and record scope-wide reachability constraints in an append-only parent-linked structure. A place materializes its pending constraints only when it is read or mutated. When both branches still share the same place state, we merge only their path constraints, allowing complementary truthy and falsy paths to cancel without rebuilding the place.
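The copy-on-write snapshot and the shared-state merge described above can be sketched as a toy Rust model. All names (`Path`, `or`, `PlaceState`, `Place`, `snapshot`, `bind`, `merge`) are hypothetical, and "path constraints" are reduced to a tiny expression type whose only simplification is that the truthy and falsy branches of the same condition cancel. This is an illustration of the idea under those assumptions, not ty's implementation.

```rust
use std::rc::Rc;

/// Toy path constraint: always reachable, one branch of a condition, or a disjunction.
#[derive(Clone, Debug, PartialEq)]
enum Path {
    Always,
    Cond { id: u32, truthy: bool },
    Or(Box<Path>, Box<Path>),
}

/// Disjunction where complementary branches of the same condition cancel out.
fn or(a: Path, b: Path) -> Path {
    if a == Path::Always || b == Path::Always {
        return Path::Always;
    }
    if let (Path::Cond { id: x, truthy: p }, Path::Cond { id: y, truthy: q }) = (&a, &b) {
        if x == y && p != q {
            return Path::Always;
        }
    }
    if a == b {
        return a;
    }
    Path::Or(Box::new(a), Box::new(b))
}

#[derive(Clone, Debug, PartialEq)]
struct PlaceState {
    bindings: Vec<&'static str>,
}

#[derive(Clone, Debug)]
struct Place {
    state: Rc<PlaceState>, // shared between snapshots until written
    path: Path,
}

/// Taking a snapshot copies `Rc` pointers, not bindings.
fn snapshot(places: &[Place]) -> Vec<Place> {
    places.to_vec()
}

/// Writing through `Rc::make_mut` clones the state only if a snapshot still shares it.
fn bind(place: &mut Place, binding: &'static str) {
    Rc::make_mut(&mut place.state).bindings = vec![binding];
}

fn merge(a: Place, b: Place) -> Place {
    let path = or(a.path, b.path);
    if Rc::ptr_eq(&a.state, &b.state) {
        // Neither branch touched the place: keep the shared state, merge only paths.
        return Place { state: a.state, path };
    }
    // The branches diverged: build a combined state (a toy union of bindings).
    let mut bindings = a.state.bindings.clone();
    for binding in &b.state.bindings {
        if !bindings.contains(binding) {
            bindings.push(*binding);
        }
    }
    Place { state: Rc::new(PlaceState { bindings }), path }
}
```

For an `if`/`else` that leaves a place untouched, both snapshots still point at the same `Rc`, so `merge` only combines the truthy and falsy paths, which collapse to `Path::Always` without allocating a new `PlaceState`.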
The specialized star-import snapshot path remains narrowly scoped and avoids allocating a temporary member list.
Performance
The current CodSpeed comparison reports a 6.78% overall improvement, with 12 improved benchmarks and no regressions. DateType improved by 15.51%, anyio by 9.10%, and attrs by 6.96%. CodSpeed notes that some comparisons used different runtime environments, which may affect the exact percentages.
The typing-conformance and ecosystem comparisons reported no diagnostic changes, and retained memory was effectively unchanged.