[ty] Avoid stack overflows in reachability analysis #26272
Typing conformance results
No changes detected ✅
Current numbers: The percentage of diagnostics emitted that were expected errors held steady at 94.37%. The percentage of expected errors that received a diagnostic held steady at 89.00%. The number of fully passing files held steady at 94/134.

Memory usage report
Significant changes: prefect, sphinx, trio, flake8

3a765ce to 4037ee8

Merging this PR will improve performance by 34.19%

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | ty_micro[typevar_mapping_small_accumulations] | 177.6 ms | 104.4 ms | +70.1% |
| ⚡ | Memory | ty_micro[typevar_mapping_small_accumulations] | 12.2 MB | 11.5 MB | +5.87% |

Comparing charlie/fix-3822-stacker (9501f4c) with main (ce46714)
Footnotes
1. 64 benchmarks were skipped, so the baseline results were used instead.

4037ee8 to c922d90
c922d90 to 3c23580

I don’t think I’m the right reviewer for this. I’ve zero context on this code.

7bca03a to fe5f30b

Looks like this still has conflicts vs main?

fe5f30b to f68348a

No more conflicts but I'm exploring one other implementation first.

Opening up for review.

Does that mean you decided this approach is preferable to #26310?

Yeah. They're ultimately pretty similar but I found this one easier to understand.

    for index in range {
        let predicate = &predicates[ScopedPredicateId::new(index)];
        if matches!(predicate.node, PredicateNode::IsNonTerminalCall(_)) {
            analyze_single(db, predicate);

This approach of pre-warming all IsNonTerminalCall predicates that appear prior to us in the predicate array means that in branch-heavy code we may well be pre-warming a bunch of predicates we won't actually need, because they occur in different (earlier in source order) control-flow branches than the one we are in. This seems like probably an OK tradeoff for now? The more precise approach would be a graph walk and a demand-driven work-list, which would definitely be more complicated. But I think it's at least worth documenting this choice and its limitations explicitly in a comment. (It's possible that this is contributing to the extra memory usage in this PR -- though for code we are actually checking, I expect we'll eventually exercise all the predicates in a scope anyway.)
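
For illustration, the demand-driven alternative mentioned above could be sketched roughly as follows. This is a minimal sketch, not code from ty: the `deps` adjacency list and the `demand_order` function are hypothetical stand-ins for "which predicates does analyzing this predicate ask about". An explicit stack replaces recursion, and only nodes reachable from the target are visited, so predicates in unrelated branches are never warmed.

```rust
use std::collections::HashSet;

/// Returns the nodes that must be analyzed for `target`, dependencies first.
/// `deps[n]` lists the nodes that node `n` asks about (hypothetical encoding).
fn demand_order(deps: &[Vec<usize>], target: usize) -> Vec<usize> {
    let mut seen = HashSet::new();
    let mut order = Vec::new();
    // Explicit stack of (node, dependencies_already_pushed) instead of recursion.
    let mut stack = vec![(target, false)];
    while let Some((node, expanded)) = stack.pop() {
        if expanded {
            // Everything this node depends on has been emitted already.
            order.push(node);
            continue;
        }
        if !seen.insert(node) {
            // Already scheduled through another path (also stops on cycles).
            continue;
        }
        stack.push((node, true));
        for &dep in &deps[node] {
            if !seen.contains(&dep) {
                stack.push((dep, false));
            }
        }
    }
    order
}

fn main() {
    // Node 4 needs 3, which needs 0; nodes 1 and 2 sit in other branches.
    let deps: Vec<Vec<usize>> = vec![vec![], vec![0], vec![1], vec![0], vec![3]];
    println!("{:?}", demand_order(&deps, 4));
}
```

Compared with warming every earlier predicate, this does less work in branch-heavy code, at the cost of needing the dependency edges up front, which is the extra complexity the comment alludes to.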
f68348a to 9501f4c

Hmm, Codex says this test still stack overflows:

#[test]
fn early_call_reentering_late_implicit_attribute_does_not_overflow_stack() -> anyhow::Result<()> {
    let handle = std::thread::Builder::new()
        .name("early-late-implicit-attribute-stack-test".into())
        .stack_size(ruff_db::STACK_SIZE)
        .spawn(|| {
            let mut db = setup_db();
            let mut ui = String::from(
                r#"from widgets import Widget
class Ui:
    def __init__(self):
        self.target = Widget()
    def setup(self):
        self.target.configure()
        self.early = Widget()
"#,
            );
            // Append 400 widgets, each followed by three statement-level calls,
            // to the body of `setup`.
            for index in 0..400 {
                ui.push_str(&format!(
                    concat!(
                        "        self.widget_{index} = Widget()\n",
                        "        self.widget_{index}.configure()\n",
                        "        self.widget_{index}.configure()\n",
                        "        self.widget_{index}.configure()\n",
                    ),
                    index = index,
                ));
            }
            ui.push_str("        self.target = Widget()\n");
            db.write_files([
                (
                    "/src/widgets.py",
                    r#"class Widget:
    def configure(self) -> None: ...
"#,
                ),
                ("/src/ui.py", &ui),
                (
                    "/src/consumer.py",
                    r#"from typing_extensions import reveal_type
from ui import Ui
from widgets import Widget
class Form(Ui):
    def early_widget(self) -> Widget:
        reveal_type(self.early)
        return self.early
"#,
                ),
            ])?;
            assert_revealed_type(&db, "/src/consumer.py", "Widget");
            Ok(())
        })?;
    handle.join().expect("regression test thread panicked")
}

By way of explanation, it says:
This makes sense to me -- seems like an issue that an actual graph walk might fix? But we can also wait and see if it comes up in real code.

That makes sense. I’ll keep looking into it tomorrow.

Summary
We record a reachability predicate for every statement-level call because the call may return Never. Predicate IDs follow source order, but reachability decision diagrams put later predicates before earlier ones. In a large generated method, analyzing a late call could infer its receiver, re-enter reachability for the preceding call, and continue backward through thousands of calls until the worker stack overflowed.

For example, given:
Reachability previously discovered the dependencies backward:
Before evaluating a reachability constraint, we now force analysis to process the relevant statement-level calls one by one in source order:
Non-terminal-call analysis is a tracked Salsa query, so each result is cached. When inferring a later call asks about an earlier call, it gets that cached result instead of adding another nested inference and reachability frame.
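
The effect of caching plus a source-order pass can be seen in a toy model. This is only a sketch under stated assumptions, not ty's implementation: `Analyzer`, `analyze`, and `analyze_in_source_order` are hypothetical names, a plain `HashMap` stands in for the Salsa cache, and each call is assumed to ask only about the call immediately before it.

```rust
use std::collections::HashMap;

/// Toy model: analyzing call `i` asks about call `i - 1` (an assumption).
struct Analyzer {
    cache: HashMap<usize, bool>, // stands in for the cached query results
    depth: usize,                // current nesting of uncached analyses
    max_depth: usize,            // deepest nesting observed so far
}

impl Analyzer {
    fn new() -> Self {
        Self { cache: HashMap::new(), depth: 0, max_depth: 0 }
    }

    /// Returns whether call `index` is non-terminal, consulting the cache first.
    fn analyze(&mut self, index: usize) -> bool {
        if let Some(&cached) = self.cache.get(&index) {
            return cached;
        }
        self.depth += 1;
        self.max_depth = self.max_depth.max(self.depth);
        // Inferring this call re-enters analysis for the preceding call.
        let result = if index == 0 { true } else { self.analyze(index - 1) };
        self.depth -= 1;
        self.cache.insert(index, result);
        result
    }

    /// Source-order pass: warm every earlier call before asking about `last`.
    fn analyze_in_source_order(&mut self, last: usize) -> bool {
        for index in 0..last {
            self.analyze(index);
        }
        self.analyze(last)
    }
}

fn main() {
    let mut backward = Analyzer::new();
    backward.analyze(1_000);
    let mut forward = Analyzer::new();
    forward.analyze_in_source_order(1_000);
    println!("backward: {}, forward: {}", backward.max_depth, forward.max_depth);
}
```

In this model the backward walk nests once per call, while the source-order pass never nests more than one uncached analysis, because each lookup of an earlier call hits the cache.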
Inferring a call can itself re-enter reachability, so a scope-keyed guard prevents the same source-order pass from starting again. A nested scope is still allowed to perform its own pass. The ordinary decision-diagram evaluation and Never-based narrowing behavior remain unchanged.

The exact PySide6 reproduction now completes successfully instead of aborting with a stack overflow.
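
A scope-keyed re-entrancy guard of the kind described above might look roughly like this. It is a sketch under assumptions rather than the code in this PR: `ScopeId`, `PassGuard`, `ACTIVE_SCOPES`, and `run_source_order_pass` are hypothetical names, and a thread-local set is just one plausible place to keep the in-progress scopes.

```rust
use std::cell::RefCell;
use std::collections::HashSet;

// Hypothetical scope identifier; the real key type is an assumption here.
type ScopeId = u32;

thread_local! {
    // Scopes whose source-order pass is currently running on this thread.
    static ACTIVE_SCOPES: RefCell<HashSet<ScopeId>> = RefCell::new(HashSet::new());
}

/// RAII token: while it lives, `scope` counts as having a pass in progress.
struct PassGuard {
    scope: ScopeId,
}

impl PassGuard {
    /// Returns `None` if a pass for `scope` is already running on this thread.
    fn try_enter(scope: ScopeId) -> Option<PassGuard> {
        let inserted = ACTIVE_SCOPES.with(|active| active.borrow_mut().insert(scope));
        inserted.then(|| PassGuard { scope })
    }
}

impl Drop for PassGuard {
    fn drop(&mut self) {
        // Release the scope so a later, non-nested pass can run again.
        ACTIVE_SCOPES.with(|active| {
            active.borrow_mut().remove(&self.scope);
        });
    }
}

/// Runs `pass` unless one is already in progress for `scope`; reports whether it ran.
fn run_source_order_pass(scope: ScopeId, pass: impl FnOnce()) -> bool {
    match PassGuard::try_enter(scope) {
        Some(_guard) => {
            pass();
            true
        }
        None => false,
    }
}

fn main() {
    let ran_outer = run_source_order_pass(1, || {
        // Re-entering the same scope is skipped; a nested scope still runs.
        assert!(!run_source_order_pass(1, || {}));
        assert!(run_source_order_pass(2, || {}));
    });
    assert!(ran_outer);
}
```

Tying the release to `Drop` means the scope is cleared even if the pass unwinds early, so a skipped re-entry never leaves the scope permanently blocked.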
Closes astral-sh/ty#3822.