[ty] Avoid lookup maps for small place tables #26177
Merged
Typing conformance results
No changes detected ✅
Current numbers
The percentage of diagnostics emitted that were expected errors held steady at 94.37%. The percentage of expected errors that received a diagnostic held steady at 89.00%. The number of fully passing files held steady at 94/134.
Memory usage report
Significant changes (detailed breakdown per project): flake8, trio, sphinx, prefect
b42783d to 990313f
MichaReiser approved these changes Jun 22, 2026
Comment on lines -164 to -166
    /// Map from symbol name to its ID.
    ///
    /// Uses a hash table to avoid storing the name twice.
Comment on lines +243 to +247
    self.map
        .find(SymbolTable::hash_name(name), |id| {
            self.table.symbols[*id].name == name
        })
        .copied()
Should we have our own SymbolReverseTable wrapper to avoid duplicating the find logic? The same applies to members.
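A minimal sketch of what such a wrapper might look like, so that the hash-then-compare lookup lives in one place for both symbols and members. Everything here is hypothetical: `ReverseTable`, `hash_key`, `insert`, and `find` are illustrative names, and the sketch uses `std::collections::HashMap` keyed by the hash instead of `hashbrown::HashTable` so that it stays self-contained. Like the reviewed code, it stores only IDs and resolves them against the owning table through a closure.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical wrapper: maps a key's hash to candidate IDs, so the keys
/// themselves are stored only once, in the owning table.
#[derive(Default)]
struct ReverseTable {
    buckets: HashMap<u64, Vec<u32>>,
}

impl ReverseTable {
    /// Hash a key the same way for inserts and lookups.
    fn hash_key<K: Hash + ?Sized>(key: &K) -> u64 {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        hasher.finish()
    }

    /// Record that `id` refers to an entry whose key is `key`.
    fn insert<K: Hash + ?Sized>(&mut self, key: &K, id: u32) {
        self.buckets.entry(Self::hash_key(key)).or_default().push(id);
    }

    /// The single home of the find logic. `is_match` checks a candidate ID
    /// against the owning table, which handles hash collisions.
    fn find<K: Hash + ?Sized>(&self, key: &K, is_match: impl Fn(u32) -> bool) -> Option<u32> {
        self.buckets
            .get(&Self::hash_key(key))?
            .iter()
            .copied()
            .find(|&id| is_match(id))
    }
}
```

A symbol table and a member table could each hold one `ReverseTable` and pass their own equality check to `find`, which is the duplication the comment asks about.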
I'm still not 100% sure this is worth it, but it seems mostly fine.
Summary
Avoid retaining reverse-lookup hash tables for completed place tables when linear lookup is cheaper. Builders continue to use the existing hash tables to deduplicate entries; when construction finishes, symbol tables with at most 16 entries and member tables with at most eight entries discard the index and use allocation-free linear lookup. Larger tables retain the existing hashbrown::HashTable.
This supersedes #26156. The thresholds are independently benchmarked because comparing member expressions is more expensive than comparing symbol names. A cutoff of eight avoids that PR's CodSpeed regression for members, while 16 recovers more symbol-table memory without a measurable CPU regression.
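The build-then-discard pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `SymbolTableBuilder`, `SymbolTable`, `SYMBOL_INDEX_CUTOFF`, and the method names are hypothetical, and a plain `std::collections::HashMap<String, usize>` stands in for `hashbrown::HashTable` (so, unlike the real index, this one stores each name twice).

```rust
use std::collections::HashMap;

/// Hypothetical constant mirroring the symbol cutoff described above.
const SYMBOL_INDEX_CUTOFF: usize = 16;

/// Finished table: names in insertion order plus an optional reverse index.
struct SymbolTable {
    names: Vec<String>,
    /// `None` for small tables, which fall back to a linear scan.
    index: Option<HashMap<String, usize>>,
}

impl SymbolTable {
    /// Look up a symbol ID by name, using the index only if it was retained.
    fn symbol_id(&self, name: &str) -> Option<usize> {
        match &self.index {
            Some(index) => index.get(name).copied(),
            None => self.names.iter().position(|n| n == name),
        }
    }
}

/// Builder: always keeps the index so that inserts can deduplicate.
#[derive(Default)]
struct SymbolTableBuilder {
    names: Vec<String>,
    index: HashMap<String, usize>,
}

impl SymbolTableBuilder {
    /// Return the existing ID for `name`, or assign the next one.
    fn add(&mut self, name: &str) -> usize {
        if let Some(&id) = self.index.get(name) {
            return id;
        }
        let id = self.names.len();
        self.names.push(name.to_string());
        self.index.insert(name.to_string(), id);
        id
    }

    /// Drop the index when the table has at most `SYMBOL_INDEX_CUTOFF` entries.
    fn finish(self) -> SymbolTable {
        let index = (self.names.len() > SYMBOL_INDEX_CUTOFF).then_some(self.index);
        SymbolTable { names: self.names, index }
    }
}
```

A member table would follow the same shape with a cutoff of eight; the lower value reflects the higher cost of comparing member expressions during the linear scan.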
Performance
At the final 16/8 cutoffs, the latest CI memory report shows:
[Memory table not preserved; only the label semantic_index survives.]
The initial 8/8 cutoff eliminated #26156's CodSpeed regression: 15 of 23 ty microbenchmarks improved, with the full range between -0.34% and +0.34%, while the project benchmarks ranged from -0.08% to +0.06% (microbenchmarks, projects). Raising only the symbol cutoff from 8 to 16 was classified as No Change across every benchmark: the microbenchmarks ranged from -0.7% to +0.4% and the project benchmarks from -0.2% to 0.0% (microbenchmark comparison, project comparison). That change saves an additional 0.12–0.18% of total retained memory and 0.64–0.91% of semantic_index memory.
Ecosystem distribution
I temporarily instrumented final table construction and post-build lookups across all 162 projects in the ecosystem-analyzer's pinned mypy-primer corpus. The final cutoffs give the two structures similar lookup exposure while avoiding indexes for most tables:
For symbols, increasing the cutoff from 8 to 16 removes another 217,586 indexes across the corpus runs and avoids 29.7 MiB of aggregate map storage while moving 16.64% of lookups to linear search. Applying the same increase to members would avoid only 9.8 MiB while moving another 24.96% of lookups to linear search, raising their nonempty linear lookup share to 71.74%. This supports the asymmetric 16/8 cutoffs.
The aggregate byte totals repeat common dependencies across independent project runs and should be read comparatively, not as whole-process memory totals.
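One way to read these figures comparatively is as aggregate map storage avoided per percentage point of lookups moved to linear search. The small sketch below only redoes that arithmetic from the numbers quoted above; the function names are illustrative, and the ratio is an informal yardstick, not a metric used by the PR.

```rust
/// MiB of aggregate map storage avoided per percentage point of lookups
/// moved to linear search.
fn mib_per_point(mib_avoided: f64, lookups_moved_pct: f64) -> f64 {
    mib_avoided / lookups_moved_pct
}

/// Returns (symbols, members) for raising each cutoff from 8 to 16,
/// using the corpus figures quoted above.
fn tradeoff() -> (f64, f64) {
    let symbols = mib_per_point(29.7, 16.64);
    let members = mib_per_point(9.8, 24.96);
    (symbols, members)
}
```

By this yardstick the symbol increase avoids roughly 1.78 MiB per point against roughly 0.39 MiB per point for members, which is consistent with raising only the symbol cutoff.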