perf: replace talc with an in-tree bump allocator #45
Merged
Binary Size Report
Total: 21.24 MiB → 21.21 MiB (-31,104 B, -0.14%)
The stub is a short-lived process: it parses the manifest, resolves paths, then calls execve (Unix) or ExitProcess (Windows). It never frees, so a general-purpose allocator is overkill. Replace talc with a tiny bump allocator (a single Cell offset into the existing 8 MiB .bss arena; dealloc is a no-op; the stub is single-threaded, so no synchronization is needed) and drop the talc dependency entirely.
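For readers unfamiliar with the technique, here is a minimal sketch of a bump allocator of the kind described above. It is not the code from this PR: the type name `Bump`, the `used` helper, the const-generic arena size, and the `repr(C, align(16))` attribute are all assumptions made for illustration. Only the overall shape (a Cell offset into a fixed arena, a no-op dealloc, no locking) comes from the description.

```rust
use core::alloc::{GlobalAlloc, Layout};
use core::cell::{Cell, UnsafeCell};
use core::ptr;

/// Hypothetical bump allocator over a fixed arena of N bytes.
/// A zero-initialized static of this type lands in .bss.
#[repr(C, align(16))]
pub struct Bump<const N: usize> {
    arena: UnsafeCell<[u8; N]>,
    /// Number of arena bytes handed out so far.
    offset: Cell<usize>,
}

// SAFETY (assumption): the process is single-threaded, so the
// unsynchronized Cell is never touched from two threads at once.
unsafe impl<const N: usize> Sync for Bump<N> {}

impl<const N: usize> Bump<N> {
    pub const fn new() -> Self {
        Bump { arena: UnsafeCell::new([0; N]), offset: Cell::new(0) }
    }

    /// Bytes consumed so far, including alignment padding.
    pub fn used(&self) -> usize {
        self.offset.get()
    }
}

unsafe impl<const N: usize> GlobalAlloc for Bump<N> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let base = self.arena.get() as *mut u8;
        let addr = base as usize + self.offset.get();
        // Layout guarantees a power-of-two alignment, so a mask works.
        let mask = layout.align() - 1;
        let Some(aligned) = addr.checked_add(mask).map(|a| a & !mask) else {
            return ptr::null_mut();
        };
        let start = aligned - base as usize;
        let Some(end) = start.checked_add(layout.size()) else {
            return ptr::null_mut();
        };
        if end > N {
            // Out of arena: GlobalAlloc signals failure with null.
            return ptr::null_mut();
        }
        self.offset.set(end);
        // SAFETY: start <= end <= N, so the pointer stays inside the arena.
        unsafe { base.add(start) }
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // Intentionally empty: the process exits before memory runs out.
    }
}
```

Under these assumptions, installing it would look like `#[global_allocator] static HEAP: Bump<{ 8 * 1024 * 1024 }> = Bump::new();`. Because dealloc does nothing, memory is only reclaimed when the process exits, which is acceptable for a stub that runs briefly and then hands off.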
Smaller on every release binary (sizes in bytes):
  x86_64-unknown-linux-musl     17168 ->  15048  (-2120)
  aarch64-unknown-linux-musl    16560 ->  14480  (-2080)
  s390x-unknown-linux-musl      18616 ->  16496  (-2120)
  x86_64-apple-darwin           29444 ->  25316  (-4128)
  aarch64-apple-darwin          67008 ->  50448  (-16560)
  x86_64-pc-windows-gnullvm     28672 ->  26624  (-2048)
  aarch64-pc-windows-gnullvm    27648 ->  25600  (-2048)
  TOTAL                        205116 -> 174012  (-31104, ~15%)
The aarch64-apple-darwin drop is almost entirely one 16 KB page (16384 of the 16560 bytes): the smaller code pulls __TEXT back under the 16 KB Mach-O page boundary it previously spilled over. Allocation alignment, bounds, and OOM logic are verified by a host round-trip test; the integration test still passes.
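The page effect can be illustrated with a little arithmetic. The sketch below assumes a segment's on-disk size is its content rounded up to a 16 KiB page; the function name and the two content sizes (16500 and 16300 bytes) are made up for illustration and are not measurements from this PR.

```rust
/// Page size assumed for arm64 Mach-O segments (16 KiB).
const PAGE: usize = 16 * 1024;

/// Round a byte length up to the next multiple of PAGE.
/// PAGE is a power of two, so masking is enough.
fn round_up_to_page(len: usize) -> usize {
    (len + PAGE - 1) & !(PAGE - 1)
}

/// On-disk bytes saved when content shrinks from `before` to `after`.
fn padded_saving(before: usize, after: usize) -> usize {
    round_up_to_page(before) - round_up_to_page(after)
}

/// Hypothetical case: content just over one page shrinks by 200 bytes
/// and now fits in a single page, so a whole page disappears.
fn demo_saving() -> usize {
    padded_saving(16_500, 16_300)
}
```

In this made-up case a 200-byte reduction in content frees 16384 bytes of padded size, which is the kind of jump visible in the aarch64-apple-darwin row. A reduction that does not cross a page boundary would change the padded size by nothing at all.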