Custom memory allocators #4576
@pks-t Thank you for preparing this! ❤️ This definitely looks better than what I prepared over in #4574. 👍 But there's one thing I don't like so much about this: I really just want to replace the low-level The other thing is the compile-time flag. Can we benchmark this change, and if it's not introducing extremely annoying performance issues, just make pluggable allocators always available? Also, I agree, it would be cool if the CRTDBG stuff could be built on top of this, somehow. 👍
@carlosmn made the proposal of providing default And I do agree that the compile-time flag is ugly. My micro-benchmarks didn't turn up any notable performance penalties, but I haven't yet run any meaningful benchmarks. Together with those benchmarks we should probably also test whether there is any noticeable performance hit if all of these functions have two additional parameters for LINE and FILE. Passing these in could even prove to be valuable for developers using libgit2, as they could hook up libgit2 with their own memory tracing implementations.
My tests:
Benchmarks are best of ten, with "build" being the normal build and "build-pluggable" being a build with pluggable allocators:
malloc_small: 1.5% performance degradation
Huh. Shouldn't this actually be the benchmark that gets least impacted by the changes? 🤔
And with all functions accepting
Summarized:
You can already see that this is no rocket science by the fact that the
@arthurschreiber: yeah, I've been very surprised by that. I guess when it comes to bigger allocations there's a lot of variation due to them being passed to the kernel.
483c2ad
to
d3453be
Compare
I've completely rewritten this PR. This proposal now makes use of a global Some minor footwork is still missing. The first item is exposing the crtdbg allocator function such that applications could plug it in, with a stub returning "-1" in case libgit2 was compiled without crtdbg. The second is to hook into our global options structure to set the desired allocator, which is also kind of trivial. The third is to enable custom allocators to implement only some of these functions and fall back on our own standard allocators for the rest.
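The third item (partially implemented custom allocators) could look roughly like the sketch below. It is not taken from this PR: the `demo_allocator` type, its member names and `demo_allocator_fill` are hypothetical, and the real structure may have more members. The idea is simply to copy the user's table and replace every unset (NULL) member with the standard implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator table; member names are illustrative only. */
typedef struct {
	void *(*gmalloc)(size_t len, const char *file, int line);
	void *(*grealloc)(void *ptr, size_t len, const char *file, int line);
	void (*gfree)(void *ptr);
} demo_allocator;

/* Standard implementations; file and line are accepted but ignored. */
void *demo_std_malloc(size_t len, const char *file, int line)
{
	(void)file; (void)line;
	return malloc(len);
}

void *demo_std_realloc(void *ptr, size_t len, const char *file, int line)
{
	(void)file; (void)line;
	return realloc(ptr, len);
}

void demo_std_free(void *ptr)
{
	free(ptr);
}

/* Copy the user's table into `out`, filling NULL members with defaults. */
void demo_allocator_fill(demo_allocator *out, const demo_allocator *user)
{
	assert(out != NULL);
	out->gmalloc = (user && user->gmalloc) ? user->gmalloc : demo_std_malloc;
	out->grealloc = (user && user->grealloc) ? user->grealloc : demo_std_realloc;
	out->gfree = (user && user->gfree) ? user->gfree : demo_std_free;
}
```

One caveat with this approach: a custom `gmalloc` paired with the fallback `gfree` is only safe if the custom allocator ultimately hands out memory from the same heap, so a real implementation might want to reject such mixed tables.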
f703b72
to
c250ff4
Compare
It looks like you've dropped the ability to toggle this at build time. This means that we're now pluggable all the time, with a 1.5% speed hit in the best case (which is also the typical case)?
Well, I wouldn't say in the best case but in the worst case where our complete program consists of nothing else but
c250ff4
to
4868c36
Compare
I don't love it, but for the most part we're IO or CPU-bound, so
OK. That's roughly how I feel.
So is this ready to be merged then?
The crtdbg allocators are currently being implemented as inline functions as part of the "w32_crtdbg_stacktrace.h" header. As we are moving towards pluggable allocators with the help of function pointers, though, we cannot make use of inlining anymore. Instead, we can only have a single implementation of these allocating functions. Move all implementations of the crtdbg allocators into "w32_crtdbg_stacktrace.c".
Currently, the `git__free` function is being defined in a single place, only, disregarding whether we use our standard allocators or the crtdbg allocators. This makes it a bit harder to convert our code base to use pluggable allocators, and furthermore makes the border between our two allocators a bit more blurry. Implement a separate `git__crtdbg__free` function for the crtdbg allocator in order to completely separate both allocator implementations.
Right now, the standard allocator is being declared as part of the "util.h" header as a set of inline functions. As with the crtdbg allocator functions, these inline functions make it hard to convert to function pointers for our allocators. Create a new "stdalloc" module containing our standard allocation functions to split these out. Convert the existing allocators to macros which make use of the stdalloc functions.
Our "util.h" header is a grab bag of various different functions, many of which don't have a clear group they belong to. Our set of allocator functions, though, can be clearly singled out as a group of functions that always belong together. Furthermore, we will need to implement additional functions relating to our allocator subsystem when moving to pluggable allocators. Thus, we should just move these functions into their own "alloc" module.
Our desired architecture would make allocators completely pluggable, such that users of libgit2 can swap out memory allocators at runtime. While making e.g. debugging easier by not having to do a separate build, this feature can also help maintainers of bindings for libgit2 by tying the memory allocations into the other language's memory system.
In order to do so, though, we first need to make our two different pre-existing allocators "stdalloc" and "crtdbg" have the same function signatures, as the "crtdbg" allocators all have an additional file and line argument. These arguments are required to build correct stack traces for debugging memory allocations. As that feature may also be interesting to authors of other applications for debugging libgit2, we now simply add these arguments to our standard allocators.
Obviously, this may come with a performance penalty. In some simple benchmarks, though, no real impact could be measured compared to a plain pluggable allocator. The following table summarizes the benchmarks. There were three different builds: with our current standard allocator ("standard"), with pluggable allocators accessed via function pointers ("pluggable"), and with pluggable allocators with file and line being added ("fileline"). Furthermore, there were three scenarios: 100.000.000 allocations of 100B ("small alloc"), 100.000.000 allocations of 100KB ("medium alloc"), and 1.000.000 allocations of 100MB ("big alloc"). All results are best of 10 runs.
|------------|-------------------|-------------------|-------------------|
| build/test | small alloc       | medium alloc      | big alloc         |
|------------|-------------------|-------------------|-------------------|
| standard   | 4539779566, +0.0% | 5912927186, +0.0% | 5166935308, +0.0% |
|------------|-------------------|-------------------|-------------------|
| pluggable  | 4611074505, +1.5% | 5979185308, +1.1% | 5388776352, +4.2% |
|------------|-------------------|-------------------|-------------------|
| fileline   | 4588338192, +1.1% | 6004951910, +1.5% | 4942528135, -4.4% |
|------------|-------------------|-------------------|-------------------|
As can be seen, there is a performance overhead for pluggable allocators. Furthermore, it can also be seen that there is some big variance between runs, especially in the "big alloc" scenario. This is probably being caused by nondeterministic behaviour in the kernel for dynamic allocations. Still, it can be observed that there should be no real difference between the "pluggable" and "fileline" allocators.
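As an illustration of the file and line arguments described in this commit message, here is a minimal sketch of how a macro can pass the call site to an allocator function. The names `demo_traced_malloc`, `demo_malloc`, `demo_last_file` and `demo_last_line` are hypothetical and not the identifiers used in libgit2; the "tracing" here only remembers the most recent call site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Call site of the most recent allocation (hypothetical tracing state). */
const char *demo_last_file;
int demo_last_line;

/* Allocator taking the extra file and line arguments. */
void *demo_traced_malloc(size_t len, const char *file, int line)
{
	assert(file != NULL);
	demo_last_file = file;
	demo_last_line = line;
	return malloc(len);
}

/*
 * Callers keep writing demo_malloc(len); the preprocessor fills in the
 * location of the calling code, not of the allocator itself.
 */
#define demo_malloc(len) demo_traced_malloc((len), __FILE__, __LINE__)
```

Because `__FILE__` and `__LINE__` expand at the call site, the only runtime cost is passing two additional arguments, which is what the "fileline" row of the table measures.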
Currently, our memory allocators are being redirected to the correct implementation at compile time by simply using macros. In order to make them swappable at runtime, this commit reshuffles that by instead making use of a global "git_allocator" structure, whose pointers are set up to reference the allocator functions. This way, it becomes easy to swap out allocators by simply setting these function pointers. In order to initialize a "git_allocator", our provided allocators "stdalloc" and "crtdbg" both provide an init function. It is called to initialize a passed-in allocator struct and set up its members correctly. No support is yet included to enable users of libgit2 to switch out the memory allocator at a global level.
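The mechanism described above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the code from this commit: the struct has only two members, and `demo_allocator`, `demo__allocator`, `demo_stdalloc_init` and `demo_counting_init` are hypothetical names. It shows a global table of function pointers, init functions that fill in a passed-in table, and macros that dispatch through the global.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Reduced, hypothetical allocator table. */
typedef struct {
	void *(*gmalloc)(size_t len, const char *file, int line);
	void (*gfree)(void *ptr);
} demo_allocator;

/* The single global table all allocation macros dispatch through. */
demo_allocator demo__allocator;

#define demo__malloc(len) demo__allocator.gmalloc((len), __FILE__, __LINE__)
#define demo__free(ptr)   demo__allocator.gfree(ptr)

static void *stdalloc_malloc(size_t len, const char *file, int line)
{
	(void)file; (void)line;
	return malloc(len);
}

static void stdalloc_free(void *ptr)
{
	free(ptr);
}

/* Init function for the standard allocator: fills in the passed table. */
int demo_stdalloc_init(demo_allocator *a)
{
	assert(a != NULL);
	a->gmalloc = stdalloc_malloc;
	a->gfree = stdalloc_free;
	return 0;
}

/* A second allocator that counts allocations, to show a runtime swap. */
size_t demo_alloc_count;

static void *counting_malloc(size_t len, const char *file, int line)
{
	(void)file; (void)line;
	demo_alloc_count++;
	return malloc(len);
}

int demo_counting_init(demo_allocator *a)
{
	assert(a != NULL);
	a->gmalloc = counting_malloc;
	a->gfree = stdalloc_free;
	return 0;
}
```

Calling `demo_counting_init(&demo__allocator)` after `demo_stdalloc_init(&demo__allocator)` switches every later `demo__malloc` call to the counting implementation without recompiling, which is the property this commit is after.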
Tie in the newly created infrastructure for swapping out memory allocators into our settings code. A user can now simply use the new option "GIT_OPT_SET_ALLOCATOR" with `git_libgit2_opts`, passing in an already initialized allocator structure as vararg.
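To illustrate how an allocator structure can be passed as a vararg to an options function, here is a self-contained sketch. It deliberately does not use the libgit2 headers: `demo_opts`, `DEMO_OPT_SET_ALLOCATOR`, `demo_allocator` and `demo_current` are hypothetical stand-ins, and the validation and error codes are assumptions rather than libgit2's behaviour.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdlib.h>

/* Reduced, hypothetical allocator table. */
typedef struct {
	void *(*gmalloc)(size_t len, const char *file, int line);
	void (*gfree)(void *ptr);
} demo_allocator;

enum { DEMO_OPT_SET_ALLOCATOR = 1 };

/* Currently active allocator; starts out empty in this sketch. */
demo_allocator demo_current;

/* Variadic options entry point: the key selects how the varargs are read. */
int demo_opts(int key, ...)
{
	va_list ap;
	int error = 0;

	va_start(ap, key);
	switch (key) {
	case DEMO_OPT_SET_ALLOCATOR: {
		demo_allocator *a = va_arg(ap, demo_allocator *);
		if (a == NULL || a->gmalloc == NULL || a->gfree == NULL)
			error = -1;        /* reject incomplete tables */
		else
			demo_current = *a; /* copy, so the caller's struct may go away */
		break;
	}
	default:
		error = -1;                /* unknown option */
	}
	va_end(ap);

	return error;
}

/* Example custom allocator that counts allocations. */
size_t demo_alloc_count;

void *demo_counting_malloc(size_t len, const char *file, int line)
{
	(void)file; (void)line;
	demo_alloc_count++;
	return malloc(len);
}

void demo_counting_free(void *ptr)
{
	free(ptr);
}
```

A caller would fill in a `demo_allocator` and pass its address, for example `demo_opts(DEMO_OPT_SET_ALLOCATOR, &my_allocator)`. Setting the allocator before any other allocation happens avoids freeing memory with a different allocator than the one that handed it out.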
6a4bd1b
to
0f6348f
Compare
So I'm merging this for now. As you said, in case somebody complains about a possible performance impact, then we can still create a build switch for this feature.
This is an RFC regarding custom memory allocators. Existing standard allocators are being renamed to `git__stdalloc__<fn>`. By default, our usual allocators `git__<fn>` are being defined to `git__stdalloc__<fn>`. For CRTDBG, our allocators `git__<fn>` are being defined to `git__crtdbg__<fn>`.
If the CMake option "PLUGGABLE_ALLOCATORS" is set, we use an indirection. Instead of using typedefs, each of our allocator functions is then not a define but a function pointer, which is by default set to the respective `git__stdalloc__<fn>` function. Our `git_libgit2_opts` function is extended by "GIT_OPT_SET_ALLOCATOR", which receives as argument a `git_allocator` structure. Its members will then be used to change the function pointers.
I don't know whether we really need to distinguish between builds with and without pluggable allocators. The question is whether there's a performance hit when using function pointers. I haven't measured that yet, though.
The other thing I'm unhappy about is the CRTDBG stuff. It would be rather cool if we could allow users of libgit2 to swap in CRTDBG at runtime, in contrast to how we do it at compile time right now. It would also easily be possible, except for the fact that all CRTDBG functions have two additional parameters to create the correct trace: LINE and FILE. We could just extend all our allocators by these parameters. But this could again have a performance penalty, I guess. Again, this needs to be measured.
Anyway. I first want your opinions on these points. The implementation would still need a bit of cleanup. The `git_allocator` stuff should probably be its own module "alloc.{c,h}", but that's minor stuff only.