pks-t
Member

@pks-t pks-t commented Mar 10, 2018

This is an RFC regarding custom memory allocators. Existing standard allocators are being renamed to git__stdalloc__<fn>. By default, our usual allocators git__<fn> are being defined to git__stdalloc__<fn>. For CRTDBG, our allocators git__<fn> are being defined to git__crtdbg__<fn>.

If the CMake option "PLUGGABLE_ALLOCATORS" is set, we use an indirection. Instead of being a define, each of our allocator functions is then a function pointer, which is by default set to the respective git__stdalloc__<fn> function. Our git_libgit2_opts function is extended by "GIT_OPT_SET_ALLOCATOR", which receives a git_allocator structure as its argument. Its members will then be used to change the function pointers.
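As a rough sketch of the indirection described above (not the code from this PR), the two build variants could look like the following. All `demo_` names and the `DEMO_PLUGGABLE_ALLOCATORS` switch are hypothetical stand-ins for `git__<fn>`, `git__stdalloc__<fn>` and the CMake option.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the CMake option; set to 0 for the compile-time variant. */
#ifndef DEMO_PLUGGABLE_ALLOCATORS
# define DEMO_PLUGGABLE_ALLOCATORS 1
#endif

/* Hypothetical counterparts of git__stdalloc__malloc and git__stdalloc__free. */
static void *demo_stdalloc_malloc(size_t len) { return malloc(len); }
static void demo_stdalloc_free(void *ptr) { free(ptr); }

#if DEMO_PLUGGABLE_ALLOCATORS
/* Pluggable build: every call goes through a pointer that defaults to stdalloc. */
static void *(*demo_malloc)(size_t) = demo_stdalloc_malloc;
static void (*demo_free)(void *) = demo_stdalloc_free;

/* Hypothetical struct whose members replace the function pointers. */
typedef struct {
	void *(*gmalloc)(size_t len);
	void (*gfree)(void *ptr);
} demo_allocator;

static void demo_set_allocator(const demo_allocator *alloc)
{
	assert(alloc != NULL && alloc->gmalloc != NULL && alloc->gfree != NULL);
	demo_malloc = alloc->gmalloc;
	demo_free = alloc->gfree;
}
#else
/* Default build: the names are resolved by the preprocessor, no indirection. */
# define demo_malloc demo_stdalloc_malloc
# define demo_free demo_stdalloc_free
#endif

/* Example replacement that counts calls before delegating to malloc. */
static size_t demo_counted_allocs;
static void *demo_counting_malloc(size_t len)
{
	demo_counted_allocs++;
	return malloc(len);
}
```

Call sites keep writing `demo_malloc(n)` in both variants; only the pluggable build pays for the extra pointer load on each call.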

I don't know whether we really need to distinguish between builds with and without pluggable allocators. The question is whether there's a performance hit when using function pointers. I haven't measured that yet, though.

The other thing I'm unhappy about is the CRTDBG stuff. It would be rather cool if we could allow users of libgit2 to swap in CRTDBG at runtime, in contrast to how we do it at compile time right now. It would also easily be possible, except for the fact that all CRTDBG functions have two additional parameters to create the correct trace: __LINE__ and __FILE__. We could just extend all our allocators by these parameters. But this could again have a performance penalty, I guess. Again, this needs to be measured.
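To illustrate what extending the allocators by these two parameters could mean, here is a minimal sketch under assumed names (`demo_malloc_at`, `demo_last_file` and `demo_last_line` are hypothetical). A macro captures the call site so that callers do not have to pass it by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record of the most recent allocation site. */
static const char *demo_last_file;
static int demo_last_line;

/* Allocator that receives the call site, as a tracing allocator would. */
static void *demo_malloc_at(size_t len, const char *file, int line)
{
	assert(file != NULL);
	demo_last_file = file;
	demo_last_line = line;
	return malloc(len);
}

/* The macro expands at the call site, so __FILE__ and __LINE__ refer to the caller. */
#define demo_malloc(len) demo_malloc_at((len), __FILE__, __LINE__)
```

A standard allocator would simply ignore the two extra arguments, which is why their cost is mostly that of passing two more parameters per call.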

Anyway. I first want your opinions on these points. The implementation would still need a bit of cleanup. The git_allocator stuff should probably be its own module "alloc.{c,h}", but that's minor stuff only.

@arthurschreiber
Member

@pks-t Thank you for preparing this! ❤️

This definitely looks better than what I prepared over in #4574. 👍

But there's one thing I don't like so much about this: I really just want to replace the low-level malloc, calloc, realloc and free. I do not actually care about the other functions like mallocarray, as they are implemented on top of the former. I also don't really want to have to re-implement the error handling for OOM. As I think rugged (and maybe pygit2) will be the only consumers of this for the foreseeable future, YAGNI might be a good guiding principle here. 😅
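The layering referred to here can be sketched as follows. This is an assumption-laden illustration, not libgit2's implementation: `demo_realloc` is a hypothetical replaceable primitive, and the array helpers are derived from it so that a custom allocator never has to supply them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* The only primitive a custom allocator would have to supply in this sketch. */
static void *(*demo_realloc)(void *, size_t) = realloc;

/* Derived helper: checks for multiplication overflow, then delegates. */
static void *demo_reallocarray(void *ptr, size_t nelem, size_t elsize)
{
	assert(demo_realloc != NULL);
	if (elsize != 0 && nelem > SIZE_MAX / elsize)
		return NULL; /* nelem * elsize would overflow size_t */
	return demo_realloc(ptr, nelem * elsize);
}

/* Derived helper: allocating an array is a reallocarray from NULL. */
static void *demo_mallocarray(size_t nelem, size_t elsize)
{
	return demo_reallocarray(NULL, nelem, elsize);
}
```

Out-of-memory reporting could live in the derived helpers in the same way, so that a replacement only needs to return NULL on failure.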

The other thing is the compile-time flag. Can we benchmark this change and, if it's not introducing extremely annoying performance issues, just make pluggable allocators always available?

Also, I agree, it would be cool if the CRTDBG stuff could be built on top of this, somehow. 👍

@pks-t
Member Author

pks-t commented Mar 13, 2018

@carlosmn made the proposal of providing default strdup, strndup etc. functions. So while the library may overwrite them as well, they wouldn't have to.

And I do agree that the compile-time flag is ugly. My micro-benchmarks didn't turn up any notable performance penalties, but I haven't done any meaningful benchmarks yet. Together with those benchmarks we should probably also test whether there is any noticeable performance hit if all of these functions have two additional parameters for __LINE__ and __FILE__. Passing these in could even prove valuable for developers using libgit2, as they could hook up libgit2 with their own memory tracing implementations.

@pks-t
Member Author

pks-t commented Mar 13, 2018

My tests:

#include "clar_libgit2.h"

void test_core_alloc__malloc_small(void)
{
	size_t i;
	void *mem;

	for (i = 0; i < 100000000; i++) {
		mem = git__malloc(100);
		git__free(mem);
	}
}

void test_core_alloc__malloc_medium(void)
{
	size_t i;
	void *mem;

	for (i = 0; i < 100000000; i++) {
		mem = git__malloc(100000);
		git__free(mem);
	}
}

void test_core_alloc__malloc_many_huge(void)
{
	size_t i;
	void *mem;

	for (i = 0; i < 1000000; i++) {
		mem = git__malloc(100000000);
		git__free(mem);
	}
}

Benchmarks are best of ten, with "build" being the normal build and "build-pluggable" being a build with pluggable allocators:

build,malloc_small,4539779566
build,malloc_medium,5912927186
build,malloc_many_huge,5166935308
build-pluggable,malloc_small,4611074505
build-pluggable,malloc_medium,5979185308
build-pluggable,malloc_many_huge,5388776352

malloc_small: 1.5% performance degradation
malloc_medium: 1.1% performance degradation
malloc_many_huge: 4.2% performance degradation
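For reference, the degradation figures can be recomputed from the raw timings as the relative change against the "build" baseline. The helper below is a hypothetical sketch; depending on how the last digit is rounded, its results land within about a tenth of a percentage point of the figures quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Relative change of a timing against a baseline, in percent.
 * Positive values mean the measured build was slower. */
static double demo_relative_change(uint64_t baseline, uint64_t measured)
{
	assert(baseline != 0);
	return ((double)measured - (double)baseline) / (double)baseline * 100.0;
}
```

For example, `demo_relative_change(4539779566ULL, 4611074505ULL)` corresponds to the malloc_small row.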

@arthurschreiber
Member

> malloc_many_huge: 4.2% performance degradation

Huh. Shouldn't this actually be the benchmark that gets least impacted by the changes? 🤔

@pks-t
Member Author

pks-t commented Mar 13, 2018

And with all functions accepting __LINE__ and __FILE__:

build-fileline,malloc_small,4588338192
build-fileline,malloc_medium,6004951910
build-fileline,malloc_many_huge,4942528135

Summarized:

|------------|-------------------|-------------------|-------------------|
| build/test | small alloc       | medium alloc      | big alloc         |
|------------|-------------------|-------------------|-------------------|
| standard   | 4539779566, +0.0% | 5912927186, +0.0% | 5166935308, +0.0% |
|------------|-------------------|-------------------|-------------------|
| pluggable  | 4611074505, +1.5% | 5979185308, +1.1% | 5388776352, +4.2% |
|------------|-------------------|-------------------|-------------------|
| fileline   | 4588338192, +1.1% | 6004951910, +1.5% | 4942528135, -4.4% |
|------------|-------------------|-------------------|-------------------|

You can already see that this is no rocket science by the fact that the fileline allocator actually is faster than the standard non-pluggable allocator...

@pks-t
Member Author

pks-t commented Mar 13, 2018

@arthurschreiber: yeah, I've been very surprised by that. I guess when it comes to bigger allocations there's a lot of variation due to them being passed to the kernel.

@pks-t pks-t force-pushed the pks/memory-allocator branch from 483c2ad to d3453be Compare March 14, 2018 11:19
@pks-t
Member Author

pks-t commented Mar 14, 2018

I've completely rewritten this PR. This proposal now makes use of a global git_allocator struct, which bundles all allocator functions. Our two provided allocators provide a function git_(stdalloc|crtdbg)_init_allocator, which gets the struct and sets its function pointers accordingly. Furthermore, it extends all functions except git__free by a file and line parameter. This does now indeed make it possible to swap out the standard memory allocator for the crtdbg allocator at runtime.
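A minimal sketch of that design, assuming hypothetical `demo_` names and member names (the real `git_allocator` has more members than shown here):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative bundle of allocator callbacks; not libgit2's actual struct. */
typedef struct {
	void *(*gmalloc)(size_t len, const char *file, int line);
	void *(*grealloc)(void *ptr, size_t len, const char *file, int line);
	void (*gfree)(void *ptr); /* free takes no file/line, as described above */
} demo_allocator;

static void *demo_std_malloc(size_t len, const char *file, int line)
{
	(void)file; (void)line; /* the standard allocator ignores the call site */
	return malloc(len);
}

static void *demo_std_realloc(void *ptr, size_t len, const char *file, int line)
{
	(void)file; (void)line;
	return realloc(ptr, len);
}

static void demo_std_free(void *ptr) { free(ptr); }

/* Counterpart of the per-allocator init function: fill in the struct. */
static int demo_stdalloc_init_allocator(demo_allocator *alloc)
{
	assert(alloc != NULL);
	alloc->gmalloc = demo_std_malloc;
	alloc->grealloc = demo_std_realloc;
	alloc->gfree = demo_std_free;
	return 0;
}

/* The single global instance that every call is routed through. */
static demo_allocator demo_global_allocator;

#define demo_malloc(len) demo_global_allocator.gmalloc((len), __FILE__, __LINE__)
#define demo_realloc(ptr, len) demo_global_allocator.grealloc((ptr), (len), __FILE__, __LINE__)
#define demo_free(ptr) demo_global_allocator.gfree(ptr)
```

Swapping allocators at runtime then amounts to calling a different init function on the global struct, which is what would let a tracing allocator be plugged in without a separate build.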

Some minor footwork is still missing. The first item is exposing the crtdbg allocator function so that applications can plug it in, with a stub returning "-1" in case libgit2 was compiled without crtdbg. The second is to hook into our global options structure to set the desired allocator, which is also fairly trivial. The third is to let custom allocators implement only some of these functions and fall back on our own standard allocators for the rest.
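The fallback and the stub could be sketched roughly like this. Every name is hypothetical, and the merge strategy (start from the defaults, override non-NULL members) is only one possible reading of the plan above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
	void *(*gmalloc)(size_t len);
	void (*gfree)(void *ptr);
} demo_allocator;

static int demo_stdalloc_init_allocator(demo_allocator *alloc)
{
	assert(alloc != NULL);
	alloc->gmalloc = malloc;
	alloc->gfree = free;
	return 0;
}

static demo_allocator demo_global_allocator;

/* Start from the defaults, then override only what the caller supplied. */
static int demo_set_allocator(const demo_allocator *custom)
{
	demo_allocator merged;

	if (custom == NULL || demo_stdalloc_init_allocator(&merged) < 0)
		return -1;
	if (custom->gmalloc != NULL)
		merged.gmalloc = custom->gmalloc;
	if (custom->gfree != NULL)
		merged.gfree = custom->gfree;

	demo_global_allocator = merged;
	return 0;
}

/* Stub for a build without the debug allocator: always reports failure. */
static int demo_crtdbg_init_allocator(demo_allocator *alloc)
{
	(void)alloc;
	return -1;
}

/* Example partial implementation: only malloc is customized. */
static size_t demo_counted_allocs;
static void *demo_counting_malloc(size_t len)
{
	demo_counted_allocs++;
	return malloc(len);
}
```

A partial override like this is only safe when the custom members are compatible with the defaults; here the counting malloc returns memory that the standard free can release.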

@pks-t pks-t force-pushed the pks/memory-allocator branch 6 times, most recently from f703b72 to c250ff4 Compare March 20, 2018 14:26
@ethomson
Member

It looks like you've dropped the ability to toggle this at build time. This means that we're now pluggable all the time, with a 1.5% speed hit in the best case (which is also the typical case)?

@pks-t
Member Author

pks-t commented Mar 26, 2018

Well, I wouldn't say in the best case, but in the worst case, where our complete program consists of nothing but mallocs. I can restore the build toggle if you feel it is worth the additional code, but I feel like that's not really necessary.

@pks-t pks-t force-pushed the pks/memory-allocator branch from c250ff4 to 4868c36 Compare April 6, 2018 09:18
@ethomson
Member

@carlosmn, how do you feel about the perf impact here?

@pks-t Can you add a note to the CHANGELOG?

@pks-t
Member Author

pks-t commented May 18, 2018 via email

@carlosmn
Member

> how do you feel about the perf impact here?

I don't love it, but for the most part we're IO or CPU-bound, so malloc being slightly slower is probably worth the trade-off. If someone really needs to disable it, we can bring back a switch.

@ethomson
Member

> I don't love it, but for the most part we're IO or CPU-bound, so malloc being slightly slower is probably worth the trade-off. If someone really needs to disable it, we can bring back a switch.

OK. That's roughly how I feel. :shipit:

@pks-t
Member Author

pks-t commented May 30, 2018

So is this ready to be merged then?

pks-t added 7 commits June 7, 2018 12:57
The crtdbg allocators are currently being implemented as inline
functions as part of the "w32_crtdbg_stacktrace.h" header. As we are
moving towards pluggable allocators with the help of function pointers,
though, we cannot make use of inlining anymore. Instead, we can only
have a single implementation of these allocating functions.

Move all implementations of the crtdbg allocators into
"w32_crtdbg_stacktrace.c".

Currently, the `git__free` function is being defined in a single place,
only, disregarding whether we use our standard allocators or the crtdbg
allocators. This makes it a bit harder to convert our code base to use
pluggable allocators, and furthermore makes the border between our two
allocators a bit more blurry.

Implement a separate `git__crtdbg__free` function for the crtdbg
allocator in order to completely separate both allocator
implementations.

Right now, the standard allocator is being declared as part of the
"util.h" header as a set of inline functions. As with the crtdbg
allocator functions, these inline functions make it hard to convert to
function pointers for our allocators.

Create a new "stdalloc" module containing our standard allocations
functions to split these out. Convert the existing allocators to macros
which make use of the stdalloc functions.

Our "util.h" header is a grabbag of various different functions, where
many don't have a clear group they belong to. Our set of allocator
functions though can be clearly singled out as a single group of
functions that always belongs together. Furthermore, we will need to
implement additional functions relating to our allocators subsystem when
moving to pluggable allocators. Thus, we should just move these
functions into their own "alloc" module.

Our desired architecture would make allocators completely pluggable,
such that users of libgit2 can swap out memory allocators at runtime.
While making e.g. debugging easier by not having to do a separate build,
this feature can also help maintainers of bindings for libgit2 by tying
the memory allocations into the other language's memory system.

In order to do so, though, we first need to make our two different
pre-existing allocators "stdalloc" and "crtdbg" have the same function
signatures, as the "crtdbg" allocators all have an additional file and
line argument. This is required to build correct stack traces for
debugging memory allocations. As that feature may also be interesting to
authors of other applications for debugging libgit2, we now simply add
these arguments to our standard allocators.

Obviously, this may come with a performance penalty. During some simple
benchmarks no real impact could be measured though in contrast to a
simple pluggable allocator. The following table summarizes the
benchmarks. There were three different builds with our current standard
allocator ("standard"), with pluggable allocators accessed via
function pointers ("pluggable") and for pluggable allocators with
file and line being added ("fileline"). Furthermore, there were three
scenarios for 100.000.000 allocations of 100B ("small alloc"),
100.000.000 allocations of 100KB ("medium alloc"), and 1.000.000
allocations of 100MB. All results are best of 10 runs.

|------------|-------------------|-------------------|-------------------|
| build/test | small alloc       | medium alloc      | big alloc         |
|------------|-------------------|-------------------|-------------------|
| standard   | 4539779566, +0.0% | 5912927186, +0.0% | 5166935308, +0.0% |
|------------|-------------------|-------------------|-------------------|
| pluggable  | 4611074505, +1.5% | 5979185308, +1.1% | 5388776352, +4.2% |
|------------|-------------------|-------------------|-------------------|
| fileline   | 4588338192, +1.1% | 6004951910, +1.5% | 4942528135, -4.4% |
|------------|-------------------|-------------------|-------------------|

As can be seen, there is a performance overhead for pluggable
allocators. Furthermore, it can also be seen that there is some big
variance between runs, especially in the "big alloc" scenario. This is
probably being caused by nondeterministic behaviour in the kernel for
dynamic allocations. Still, it can be observed that there should be no
real difference between the "pluggable" and "fileline" allocators.

Currently, our memory allocators are being redirected to the correct
implementation at compile time by simply using macros. In order to make
them swappable at runtime, this commit reshuffles that by instead making
use of a global "git_allocator" structure, whose pointers are set up to
reference the allocator functions. Like this, it becomes easy to swap
out allocators by simply setting these function pointers.

In order to initialize a "git_allocator", our provided allocators
"stdalloc" and "crtdbg" both provide an init function. This is being
called to initialize a passed in allocator struct and set up its members
correctly.

No support is yet included to enable users of libgit2 to switch out the
memory allocator at a global level.

Tie in the newly created infrastructure for swapping out memory
allocators into our settings code. A user can now simply use the new
option "GIT_OPT_SET_ALLOCATOR" with `git_libgit2_opts`, passing in an
already initialized allocator structure as vararg.
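
How an options function might pull an allocator out of its varargs can be sketched as below. This is a self-contained stand-in, not libgit2 code: `demo_opts`, `DEMO_OPT_SET_ALLOCATOR` and `demo_allocator` are hypothetical counterparts of `git_libgit2_opts`, `GIT_OPT_SET_ALLOCATOR` and `git_allocator`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
	void *(*gmalloc)(size_t len);
	void (*gfree)(void *ptr);
} demo_allocator;

enum { DEMO_OPT_SET_ALLOCATOR = 1 };

/* Global allocator, defaulting to the C library functions. */
static demo_allocator demo_global_allocator = { malloc, free };

/* Options entry point: the key decides how the varargs are interpreted. */
static int demo_opts(int key, ...)
{
	va_list ap;
	int error = 0;

	va_start(ap, key);
	switch (key) {
	case DEMO_OPT_SET_ALLOCATOR: {
		demo_allocator *alloc = va_arg(ap, demo_allocator *);
		if (alloc == NULL || alloc->gmalloc == NULL || alloc->gfree == NULL)
			error = -1; /* reject an uninitialized allocator */
		else
			demo_global_allocator = *alloc; /* copy, caller keeps ownership */
		break;
	}
	default:
		error = -1; /* unknown option */
		break;
	}
	va_end(ap);
	return error;
}

/* Example allocator member used to observe that the swap took effect. */
static size_t demo_counted_allocs;
static void *demo_counting_malloc(size_t len)
{
	demo_counted_allocs++;
	return malloc(len);
}
```

Because varargs carry no type information, a NULL argument has to be passed with an explicit pointer cast, and the allocator should be set before any memory has been handed out by the previous one.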
@pks-t pks-t force-pushed the pks/memory-allocator branch from 6a4bd1b to 0f6348f Compare June 7, 2018 10:58
@pks-t pks-t changed the title [RFC] Custom memory allocators Custom memory allocators Jun 7, 2018
@pks-t
Member Author

pks-t commented Jun 7, 2018

So I'm merging this for now. As you said, in case somebody complains about a possible performance impact, then we can still create a build switch for this feature.
