
pool: Simplify implementation #3488


Merged
merged 7 commits into from
Oct 28, 2015

vmg (Member) commented Oct 27, 2015

So, here's the deal: the existing pool implementation is terribly slow. For some merge cases, you can see more than 70% of the runtime dominated by pool allocations!

Why is it so slow? Because it does a lot of very complex calculations & pointer wrangling just for the sake of keeping a "free list" and trying to squeeze some bytes out of each allocation. IMO this basically defeats the point of a memory pool/slab allocator.

I've re-implemented the pool to look more like a slab allocator. There's always a current slab; when it does not have enough space to fulfill the requested allocation, we allocate a new slab.
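A minimal sketch of the bump-pointer slab scheme described above, in C. The names (`slab_pool`, `slab_page`, `slab_pool_malloc`, `slab_pool_clear`) are hypothetical and this is not libgit2's actual `git_pool` code. It assumes pointer-size alignment is sufficient for every allocation, creates the first slab lazily, and gives oversized requests a slab of their own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch, not libgit2's git_pool implementation. */
typedef struct slab_page {
    struct slab_page *next;  /* older slabs, only ever freed in bulk */
    size_t size;             /* usable bytes in data[] */
    size_t avail;            /* bytes still free at the end of data[] */
    char data[];             /* allocations are carved from here */
} slab_page;

typedef struct {
    slab_page *open;         /* current slab we bump-allocate from */
    size_t page_size;        /* default usable size of a new slab */
} slab_pool;

static void slab_pool_init(slab_pool *pool, size_t page_size)
{
    assert(page_size > 0);
    pool->open = NULL;
    pool->page_size = page_size;
}

/* Round up to pointer size (assumed to be a power of two). */
static size_t align_up(size_t n)
{
    const size_t a = sizeof(void *);
    return (n + a - 1) & ~(a - 1);
}

static void *slab_pool_malloc(slab_pool *pool, size_t size)
{
    if (size > SIZE_MAX - sizeof(slab_page) - sizeof(void *))
        return NULL;  /* would overflow the slab header arithmetic */
    size = align_up(size);

    slab_page *page = pool->open;
    if (page == NULL || page->avail < size) {
        /* Not enough room: open a new slab. The old slab's unused
         * tail is simply abandoned; there is no free-list. */
        size_t new_size = size > pool->page_size ? size : pool->page_size;
        slab_page *fresh = malloc(sizeof(slab_page) + new_size);
        if (fresh == NULL)
            return NULL;
        fresh->size = new_size;
        fresh->avail = new_size;
        fresh->next = page;
        pool->open = page = fresh;
    }

    /* Bump allocation: hand out the next `size` bytes of the slab. */
    void *ptr = page->data + (page->size - page->avail);
    page->avail -= size;
    return ptr;
}

/* No per-pointer free: every slab is released at once. */
static void slab_pool_clear(slab_pool *pool)
{
    slab_page *page = pool->open;
    while (page != NULL) {
        slab_page *next = page->next;
        free(page);
        page = next;
    }
    pool->open = NULL;
}
```

The fast path is a comparison and a subtraction, which is the property the PR is after; the cost is that memory can only be returned by clearing the whole pool.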

This implementation is hundreds of times faster than the old pool for most cases. For instance: a particular merge of the cdnjs repository that I discussed with @ethomson used to take 17s on my machine. 10 of those seconds were spent in the pool (lol wtf?).

With the new implementation, the merge finishes in 5.2s.

Questions

  • How does the new aggressive slab algorithm impact memory usage? This is hard to guess. For the specific case I'm testing, it doesn't, because all the allocated strings are of similar size. For some pathological cases with very large strings, I could see this wasting a little bit of memory. Frankly, given the terrible performance of the old pool, I'm confident the tradeoff will always be positive.
  • What happens with the APIs that allowed freeing pointers from the pool? They are gone -- it's not possible to free specific pointers from a traditional slab allocator because there's no free-list. Fortunately, only one place in the existing code was calling the pool free API: the iterators code. Removing the call will increase memory usage during the intermediate steps of the iterator, until the pool is fully freed. I think this will be acceptable too.
  • What can possibly go wrong? I am not sure. It's a pretty large change in a critical code path of the library, but the resulting code is so significantly simpler that I'm pretty confident I haven't introduced any regressions. I'd appreciate @carlosmn and @ethomson's careful eyes on this.
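To get a feel for the memory trade-off raised in the first two questions, the bookkeeping of a bump/slab allocator can be simulated without touching real memory. This is a hypothetical helper (`simulate_slabs` and `slab_stats` are illustrative names), and it ignores alignment and per-slab headers for simplicity. Because nothing is freed individually, `reserved` only ever grows until the pool is cleared.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accounting model of a slab allocator (no real memory). */
typedef struct {
    size_t slabs;      /* number of slabs opened */
    size_t reserved;   /* total slab bytes obtained from the system */
    size_t requested;  /* total bytes handed out to callers */
} slab_stats;

static slab_stats simulate_slabs(const size_t *sizes, size_t n, size_t page_size)
{
    assert(page_size > 0);
    slab_stats st = {0, 0, 0};
    size_t avail = 0;  /* free bytes left in the current slab */

    for (size_t i = 0; i < n; i++) {
        size_t sz = sizes[i];
        if (st.slabs == 0 || avail < sz) {
            /* Request doesn't fit: the current slab's tail is wasted. */
            size_t slab = sz > page_size ? sz : page_size;
            st.slabs++;
            st.reserved += slab;
            avail = slab;
        }
        avail -= sz;
        st.requested += sz;
    }
    return st;
}
```

With 4096-byte slabs, many ~60-byte requests pack densely into one slab, while three 3000-byte strings open three slabs and leave 1096 bytes unused in each, roughly the "pathological case with very large strings" mentioned above.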

@@ -47,7 +47,7 @@ git_commit_list *git_commit_list_insert_by_date(git_commit_list_node *item, git_

 git_commit_list_node *git_commit_list_alloc_node(git_revwalk *walk)
 {
-	return (git_commit_list_node *)git_pool_malloc(&walk->commit_pool, COMMIT_ALLOC);
+	return (git_commit_list_node *)git_pool_mallocz(&walk->commit_pool, 1);
Member Author

Sooo... We were allocating 4KB for each commit in a walk instead of ~60 bytes. Huuuh... Yeah. I guess I've significantly reduced memory usage for walks.

Member Author

Correction: We were not. I introduced the over-allocation as part of the refactoring. 😓
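One plausible way an over-allocation like this creeps in during such a refactor is mixing up item counts and byte sizes. The `1` passed to `git_pool_mallocz` in the diff suggests the new call takes a number of items from a pool created with a per-item size; that reading is an assumption here, and `item_pool` and `item_pool_request_bytes` are hypothetical names. If the item size were about 64 bytes (also an assumption, close to the ~60 bytes mentioned) and the byte size were passed as the count, the request would become 64 × 64 = 4096 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a pool with a fixed item_size whose allocation
 * call takes a number of items, not a number of bytes. */
typedef struct {
    size_t item_size;
} item_pool;

/* Bytes reserved by allocating `count` items; 0 on overflow. */
static size_t item_pool_request_bytes(const item_pool *pool, size_t count)
{
    assert(pool->item_size > 0);
    if (count > SIZE_MAX / pool->item_size)
        return 0;  /* count * item_size would overflow */
    return count * pool->item_size;
}
```

Asking a 64-byte-item pool for 1 item yields 64 bytes, while asking it for 64 "items" yields 4096; a byte-granular pool (item size 1) asked for 64 yields 64.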

vmg (Member Author) commented Oct 28, 2015

I just ran the full test suite on this branch and on the old pool implementation. jemalloc reports 461984 total bytes allocated with the old implementation vs 472696 with the new one: a difference of about 10KB between the two branches. Considering that the test suite seems to have a lot of repetitive allocation patterns, I think these are pretty good results.

@@ -626,7 +626,7 @@ static int merge_conflict_resolve_one_renamed(
 	git_oid__cmp(&conflict->our_entry.id, &conflict->their_entry.id) != 0)
 	return 0;
 
-	if ((merged = git_pool_malloc(&diff_list->pool, sizeof(git_index_entry))) == NULL)
+	if ((merged = git_pool_mallocz(&diff_list->pool, sizeof(git_index_entry))) == NULL)
Member

There are a couple of places where we switch to mallocz instead of malloc but then immediately overwrite the entire allocated area, so the zeroing seems unnecessary.

Member Author

Can you help me track down all of them? I've probably been overzealous here.

Member Author

OK, I think I have all of them.
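The review point is that a zeroing allocator does an extra `memset` which is wasted when every field is written right afterwards. A minimal sketch of the distinction, using plain `malloc` as a stand-in so it runs on its own; `pool_malloc`, `pool_mallocz`, `sketch_entry` and `entry_copy` are hypothetical names, not libgit2's API, and `sketch_entry` is far smaller than `git_index_entry`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins: the plain variant returns uninitialized memory,
 * the "z" variant pays for an extra memset on top of it. */
static void *pool_malloc(size_t size) { return malloc(size); }

static void *pool_mallocz(size_t size)
{
    void *ptr = pool_malloc(size);
    if (ptr != NULL)
        memset(ptr, 0, size);
    return ptr;
}

typedef struct {
    unsigned int mode;
    size_t file_size;
    const char *path;
} sketch_entry;  /* hypothetical, much smaller than git_index_entry */

/* Every member is assigned immediately, so zeroing first would be
 * wasted work: the plain allocation is enough here. */
static sketch_entry *entry_copy(const sketch_entry *src)
{
    sketch_entry *dst = pool_malloc(sizeof(*dst));
    if (dst == NULL)
        return NULL;
    *dst = *src;  /* struct assignment writes every member */
    return dst;
}
```

The zeroing variant remains the right choice when only some fields are set and the rest are expected to start out as zero or `NULL`.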

vmg (Member Author) commented Oct 28, 2015

OK, I think my previous jemalloc metrics were bogus. I've manually added stat accounting, and we're allocating 88 MB of pool pages in the old code vs 94 MB in the new code.

This makes a lot more sense and feels good to me. 👍👍👍
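Put as a ratio, the 88 MB vs 94 MB figures mean the new pool reserved roughly 6.8% more page memory on that run: (94 - 88) / 88 ≈ 0.068. A trivial helper for this kind of comparison, in C (the function name is hypothetical):

```c
#include <assert.h>

/* Relative growth of new_value over old_value, in percent. */
static double percent_increase(double old_value, double new_value)
{
    assert(old_value != 0.0);
    return (new_value - old_value) / old_value * 100.0;
}
```

`percent_increase(88.0, 94.0)` evaluates to about 6.82.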

vmg pushed a commit that referenced this pull request on Oct 28, 2015: pool: Simplify implementation
vmg merged commit 232a7e3 into master on Oct 28, 2015
vmg mentioned this pull request on Oct 28, 2015
ethomson deleted the vmg/pool branch on January 9, 2019 10:20

3 participants