pool: Simplify implementation #3488
@@ -47,7 +47,7 @@ git_commit_list *git_commit_list_insert_by_date(git_commit_list_node *item, git_

 git_commit_list_node *git_commit_list_alloc_node(git_revwalk *walk)
 {
-	return (git_commit_list_node *)git_pool_malloc(&walk->commit_pool, COMMIT_ALLOC);
+	return (git_commit_list_node *)git_pool_mallocz(&walk->commit_pool, 1);
Sooo... We were allocating 4KB for each commit in a walk instead of ~60 bytes. Huuuh... Yeah. I guess I've significantly reduced memory usage for walks.
Correction: We were not. I introduced the over-allocation as part of the refactoring. 😓
I just ran the full test suite with this branch vs the old pool implementation.
@@ -626,7 +626,7 @@ static int merge_conflict_resolve_one_renamed(
 		git_oid__cmp(&conflict->our_entry.id, &conflict->their_entry.id) != 0)
 		return 0;

-	if ((merged = git_pool_malloc(&diff_list->pool, sizeof(git_index_entry))) == NULL)
+	if ((merged = git_pool_mallocz(&diff_list->pool, sizeof(git_index_entry))) == NULL)
There are a couple of places where we switch to `mallocz` instead of `malloc` but promptly write the whole of the allocated area. Seems unnecessary.
Can you help me track down all of them? I've probably been overzealous here.
OK, I think I have all of them.
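To illustrate the redundancy raised in this thread, here is a minimal sketch using the standard `calloc`/`malloc` as stand-ins for `git_pool_mallocz`/`git_pool_malloc`. The `entry` struct and both function names are hypothetical and only exist for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type, standing in for something like an index entry. */
typedef struct {
	unsigned int mode;
	unsigned char id[20];
} entry;

/* Zero-filling allocation followed by a full overwrite: the zero fill
 * is wasted work, because memcpy() rewrites every byte of the block. */
entry *dup_entry_zeroed(const entry *src)
{
	entry *e = calloc(1, sizeof(*e));
	if (e == NULL)
		return NULL;
	memcpy(e, src, sizeof(*e));
	return e;
}

/* Same result with a plain allocation and no redundant zero fill. */
entry *dup_entry_plain(const entry *src)
{
	entry *e = malloc(sizeof(*e));
	if (e == NULL)
		return NULL;
	memcpy(e, src, sizeof(*e));
	return e;
}
```

The zeroing variant is still the right choice when only some fields are written afterwards and the rest are expected to read as zero.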
OK, I think my previous
This makes a lot more sense and feels good to me. 👍👍👍
So, here's the deal: the existing pool implementation is terribly slow. For some merge cases, you can see more than 70% of the runtime dominated by pool allocations!
Why is it so slow? Because it does a lot of very complex calculations & pointer wrangling just for the sake of keeping a "free list" and trying to squeeze some bytes out of each allocation. IMO this basically defeats the point of a memory pool/slab allocator.
I've re-implemented the pool to look more like a slab allocator. There's always an existing slab; when the existing slab does not have enough space to fulfill the requested allocation, we allocate a new slab.
This implementation is hundreds of times faster than the old pool for most cases. For instance: a particular merge of the `cdnjs` repository that I discussed with @ethomson used to take 17s on my machine. 10 of those seconds were spent in the pool (lol wtf?). With the new implementation, the merge finishes in 5.2s.
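For readers who have not looked at the diff, the slab approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the names `slab`, `slab_pool`, `slab_pool_init`, `slab_pool_alloc` and `slab_pool_clear` are hypothetical, and the pointer-size alignment is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types for illustration; not libgit2's actual structures. */
typedef struct slab {
	struct slab *prev;      /* previously filled slab, or NULL */
	size_t size;            /* usable bytes in data[] */
	size_t used;            /* bytes handed out so far */
	unsigned char data[];   /* allocation area */
} slab;

typedef struct {
	slab *current;          /* slab that serves new allocations */
	size_t slab_size;       /* default size for new slabs */
} slab_pool;

static slab *slab_new(slab *prev, size_t size)
{
	slab *s = malloc(sizeof(slab) + size);
	if (s == NULL)
		return NULL;
	s->prev = prev;
	s->size = size;
	s->used = 0;
	return s;
}

/* There is always an existing slab once init succeeds. */
int slab_pool_init(slab_pool *pool, size_t slab_size)
{
	pool->slab_size = slab_size;
	pool->current = slab_new(NULL, slab_size);
	return pool->current ? 0 : -1;
}

void *slab_pool_alloc(slab_pool *pool, size_t len)
{
	slab *s = pool->current;
	void *ptr;

	/* Round up to pointer alignment (an assumption of this sketch). */
	len = (len + sizeof(void *) - 1) & ~(sizeof(void *) - 1);

	/* Not enough room left: chain a new slab, sized to fit oversized requests. */
	if (s->size - s->used < len) {
		size_t size = len > pool->slab_size ? len : pool->slab_size;
		s = slab_new(pool->current, size);
		if (s == NULL)
			return NULL;
		pool->current = s;
	}

	/* Fast path: bump the offset, no free list to maintain. */
	ptr = s->data + s->used;
	s->used += len;
	return ptr;
}

/* Individual allocations are never freed; the whole pool goes at once. */
void slab_pool_clear(slab_pool *pool)
{
	slab *s = pool->current;
	while (s != NULL) {
		slab *prev = s->prev;
		free(s);
		s = prev;
	}
	pool->current = NULL;
}
```

The trade-off in a design like this is that leftover space at the end of a slab is simply abandoned when a new slab is chained, in exchange for an allocation path that is a bounds check and a pointer bump.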
Questions
`free` API: the iterators code. Removing the call will increase memory usage on the intermediate steps of the iterator, until the pool is fully freed. I think this will be acceptable too.