Improvements to tree parsing speed #3508
Unfortunately the larger benchmark, which also uses diff, actually slows down, and I can't explain why. That needs some investigation before this can be accepted.
struct git_tree_entry {
	uint16_t attr;
	git_oid oid;
	bool pooled;
This mis-aligns the struct packing, which is not neat for pool allocations. I think we could very easily change `filename_len` to `uint16_t` and get a couple extra bytes to play with.
It does indeed bring sadness to alignment, I forgot to double-check after testing it out. It's probably enough to have a `uint16_t` as length for anything we would actually care to support, tbh (and lol if you actually need the 64 bits for your path length).
I changed to 16-bit and avoided any padding (except for the final one due to the zero-length array). This brings the size down from 32 to 26 bytes. It doesn't seem to make a difference in speed one way or another.
We've already looked at the filename with `memchr()` and then used `strlen()` to allocate the entry. We already know how much we have to advance to get to the object id, so add the filename length instead of looking at each byte again.
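A minimal sketch of the idea in this commit, not the actual libgit2 parser: once `memchr()` has located the terminator, the filename length is known and the cursor can jump straight to the object id. The helper name `skip_filename` and the `OID_RAWSZ` constant are hypothetical stand-ins introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OID_RAWSZ 20 /* assumed raw SHA-1 size; stand-in for the library's constant */

/*
 * Hypothetical helper: `buf` points at an entry's filename and `end` is one
 * past the last byte of the tree buffer. Returns a pointer to the raw object
 * id that follows the filename, or NULL if the entry is truncated.
 */
static const char *skip_filename(const char *buf, const char *end, size_t *filename_len)
{
	const char *nul = memchr(buf, '\0', (size_t)(end - buf));

	if (nul == NULL)
		return NULL; /* no terminator: malformed entry */

	*filename_len = (size_t)(nul - buf);

	/* Reuse the length we just measured instead of scanning the name again. */
	buf += *filename_len + 1;

	if ((size_t)(end - buf) < OID_RAWSZ)
		return NULL; /* not enough bytes left for the object id */

	return buf;
}
```

The same measured length can then be handed to whatever allocates the entry, so no second pass over the name is needed.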
These are rather small allocations, so we end up spending a non-trivial amount of time asking the OS for memory. Since these entries are tied to the lifetime of their tree, we can give the tree a pool so we speed up the allocations.
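To illustrate why a pool helps here, the following is a rough sketch of a bump allocator whose lifetime matches its owner. It is an assumption-laden illustration, not libgit2's pool implementation: the `entry_pool` type and its functions are hypothetical, and a real pool would grow by chaining blocks rather than failing when full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-size bump allocator, for illustration only. */
struct entry_pool {
	char *block; /* one large allocation obtained up front */
	size_t used; /* bytes handed out so far */
	size_t size; /* capacity of block */
};

static int entry_pool_init(struct entry_pool *pool, size_t size)
{
	pool->block = malloc(size);
	pool->used = 0;
	pool->size = pool->block ? size : 0;
	return pool->block ? 0 : -1;
}

/* Carve `len` bytes out of the block; returns NULL when the block is full. */
static void *entry_pool_alloc(struct entry_pool *pool, size_t len)
{
	const size_t align = sizeof(void *);
	void *ptr;

	if (len > SIZE_MAX - (align - 1))
		return NULL; /* rounding up would overflow */

	/* Round up so every returned pointer stays pointer-aligned. */
	len = (len + align - 1) & ~(align - 1);

	if (len > pool->size - pool->used)
		return NULL;

	ptr = pool->block + pool->used;
	pool->used += len;
	return ptr;
}

/* One free() releases every entry at once, matching the owner's lifetime. */
static void entry_pool_free(struct entry_pool *pool)
{
	free(pool->block);
	pool->block = NULL;
	pool->used = pool->size = 0;
}
```

Each allocation is a pointer bump rather than a trip through the general-purpose allocator, which is where the savings for many small entries would come from.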
We already know the size due to the `memchr()` so use that information instead of calling `strlen()` on it.
This reduces the size of the struct from 32 to 26 bytes, and leaves a single padding byte at the end of the struct (which comes from the zero-length array).
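The size change can be reproduced with stand-in types. This sketch assumes the field order suggested by the diff context in this thread, a 20-byte object id with byte alignment, and a C99 flexible array member in place of the zero-length array; `sketch_oid`, `entry_before` and `entry_after` are hypothetical names, not the library's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the object id: 20 raw bytes, alignment 1. */
struct sketch_oid { unsigned char id[20]; };

/* Assumed earlier layout: the size_t member forces wider alignment. */
struct entry_before {
	uint16_t attr;
	struct sketch_oid oid;
	bool pooled;
	size_t filename_len; /* padding is inserted before this member */
	char filename[];
};

/* Assumed reordered layout: both 16-bit members first, no interior padding. */
struct entry_after {
	uint16_t attr;
	uint16_t filename_len;
	struct sketch_oid oid;
	bool pooled;
	char filename[]; /* one trailing pad byte keeps the struct 2-byte aligned */
};

/* The reordered struct should never be larger than the original one. */
static_assert(sizeof(struct entry_after) <= sizeof(struct entry_before),
	"reordering must not grow the entry");
```

On a typical LP64 target with a one-byte `bool`, `sizeof(struct entry_before)` is 32 and `sizeof(struct entry_after)` is 26, with `filename` starting at offset 25.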
7a1ec86 to ee42bb0
Both the tree-read microbench and the larger diff test are sped up, so I'm happy to merge this as-is.
Out of curiosity, why did the larger benchmark slow down in the intermediate changes?
I'm not sure, and I seem to have misplaced the specific benchmark which showed a slowdown. I was probably doing something dumb, as it was still just loading a commit's tree and diffing it with the first parent's tree, which I just saw speed up, as expected.
I like the struct re-ordering. 👍
{
	git_tree_entry *entry = NULL;
	size_t tree_len;
Can we add a check here to make sure that `filename_len` fits in a `uint16_t`? I don't see any obvious vulnerabilities if one were to somehow build a tree that had a longer filename in it, but I'm also not the most creative person in the world.
Absolutely, I've pushed up a commit which does this and de-duplicates the size and overflow checks.
Return an error in case the length is too big. Also take this opportunity to have a single allocating function for the size and overflow logic.
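As a sketch of what a single allocating function with the length check might look like (the struct and the name `sketch_entry_alloc` are hypothetical, and this uses plain `calloc()` rather than whatever allocator the real code uses):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry type with a 16-bit name length, for illustration only. */
struct sketch_entry {
	uint16_t attr;
	uint16_t filename_len;
	unsigned char oid[20];
	char filename[];
};

/*
 * Single allocation point: reject names that do not fit in 16 bits, then
 * size the allocation so the name and its terminator fit after the struct.
 * Returns NULL on an oversized name or on allocation failure.
 */
static struct sketch_entry *sketch_entry_alloc(const char *filename, size_t filename_len)
{
	struct sketch_entry *entry;

	if (filename_len > UINT16_MAX)
		return NULL; /* would be truncated when stored in filename_len */

	/* With the length capped at 65535, this sum cannot wrap wherever
	 * size_t is wider than 16 bits. */
	entry = calloc(1, sizeof(*entry) + filename_len + 1);
	if (entry == NULL)
		return NULL;

	memcpy(entry->filename, filename, filename_len);
	entry->filename[filename_len] = '\0';
	entry->filename_len = (uint16_t)filename_len;
	return entry;
}
```

Routing every caller through one function like this keeps the size and overflow checks in one place, so a later change to the length type only has to be handled once.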
I'd be interested as to how PR #3527 affects your benchmarks. These changes introduced quite a big memory leak and we did not call |
I'm not sure if we dup tree entries during a diff, but I'll definitely check it out when I'm back on my desktop. When I've fixed other leaks in this area the benchmarks did improve, though.
Without the leak, Debug goes down to 9.3s; in Release it doesn't seem to make a noticeable difference.
Here are a couple of simple changes to the way we parse trees which give us significant improvements.
The first one is just silly, really. We've already calculated how long the filename is, so we can just skip over it instead of looking for the terminator again. This gives us about half the gains.
Then, we can also very easily avoid constantly asking the system for memory by allocating the entries in a pool owned by the tree. The lifetime of the entries is tied to the tree already, so this is a great place to use a pool. Those entries for which we give ownership to the user don't need to change, as we already perform an extra allocation to give them their own lifetime.
I checked the speedup by parsing the top-level tree for `git.git` for ~41k commits. The timing is a bit rough, but the speedup ends up being a bit under 1/3, which is not bad, considering. The most expensive thing right now is parsing the filemode number; and using libc's (presumably) optimised one doesn't really help.

I've also tested by grabbing the parent's tree for each commit we walk and diffing it with a pathspec of "README" (I initially tried with a full diff but that takes over two minutes, which is also pretty bad but a different story). The speedup isn't quite as drastic since we're still doing the diff, which I haven't touched here, but still noticeable, especially in release mode.