Parallelize checkout_create_the_new for perf #4205
base: main
Adds libgit2/libgit2#4205 directly to our current branch of libgit.
Just wanted to come back and add some more data. I did some perf testing on an 8-core machine with the same checkouts and saw a 60-70% perf improvement there. :D
So we were doing some testing here and finally discovered the cause of the slowdown on Windows machines: Windows Defender is the culprit. When we disable Windows Defender, checkouts on Windows run at nearly the same speed as on a Linux or OS X box. That said, Defender or an equivalent is often enabled in corporate environments, so this optimization is still a good thing. We still see large perf gains for those with active virus protection.
Since we've found the real culprit and it's not actually NTFS, we're no longer optimizing just the Windows pathway; we're really optimizing checkouts in general. So I'm removing the flag that restricts this behavior to Windows, because we could still see perf increases in other environments.
I'm curious to know what kind of behavior changes you see in non-Windows environments?
I just found I needed to share the index mutex over the |
For Linux we see about a 10-20% increase in speed over previous versions of libgit2 using the same test branches. Libgit2 is still slower than the CLI on Linux; CLI checkout performance there is outstanding. After threading we are still 100-200ms (about 30-60%) slower than the CLI, but every large checkout operation in our post-threading tests finishes in under 1 second. So there is more room for us to optimize the checkout code, but it would probably be chasing minor perf gains in the grand scheme of things. Most of those performance tweaks would be more impactful on Windows machines than on Linux.
I wonder if instead of locking the index on the whole filter list if we could utilize the |
Also stack along performance PRs libgit2/libgit2#4311 and libgit2/libgit2#4205
c2ea5f9 to 1f0de11
This seems to be failing because of the Bitbucket issue in PR #4584.
We've been running this in GK through nodegit for almost 9 months now without seeing any craziness.
b48e274 to fe8cdb8
Force pushed to fix some conflicts from the git_buf_free -> git_buf_dispose change. Also, we'll need to continue cherry-picking this in NodeGit for the time being. Because this PR has been open for so long, the history in the branch has been squashed into a single commit, which makes it much easier to rebase as libgit2 changes. @ethomson @tiennou @pks-t @carlosmn I know I've been bugging you a lot this week, but might I bug you once more and ask if anyone has time for this change 😄
2fcfc17 to 779c37b
Here's a preliminary review. Note that in the process of doing it, I had to recreate a patchset of my own for clarification purposes (because I was not sure why some changes were happening in the first place), and this uncovered some oddities (the temporary filter buffer thing). Hopefully I've left comments where appropriate that should help clarify those.
- I'd really like to see some benchmarking numbers for each platform.
- Having an int parallelism added to git_checkout_options would also be nice. Right now the fact that it's not user-controllable and is enabled by default scares me, especially since there are many places where we might be stepping all over our own threading expectations (i.e. you should never share git_repository instances between threads, so I shudder each time I see that checkout_data parameter), and auditing for that is… complicated (as I have no experience with our checkout implementation).
In all, I'm not against the performance benefits, but I don't think it's quite there yet from an implementation standpoint. If there's interest in my "cleaned up" patchset, I can also put it out, and maybe we could switch to it (I've just missed the symlink change, but I should be able to cherry-pick)?
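To make the request for a user-controllable knob concrete, here is a minimal sketch of how an int parallelism field might be interpreted. Everything in it is an assumption for illustration: sketch_checkout_options is a stand-in and not the real git_checkout_options, and the semantics (0 or 1 means single-threaded, larger values are capped) are not something the review specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound on worker threads; not a libgit2 constant. */
#define SKETCH_MAX_CHECKOUT_THREADS 64
#define SKETCH_CHECKOUT_OPTIONS_VERSION 1

/* Stand-in for an options struct carrying the proposed field. */
typedef struct {
    unsigned int version;
    int parallelism; /* assumed: <= 1 is single-threaded, N > 1 allows up to N workers */
} sketch_checkout_options;

/* Default initializer: threading is off unless the caller opts in. */
#define SKETCH_CHECKOUT_OPTIONS_INIT { SKETCH_CHECKOUT_OPTIONS_VERSION, 0 }

/* Resolves the option to the number of workers a checkout would use. */
int sketch_checkout_worker_count(const sketch_checkout_options *opts)
{
    assert(opts == NULL || opts->version == SKETCH_CHECKOUT_OPTIONS_VERSION);

    if (opts == NULL || opts->parallelism <= 1)
        return 1;
    if (opts->parallelism > SKETCH_MAX_CHECKOUT_THREADS)
        return SKETCH_MAX_CHECKOUT_THREADS;
    return opts->parallelism;
}
```

Defaulting to a single worker keeps the existing behavior for callers that never touch the field, which is the concern raised about threading being enabled by default.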
(accesses to |
Thank you for the review @tiennou! I apologize for the lack of commits as a storytelling mechanism. Now that this is receiving attention again, I will try to rebuild the patches while reconciling your review, to help make this review more digestible.
Another option is to take bits of this PR and merge them opportunistically, as it makes sense. I could imagine bringing in some refactoring, etc. This is sort of what we did for the original merge PR... I think that would be pretty straightforward if that was a direction that you wanted to go instead. Just throwing that out there; I'm happy either way.
So while these numbers are not checkout in isolation, these are some numbers that the GitKraken team took after multithreading checkout in GitKraken. For reference, see these documents: "Performance improvements seen on Windows when multithreading checkout" and "Performance improvements seen on Linux when multithreading checkout". Unfortunately, the repository we used is an old, large, private repo for the Axosoft product line, so I cannot share the repo that helped to create these numbers. You'll have to trust that this particular checkout operation is very large 😄. I do not have Mac OS numbers at this moment, so it would probably be good to redo these numbers in a more isolated form later anyway.
I've put out my patchset here: master...tiennou:feature/threaded-checkout-clean. It's not quite pretty enough, but I think it makes what's going on more isolated (at least to me 😉).
@tiennou I think we need to be careful in |
779c37b to ca7241a
6b1e831 to 9878350
Includes unmerged PRs: - libgit2/libgit2#5384 - libgit2/libgit2#5347 - libgit2/libgit2#4205
9878350 to 4e58546
@ethomson @tiennou Ok so I ran into some annoying conflicts early last week with this branch. I decided I would take a stab at cleaning up some of the items that needed work for this PR. The major change is that we're based on #5602, because it allows us to easily address some of @ethomson's concerns he had expressed to me over performance regressions in the single threaded case due to over allocations of the tmp / target_path buffers. The other changes are:
I've also tried hard to keep each commit easily digestible. Swapping the temporary buffers to use TLS has immensely reduced the footprint of what needs to be reviewed for this PR.
Now on to what I have not implemented yet. I have not gone through and made this configurable through an option (whether on checkout_options or in libgit2_opts); my holdup here is testing. Checkout is such an integral part of many other things you can do in libgit2 that I want anything that could potentially run checkout to run in threaded mode during testing of the library. I was also thinking that whether you thread checkout or not seems like a library option rather than an individual checkout option, because from the standpoint of a user I want to roll out the performance benefits to every checkout that the library does without having to touch every possible endpoint of checkout.
So what I would like to do is duplicate the offline checkout tests and at least one of the online tests that involves cloning, and turn on checkout threading for those runs. I also think, because of the nature of threading, we should require the offline checkout tests to run 4 times per test cycle to hopefully catch timing errors that could crop up someday.
I would really appreciate it if you guys could get back to me soon on this one. I would like to add the option and get this merged in asap; I would prefer not to deal with another merge conflict from this area. I don't mean to harp, but we've been running this code for years in GitKraken and for users of NodeGit and haven't seen anything concerning from a stability or correct-output standpoint. I firmly believe that had there been any of those issues, we would have seen a report of it by this point, given how large a rollout we've done and how long it's been running in production. To set aside my hubris there, I suppose it's possible that the platforms we target have more favorable conditions for any threading issues, though.
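As an illustration of the thread-local temporary buffers mentioned above, here is a minimal sketch using C11 _Thread_local. It is not the code from this PR: the scratch_* names are hypothetical, and libgit2 has its own TLS and buffer abstractions that this does not attempt to reproduce. The point is only that each worker reuses its own scratch space without locking or per-file allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One growable scratch buffer per thread (hypothetical type). */
typedef struct {
    char *ptr;
    size_t cap;
} scratch_buf;

/* Each thread sees its own zero-initialized instance. */
static _Thread_local scratch_buf tls_scratch;

/* Returns this thread's buffer with at least `needed` bytes, or NULL if allocation fails. */
char *scratch_acquire(size_t needed)
{
    scratch_buf *b = &tls_scratch;

    if (needed == 0)
        needed = 1;
    if (b->cap < needed) {
        char *grown = realloc(b->ptr, needed);
        if (grown == NULL)
            return NULL;
        b->ptr = grown;
        b->cap = needed;
    }
    return b->ptr;
}

/* Frees this thread's buffer; a worker would call this before it exits. */
void scratch_release(void)
{
    free(tls_scratch.ptr);
    tls_scratch.ptr = NULL;
    tls_scratch.cap = 0;
}

/* Builds "dir/name" in the calling thread's scratch buffer. */
const char *scratch_join_path(const char *dir, const char *name)
{
    size_t dlen, nlen;
    char *out;

    assert(dir != NULL && name != NULL);
    dlen = strlen(dir);
    nlen = strlen(name);

    out = scratch_acquire(dlen + nlen + 2); /* '/' plus terminating NUL */
    if (out == NULL)
        return NULL;
    memcpy(out, dir, dlen);
    out[dlen] = '/';
    memcpy(out + dlen + 1, name, nlen + 1); /* copies the NUL too */
    return out;
}
```

Because the buffer only grows, a single-threaded caller pays for at most a handful of allocations over a whole checkout, which is the over-allocation concern this kind of change is meant to address.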
@tiennou I think in general, the reason the pool is working correctly is that we are locking around its use in The only other place that I see that is potentially dangerous in an audit (while I'm fresh on this again) is access to the ODB in |
6776fc6 to 31f30a4
I know I've still got ergonomics to do on this, but seeing the fuzzer agree with this branch makes me feel better about it.
4485279 to 0d97b4c
0d97b4c to 56b30cd
So we noticed that checkout performance on Windows boxes is particularly slow in comparison to OSX / Linux boxes. Windows checkout operations can run 30x slower than an equivalent checkout operation on an OSX / Linux box. After running some profiling with Visual Studio, we tracked the cause of the slowdown to the function checkout_create_the_new, which internally opens files for writing. In fact, the bottleneck ended up being the time Windows takes to talk to the file system. This seems crazy!
So, following the suggestions of @ethomson, we tried performing some checkouts using an EXT4 partition via ext2fsd on Windows, and we actually dropped from 30-second checkouts to 10-second checkouts! This made it very clear that NTFS is getting in the way of fast checkouts on Windows.
So, on another suggestion of @ethomson, we dug in and made checkout_create_the_new run on a thread pool. We found that the number of threads should be roughly the same as the number of cores: it appears that the slowdown of NTFS is primarily the CPU overhead of compression / security checks and anything else NTFS does for you on every open/write/read call, so we didn't gain anything statistically significant by increasing the number of threads per core.
That said, our metrics show that with this added into libgit2, Windows checkouts perform 30-40% faster than the CLI on a 4-core machine. Very significant improvements!
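For readers who want a picture of the thread-pool idea described above, here is a rough sketch, not the code in this PR. It assumes POSIX threads (the PR targets Windows, where libgit2 uses its own thread wrappers), and work_item, run_parallel, mark_done and default_thread_count are hypothetical names. Workers pull the next unclaimed item from a shared queue under a mutex, and the worker count defaults to roughly one per core.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical work item: stands in for one file that checkout must create. */
typedef struct {
    int id;
    int done;
} work_item;

/* Shared queue: workers claim the next unclaimed index under the lock. */
typedef struct {
    work_item *items;
    size_t count;
    size_t next;
    pthread_mutex_t lock;
    void (*process)(work_item *);
} work_queue;

/* Placeholder for the real per-file work (open, write, close). */
void mark_done(work_item *item)
{
    item->done = 1;
}

/* Roughly one thread per online core, falling back to 1. */
size_t default_thread_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (size_t)n : 1;
}

static void *worker_main(void *arg)
{
    work_queue *q = arg;

    for (;;) {
        work_item *item = NULL;

        pthread_mutex_lock(&q->lock);
        if (q->next < q->count)
            item = &q->items[q->next++];
        pthread_mutex_unlock(&q->lock);

        if (item == NULL)
            break; /* queue drained */
        q->process(item); /* the slow part runs outside the lock */
    }
    return NULL;
}

/* Runs process() once per item on up to nthreads workers; returns 0 on success. */
int run_parallel(work_item *items, size_t count, size_t nthreads,
                 void (*process)(work_item *))
{
    enum { MAX_THREADS = 64 };
    pthread_t threads[MAX_THREADS];
    work_queue q;
    size_t started = 0, i;

    assert(process != NULL);
    if (nthreads == 0)
        nthreads = 1;
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    q.items = items;
    q.count = count;
    q.next = 0;
    q.process = process;
    if (pthread_mutex_init(&q.lock, NULL) != 0)
        return -1;

    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&threads[started], NULL, worker_main, &q) != 0)
            break; /* keep going with the workers we already have */
        started++;
    }
    if (started == 0)
        worker_main(&q); /* no worker started: do the work on this thread */

    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    pthread_mutex_destroy(&q.lock);
    return 0;
}
```

A caller would pass default_thread_count() as nthreads to get the roughly one-thread-per-core behavior; anything the workers share beyond the queue (the index, the ODB, filter state) would need its own synchronization, which this sketch does not cover.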