
implausible
Contributor

@implausible implausible commented Apr 14, 2017

So we noticed that checkout performance on Windows boxes is particularly slow in comparison to OSX / Linux boxes. Windows checkout operations can run 30x slower than an equivalent checkout operation on an OSX or Linux box. After some profiling with Visual Studio, we tracked the slowdown to the method checkout_create_the_new, which opens files for writing. In fact, the bottleneck turned out to be the time Windows takes to talk to the file system. This seems crazy!

So, following the suggestions of @ethomson, we tried performing some checkouts on an EXT4 partition via ext2fsd on Windows, and checkouts actually dropped from 30 seconds to 10 seconds! This made it very clear that NTFS is getting in the way of fast checkouts on Windows.

So, on another suggestion of @ethomson, we dug in and parallelized checkout_create_the_new with a thread pool. We found that the number of threads should roughly match the number of cores: the NTFS slowdown appears to be primarily CPU overhead from compression, security checks, and anything else NTFS does for you on every open/write/read call, so increasing the number of threads per core gave us no statistically significant gain.
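To make the thread-pool idea concrete, here is a minimal sketch assuming POSIX threads and sysconf(_SC_NPROCESSORS_ONLN) for the core count (libgit2 wraps threading differently on Windows). work_item, write_item, worker_main and run_pool are hypothetical names, not libgit2 code; the sketch only shows workers pulling files from a shared queue, with the pool defaulting to one worker per core.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_WORKERS 64

/* Hypothetical unit of work: one file that checkout has to create. */
typedef struct {
    int id;
    int written; /* set to 1 once the file has been "written" */
} work_item;

typedef struct {
    work_item *items;
    size_t count;
    size_t next;          /* index of the next unclaimed item */
    pthread_mutex_t lock; /* guards next */
} work_queue;

/* Stand-in for the per-file open/write/close sequence. */
static void write_item(work_item *item)
{
    item->written = 1;
}

/* Each worker claims one item at a time until the queue is drained. */
static void *worker_main(void *arg)
{
    work_queue *q = arg;

    for (;;) {
        size_t i;

        pthread_mutex_lock(&q->lock);
        i = q->next;
        if (q->next < q->count)
            q->next++;
        pthread_mutex_unlock(&q->lock);

        if (i >= q->count)
            break;
        write_item(&q->items[i]);
    }
    return NULL;
}

/* Roughly one worker per online core, per the measurements above. */
static size_t default_worker_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (size_t)n : 1;
}

static void run_pool(work_item *items, size_t count, size_t nworkers)
{
    work_queue q;
    pthread_t threads[MAX_WORKERS];
    size_t started = 0, t;

    q.items = items;
    q.count = count;
    q.next = 0;
    pthread_mutex_init(&q.lock, NULL);

    if (nworkers > MAX_WORKERS)
        nworkers = MAX_WORKERS;
    while (started < nworkers &&
           pthread_create(&threads[started], NULL, worker_main, &q) == 0)
        started++;

    /* If no thread could be started, do all the work on the caller. */
    if (started == 0)
        worker_main(&q);

    for (t = 0; t < started; t++)
        pthread_join(threads[t], NULL);
    pthread_mutex_destroy(&q.lock);
}
```

Claiming one item at a time under a mutex keeps the sketch simple; a real implementation might hand out batches to reduce lock traffic.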

That said, our metrics show that with this change in libgit2, Windows checkouts are 30-40% faster than the CLI on a 4-core machine. Very significant improvements!

implausible added a commit to nodegit/nodegit that referenced this pull request Apr 20, 2017
Adds libgit2/libgit2#4205 directly to our current branch of libgit.
@implausible
Contributor Author

Just wanted to come back and add some more data. I did some perf testing on an 8-core machine with the same checkouts: a 60-70% perf improvement there. :D

@implausible implausible changed the title Parallelize checkout_create_the_new for ntfs perf gains Parallelize checkout_create_the_new for perf Apr 24, 2017
@implausible
Contributor Author

So we were doing some testing here and finally discovered the real cause of the slowdown on Windows machines: Windows Defender. With Windows Defender disabled, checkouts on Windows machines run at nearly the same speed as on a Linux or OS X box. That said, Defender or an equivalent is often enabled in corporate environments, so this optimization is still a good thing. We still see crazy perf gains for those with active virus protection...

Since we've found the real culprit and it's not actually NTFS, we're not just optimizing the Windows pathway anymore; we're optimizing checkouts in general. So I'm removing the flag that restricts this behavior to Windows, because we could still see perf increases in other environments.

@ethomson
Member

I'm curious to know what kind of behavior changes you see in non-Windows environments?

@implausible
Contributor Author

I just found that I needed to hold the index mutex around git_filter_list__load_ext and ditch the filter_mutex. It seems that somewhere deep down in git_filter_list__load_ext we end up touching the index, and I managed to get a crash there.
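A minimal sketch of this locking change, assuming POSIX threads. shared_index, load_filters_unlocked, load_filters and filter_worker are hypothetical stand-ins for the repository index and git_filter_list__load_ext, not libgit2 code; they only illustrate why a call that secretly reads the index has to take the index mutex.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the repository index shared by all workers. */
typedef struct {
    pthread_mutex_t lock; /* the single "index mutex" */
    int entry_reads;      /* mutated by any code path that touches the index */
} shared_index;

/* Stand-in for git_filter_list__load_ext: it looks like filter setup,
 * but somewhere underneath it reads the index, so it is not thread-safe. */
static int load_filters_unlocked(shared_index *idx)
{
    idx->entry_reads++;
    return 0;
}

/* Taking the mutex that guards other index access serializes this call
 * against those users too; a lock private to the filter code would only
 * serialize filter loads with each other. */
static int load_filters(shared_index *idx)
{
    int error;

    pthread_mutex_lock(&idx->lock);
    error = load_filters_unlocked(idx);
    pthread_mutex_unlock(&idx->lock);
    return error;
}

/* Usage: several checkout workers loading filters concurrently. */
static void *filter_worker(void *arg)
{
    int i;

    for (i = 0; i < 10000; i++)
        load_filters(arg);
    return NULL;
}
```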

@implausible
Copy link
Contributor Author

For Linux we see about a 10-20% increase in speed over previous versions of libgit2 using the same test branches.

Libgit2 is still slower than the CLI on Linux; even after threading, CLI checkout performance on Linux is outstanding. After threading we are 100-200ms (about 30-60%) slower than the CLI, but every post-threading result in our tests with a large checkout operation is under 1 second. So there is more room for us to optimize the checkout code, but in the grand scheme of things it would probably mean chasing minor perf gains. Most of those performance tweaks would have more impact on Windows machines than on Linux.

@implausible
Contributor Author

I wonder if, instead of locking the index around the whole filter list load, we could use the git_attr_session to pass the index mutex down to where the index is actually used. @ethomson thoughts?

implausible added a commit to implausible/nodegit that referenced this pull request Jul 19, 2017
@implausible implausible force-pushed the multithread/checkout_create_the_new branch 2 times, most recently from c2ea5f9 to 1f0de11 Compare March 19, 2018 17:59
@implausible
Contributor Author

This seems to be failing because of the Bitbucket issue in PR #4584.

@implausible
Contributor Author

We've been running this in GK (GitKraken) through NodeGit for almost 9 months now without seeing any craziness.

@implausible implausible force-pushed the multithread/checkout_create_the_new branch 2 times, most recently from b48e274 to fe8cdb8 Compare January 15, 2019 19:28
@implausible
Contributor Author

implausible commented Jan 15, 2019

Force pushed to fix some conflicts from the git_buf_free -> git_buf_dispose change. Also, we'll need to continue cherry-picking this into NodeGit for the time being.

Because this PR has been open for so long, the branch history has been squashed into a single commit, which makes it much easier to rebase this branch as libgit2 changes.

@ethomson @tiennou @pks-t @carlosmn I know I've been bugging you a lot this week, but might I bug you once more and ask if anyone has time for this change 😄

@implausible implausible force-pushed the multithread/checkout_create_the_new branch 3 times, most recently from 2fcfc17 to 779c37b Compare January 16, 2019 16:38
Contributor

@tiennou tiennou left a comment


Here's a preliminary review. Note that while doing it, I had to recreate a patchset of sorts for clarity (because I was not sure why some changes were happening in the first place), and this uncovered some oddities (the temporary filter buffer thing). Hopefully I've left comments where appropriate that should help clarify those.

  • I'd really like to see some benchmarking numbers for each platform.
  • Having an int parallelism field added to git_checkout_options would also be nice. Right now, the fact that it's not user-controllable and is enabled by default scares me…
  • especially since there are many places where we might be stepping all over our own threading expectations (i.e. you should never share git_repository instances between threads, so I shudder each time I see that checkout_data parameter), and auditing for that is… complicated (as I have no experience with our checkout implementation).
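One way the requested knob could look, sketched under assumptions: checkout_options_sketch and resolve_parallelism are hypothetical and not part of libgit2's git_checkout_options. Treating 0, the value an unset field has under zero-initialization, as single-threaded would keep threading opt-in, which speaks to the "enabled by default" worry above.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical options struct; only the idea of an int parallelism field
 * comes from the review. */
typedef struct {
    unsigned int version;
    int parallelism; /* 0 or 1: single-threaded (the zero-init default)
                        N > 1: N workers
                        negative: one worker per online core */
} checkout_options_sketch;

static int resolve_parallelism(const checkout_options_sketch *opts)
{
    long cores;

    if (opts->parallelism >= 1)
        return opts->parallelism;
    if (opts->parallelism == 0)
        return 1; /* opt-in: unchanged behavior unless the caller asks */

    cores = sysconf(_SC_NPROCESSORS_ONLN);
    return cores > 0 ? (int)cores : 1;
}
```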

In all, I'm not against the performance benefits, but I don't think it's quite there yet from an implementation standpoint. If there's interest in my "cleaned up" patchset, I can also put it out, and maybe we could switch to it (I've just missed the symlink change, but I should be able to cherry-pick it)?

@tiennou
Contributor

tiennou commented Jan 17, 2019

(accesses to git_pool seem to also be a concern, as I don't think those are threadsafe)

@implausible
Contributor Author

implausible commented Jan 17, 2019

Thank you for the review @tiennou! I apologize for not using the commits as a storytelling mechanism. Now that this is receiving attention again, I will try to rebuild the patches while reconciling your review, to make this review more digestible.

@ethomson
Member


Another option is to take bits of this PR and merge them opportunistically, as it makes sense. I could imagine bringing in some refactoring, etc. This is sort of what we did for the original merge PR... I think that would be pretty straightforward if that was a direction that you wanted to go instead.

Just throwing that out there; I'm happy either way.

@implausible
Contributor Author

implausible commented Jan 17, 2019

While these numbers are not for checkout in isolation, here are some numbers the GitKraken team took after multithreading checkout in GitKraken.

In these documents, for reference, 2.4 GK refers to checkout without threading and 2.5 GK RC3 refers to checkout operations with threading.

Performance improvements seen on Windows when multithreading checkout

Performance improvements seen on Linux when multithreading checkout

Unfortunately, the repository we used is an old, large, private repo for the Axosoft product line, so I cannot share the repo that helped to create these numbers. You'll have to trust that this particular checkout operation is very large 😄.

I do not have macOS numbers at the moment, so it would probably be good to redo these numbers in a more isolated form later anyway.

@tiennou
Contributor

tiennou commented Jan 17, 2019

I've put out my patchset here: master...tiennou:feature/threaded-checkout-clean.

It's not quite pretty enough, but I think it makes what's going on more isolated (at least to me 😉).

@implausible
Contributor Author

@tiennou I think we need to be careful in wd_item_is_removable to dispose of the fullPath buffer regardless of whether build_target_fullpath succeeds or fails in your patch set. Perhaps we should be failing that method if git_buf_set fails as well?
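The suggested cleanup pattern, sketched with a minimal stand-in for git_buf so it compiles on its own. sketch_buf, buf_append, buf_set, buf_dispose, build_target_fullpath and item_is_removable_sketch are hypothetical and only mirror the shape of the code under discussion; the live_buffers counter exists purely to show that the buffer is released on both the success and the failure path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for git_buf. */
typedef struct { char *ptr; size_t size; } sketch_buf;
static int live_buffers = 0; /* heap buffers not yet disposed */

static int buf_append(sketch_buf *b, const char *s)
{
    size_t len = strlen(s);
    char *p = realloc(b->ptr, b->size + len + 1);

    if (p == NULL)
        return -1;
    if (b->ptr == NULL)
        live_buffers++;
    memcpy(p + b->size, s, len + 1);
    b->ptr = p;
    b->size += len;
    return 0;
}

static int buf_set(sketch_buf *b, const char *s)
{
    b->size = 0;
    return buf_append(b, s);
}

static void buf_dispose(sketch_buf *b)
{
    if (b->ptr != NULL)
        live_buffers--;
    free(b->ptr);
    b->ptr = NULL;
    b->size = 0;
}

/* Hypothetical helper: fails if buf_set fails, and can also fail after
 * the buffer already owns memory. */
static int build_target_fullpath(sketch_buf *out, const char *root, const char *path)
{
    if (buf_set(out, root) < 0)
        return -1;
    if (path[0] == '\0')
        return -1; /* simulated failure with out already allocated */
    return buf_append(out, path);
}

static int item_is_removable_sketch(const char *root, const char *path, int *removable)
{
    sketch_buf full = { NULL, 0 };
    int error;

    *removable = 0;
    if ((error = build_target_fullpath(&full, root, path)) < 0)
        goto done;

    *removable = full.size > 0; /* placeholder for the real checks */

done:
    buf_dispose(&full); /* runs on success and failure alike */
    return error;
}
```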

@implausible implausible force-pushed the multithread/checkout_create_the_new branch from 779c37b to ca7241a Compare January 17, 2019 18:05
@implausible
Contributor Author

implausible commented Aug 12, 2020

@ethomson @tiennou OK, so I ran into some annoying conflicts with this branch early last week, and I decided to take a stab at cleaning up some of the items that needed work for this PR.

The major change is that we're now based on #5602, because it lets us easily address concerns @ethomson had expressed to me about performance regressions in the single-threaded case caused by over-allocation of the tmp / target_path buffers.

The other changes are:

  • perfdata now just uses a simple lock before writes
  • completed_steps needed a lock to make it thread-safe
  • I've added a separate mutex for git_filter_list__load_ext, as I am reasonably certain that it is the attribute cache on the repository that makes that method not thread-safe.
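A sketch of the progress bookkeeping under one lock, assuming POSIX threads. progress_state, report_file_done and progress_worker are hypothetical; only the callback's parameter list mirrors libgit2's git_checkout_progress_cb. The counters are updated and snapshotted under the lock, and the callback runs after unlocking so user code never executes while the lock is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared counters, loosely named after perfdata and
 * completed_steps; not libgit2's actual structs. */
typedef struct {
    pthread_mutex_t lock; /* guards every field below */
    size_t completed_steps;
    size_t total_steps;
    size_t files_written; /* stand-in for a perfdata counter */
} progress_state;

/* Same parameter list as libgit2's git_checkout_progress_cb. */
typedef void (*progress_cb)(const char *path, size_t completed,
                            size_t total, void *payload);

static void report_file_done(progress_state *st, const char *path,
                             progress_cb cb, void *payload)
{
    size_t completed, total;

    pthread_mutex_lock(&st->lock);
    st->files_written++;
    completed = ++st->completed_steps;
    total = st->total_steps;
    pthread_mutex_unlock(&st->lock);

    /* Report a snapshot taken under the lock. */
    if (cb != NULL)
        cb(path, completed, total, payload);
}

/* Usage: a worker reporting 1000 finished files. */
static void *progress_worker(void *arg)
{
    int i;

    for (i = 0; i < 1000; i++)
        report_file_done(arg, "file", NULL, NULL);
    return NULL;
}
```

With this shape the callback can be called from several threads at once and may see completed values slightly out of order, so anything it touches through payload needs its own synchronization.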

I've also tried hard to keep each commit easily digestible. Swapping the temporary buffers to use TLS has greatly reduced the footprint of what needs to be reviewed for this PR.
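A sketch of the thread-local scratch buffer idea, using C11 _Thread_local and a fixed-size array as simplifications; libgit2 has its own TLS helpers and would presumably keep a git_buf rather than a fixed array. tmp_path, target_path and build_in_worker are hypothetical names. Each thread reuses its own buffer, so there is no lock and no per-file allocation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_PATH_MAX 4096

/* One scratch buffer per thread. */
static _Thread_local char tmp_path[SKETCH_PATH_MAX];

/* Returns this thread's buffer, or NULL if the path does not fit. */
static const char *target_path(const char *root, const char *relpath)
{
    int n = snprintf(tmp_path, sizeof(tmp_path), "%s/%s", root, relpath);
    return (n >= 0 && (size_t)n < sizeof(tmp_path)) ? tmp_path : NULL;
}

/* Usage: a second thread builds its own path without touching ours. */
static void *build_in_worker(void *arg)
{
    int *ok = arg;
    const char *p = target_path("/other", "b.txt");

    *ok = p != NULL && strcmp(p, "/other/b.txt") == 0;
    return NULL;
}
```

The returned pointer refers to the calling thread's buffer and is overwritten by that thread's next call, so it must be consumed before the next file is processed.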

Now on to what I have not implemented yet: I have not made this configurable through an option (whether on checkout_options or in libgit2_opts), and my holdup here is testing. Checkout is such an integral part of so many other things you can do in libgit2 that I want anything that could potentially run checkout to run in threaded mode when testing the library. I was also thinking that whether to thread checkout seems like a library option rather than a per-checkout option, because as a user I want to roll the performance benefits out to every checkout the library does without having to touch every possible checkout entry point.
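A sketch of the library-wide switch described above, assuming C11 atomics. checkout_worker_count and its accessors are invented for illustration and are not an existing git_libgit2_opts option. Because every checkout entry point would read the same value, turning threading on for a whole test run or application is a single call.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical process-wide setting; 1 keeps today's single-threaded behavior. */
static atomic_int checkout_worker_count = 1;

static void set_checkout_worker_count(int n)
{
    atomic_store(&checkout_worker_count, n > 0 ? n : 1);
}

/* Any checkout path (clone, git_checkout_tree, a merge that checks out
 * files) would consult this instead of a per-call option. */
static int get_checkout_worker_count(void)
{
    return atomic_load(&checkout_worker_count);
}
```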

So what I would like to do is duplicate the offline checkout tests and at least one of the online tests that involves cloning, and run those with checkout threading turned on. Because of the nature of threading, I also think we should run the offline checkout tests 4 times per test cycle to hopefully catch timing errors that could crop up someday.

I would really appreciate it if you could get back to me soon on this one. I would like to add the option and get this merged ASAP, as I would prefer not to deal with another merge conflict in this area.

I don't mean to harp, but we've been running this code for years in GitKraken and for users of NodeGit, and we haven't seen anything concerning from a stability or correct-output standpoint. I firmly believe that had there been any such issues, we would have seen a report by now, given how large our rollout has been and how long it has been running in production. To set aside my hubris, I suppose it's possible that the platforms we target have conditions more favorable to avoiding threading issues.

@implausible
Contributor Author

implausible commented Aug 12, 2020

> there are many places where we might be stepping all over our own threading expectations (ie. you should never share git_repository instances between threads, so I shudder each time I see that checkout_data parameter), and auditing for that is… complicated

> (accesses to git_pool seem to also be a concern, as I don't think those are threadsafe)

@tiennou I think in general the pool works correctly because we lock around its use in mkpath2file -> checkout_mkdir. I could alleviate your fears by moving the pool into TLS.

The only other potentially dangerous place I see in an audit (while I'm fresh on this again) is access to the ODB in checkout_write_content, where we look up a blob and later free it. Should we be locking around the blob lookup and free?

Base automatically changed from master to main January 7, 2021 10:09
@implausible implausible dismissed a stale review via ecb25e0 March 29, 2021 18:27
@implausible implausible force-pushed the multithread/checkout_create_the_new branch 5 times, most recently from 6776fc6 to 31f30a4 Compare March 29, 2021 20:21
@implausible
Contributor Author

I know I've still got ergonomics to do on this, but seeing the fuzzer agree with this branch makes me feel better about it.

@implausible implausible force-pushed the multithread/checkout_create_the_new branch from 4485279 to 0d97b4c Compare March 29, 2021 22:10
@implausible implausible force-pushed the multithread/checkout_create_the_new branch from 0d97b4c to 56b30cd Compare April 9, 2021 16:49