Submodule optimization #4016
So, I know it's kind of audacious for me to request a merge with both CI systems failing, but honestly I think neither of these failures is my fault. The same AppVeyor test failures appear on a few other recent pull requests: I also think the segfault on Travis is not my fault. The tests all pass for me locally, and give no warnings under valgrind's memcheck. There are, however, helgrind errors. Here are a few other relatively recent cases of segfaults (on AppVeyor rather than Travis, but if the cause is what I think it is, it could appear on any platform). I can repro the segfault under a version of libgit2 immediately prior to my changes -- simply running the test suite a few dozen times is sufficient. The segfault is in attr_cache.free from git_repository.cleanup, which is one of the areas that helgrind flags as hinky. Sadly, the bug never seems to reproduce itself under valgrind (or even gdb), but since helgrind is complaining, it is probably threading-related. Helgrind does not, afaict, complain about our new code.
5d3de99 to beeaf92
I just did a no-op amend and now Travis passes. AppVeyor is still bad, but as I noted, it frequently fails this way.
beeaf92 to 89d0deb
Actually, after talking to some colleagues, I decided to rename the functions. No substantive changes.
89d0deb to deac747
BTW this fixes #3756
I mentioned in Slack that this is probably not in scope for this release, which we're just finishing up. There's not a lot of documentation around how one might want to use this - when would I, as a consumer, need to clear the cache? Would I then benefit from repopulating it? I think that @carlosmn asked in #3756 - why can this not be a thing that the library handles for you? Why would I, as a consumer, have better knowledge of when caching should happen than the library itself?
@ethomson I can think of a couple of ways the library could handle it for you.
Additionally, we could probably separate into a different pull request the performance fixes that don't involve a cache (e.g., s.t.
Looking at the automatic caching idea:
In the absence of the cache, each User code (e.g., our git-meta, via nodegit) might know that it's going to open a repository once, do a thing, and then close it. So it's OK for that code to turn on caching as soon as it opens the module. But the Atom editor (for instance) might not be willing to make that assumption; maybe at some point you go in and check out a new branch from the command line. (I don't actually know if Atom uses this code -- it's just an example). And at that point, the cache needs invalidation.
My conclusion from the above, btw, is that automatic caching doesn't make sense and that we ought to just merge this as-is.
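To make the invalidation concern concrete, here is a minimal sketch of the failure mode being discussed. It does not use libgit2 at all: `toy_repo`, `toy_cache_all`, `toy_cache_clear`, `toy_lookup_head` and `toy_external_checkout` are hypothetical stand-ins, and the integer `head` field merely represents a submodule's HEAD commit. The point is only that a snapshot taken by the caller keeps answering with old data after something outside the library changes the repository, until the caller clears it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; none of these are libgit2 types. */
#define MAX_SUBMODULES 8

typedef struct {
	const char *name;
	int head;                               /* stands in for the submodule's HEAD commit */
} toy_submodule;

typedef struct {
	toy_submodule on_disk[MAX_SUBMODULES];  /* the "real" state */
	size_t count;
	toy_submodule cache[MAX_SUBMODULES];    /* snapshot taken by toy_cache_all */
	int cache_valid;
} toy_repo;

/* Snapshot every submodule so later lookups can skip the slow path. */
void toy_cache_all(toy_repo *repo)
{
	assert(repo);
	memcpy(repo->cache, repo->on_disk, sizeof(repo->on_disk));
	repo->cache_valid = 1;
}

/* Drop the snapshot; lookups go back to reading the real state. */
void toy_cache_clear(toy_repo *repo)
{
	assert(repo);
	repo->cache_valid = 0;
}

/* Returns the HEAD seen by the caller, or -1 if the name is unknown. */
int toy_lookup_head(const toy_repo *repo, const char *name)
{
	const toy_submodule *src;
	size_t i;

	assert(repo && name);
	src = repo->cache_valid ? repo->cache : repo->on_disk;

	for (i = 0; i < repo->count; i++) {
		if (strcmp(src[i].name, name) == 0)
			return src[i].head;
	}
	return -1;
}

/* Something the cache cannot see, e.g. a checkout from the command line. */
void toy_external_checkout(toy_repo *repo, size_t idx, int new_head)
{
	assert(repo && idx < repo->count);
	repo->on_disk[idx].head = new_head;
}
```

In this model, a lookup made after `toy_external_checkout` still reports the old HEAD while the snapshot is active, and reports the new one only after `toy_cache_clear`. A short-lived tool can tolerate that window; a long-running editor cannot know when it opens.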
Thanks for the writeup @novalis . I think that makes sense. I have some review comments to make but I think the overall strategy makes sense.
@@ -1566,6 +1617,33 @@ int git_submodule_location(unsigned int *location, git_submodule *sm)
		location, NULL, NULL, NULL, sm, GIT_SUBMODULE_IGNORE_ALL);
}

int git_repository_submodule_cache_all(git_repository *repo)
Please don't put `git_repository` functions in a file that is not `repository.c`. Either (1) these should go in `repository.c`, or (2) they should be `git_submodule_cache_all` and `git_submodule_cache_clear`. The latter is obviously more brief, but I think (1) makes more sense.
 *
 * @param repo the repository whose submodules will be cached.
 */
GIT_EXTERN(int) git_repository_submodule_cache_all(
There's no indication of when somebody should call `cache_clear`. You need to describe what operations would cause a cache coherency problem; it's not at all obvious to the reader and I suspect it's wildly dangerous to call `cache_all` and then (say) check out a new branch that would update a submodule? I'm just guessing, I don't know, and I'd like the documentation to expand upon that.
I also think that you should move this into `include/git2/sys/repository.h`. This seems to me to be a cache that you have to manage rather carefully, and the `sys/` directory is for the things that will hurt you badly if you get them wrong.
static void free_submodule_names(git_strmap *names)
{
	git_buf *name = 0;
	if (0 == names)
Please don't reverse these tests. I know that many people have gotten into a habit of this and find it a good practice, but please follow our existing code style.
Please don't use `0` for pointers.
This should be `if (names == NULL)`.
@@ -196,6 +209,17 @@ int git_submodule_lookup(

	assert(repo && name);

	if (0 != repo->submodule_cache) {
`if (repo->submodule_cache != NULL)`, please.
{
	git_submodule *sm;
	assert(repo);
	if (0 == repo->submodule_cache) {
`if (repo->submodule_cache == NULL)`...
git_repository_submodule_cache_all(g_repo);
cl_assert(0 == git_submodule_lookup(&sm, g_repo, "sm_unchanged"));
cl_assert(0 == git_submodule_lookup(&sm2, g_repo, "sm_unchanged"));
cl_assert(sm == sm2);
Using `cl_assert` is not ideal here, since it will just say "an assertion failed". We have a variety of helper functions that provide more explicit messages, which are helpful in debugging and understanding test failures:
For git functions, please use `cl_git_pass`, which will show the error messages properly when a function fails and returns nonzero (e.g., `cl_git_pass(git_submodule_lookup...)`).
To compare two pointers, please use `cl_assert_equal_p`, which will show more information in a nice format about the actual and expected values.
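As a rough illustration of why call-aware helpers are preferred over a bare boolean assertion, the sketch below contrasts the two. These are not clar's macros and do not reflect how clar is implemented: `check_true`, `check_pass`, `check_pass_impl`, `last_failure` and `fake_lookup` are all hypothetical names invented for this example, and the error code `-3` is arbitrary.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers for illustration; clar's real macros are implemented differently. */
static char last_failure[256];

/* Bare assertion: on failure we only learn that something was false. */
int check_true(int cond)
{
	if (!cond)
		snprintf(last_failure, sizeof(last_failure), "an assertion failed");
	return cond;
}

/* Call-aware assertion: records which call failed and the code it returned. */
int check_pass_impl(int rc, const char *expr)
{
	assert(expr);
	if (rc != 0)
		snprintf(last_failure, sizeof(last_failure),
			"function call failed: %s returned %d", expr, rc);
	return rc == 0;
}

/* Stringify the expression so the failure message can name the call. */
#define check_pass(expr) check_pass_impl((expr), #expr)

/* Stand-in for a library call: 0 on success, a negative code on failure. */
int fake_lookup(const char *name)
{
	assert(name);
	return strcmp(name, "sm_unchanged") == 0 ? 0 : -3;
}
```

With `check_true(0 == fake_lookup("missing"))` the recorded message carries no detail, whereas `check_pass(fake_lookup("missing"))` names the call and its return code, which is the kind of extra context the review comment is asking for.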
/* and that we get new objects again after clearing the cache. */
git_repository_submodule_cache_clear(g_repo);
cl_assert(0 == git_submodule_lookup(&sm2, g_repo, "sm_unchanged"));
Here, too, this should be `cl_git_pass(...)`.
I have some comments. Note that I haven't looked at the submodule logic to know if that was just changed whitespace or if there are substantive changes, too. I'll have to do that before we merge, unless you revert it. (Mixing whitespace changes with actual code changes is very hard to follow in many diff tools.)
I have a separate commit for the whitespace changes. Are you seeing other whitespace changes outside that commit?
deac747 to 1f5c2d0
Yep, thanks, that's fine then; I was looking at this in the GitHub web view and didn't realize that.
Sorry, after giving this a final look, I noticed two more very minor issues. I hope you don't mind fixing these real quick so that I can merge it; apologies that I didn't catch these the first time around. Thanks!
/* refresh the HEAD OID */
if (submodule_update_head(sm) < 0)
	return -1;
if (0 == sm->repo->submodule_cache) {
Can you also reverse this test?
@@ -416,7 +458,7 @@ typedef struct {
 	git_repository *repo;
 } lfc_data;
 
-static int all_submodules(git_repository *repo, git_strmap *map)
+int all_submodules(git_repository *repo, git_strmap *map)
Now that this is public (within the library anyway), this should have a name that indicates it's internal to the library. Something like `git_submodule__all` or `git_submodule__map`, whichever makes more sense to you.
1f5c2d0 to 4e9aca8
Signed-off-by: David Turner <[email protected]>
Added `git_repository_submodule_cache_all` to initialize a cache of submodules on the repository so that operations looking up N submodules are O(N) and not O(N^2). Added a `git_repository_submodule_cache_clear` function to remove the cache. Also optimized the function that loads all submodules as it was itself O(N^2) w.r.t. the number of submodules, having to loop through the `.gitmodules` file once per submodule. I changed it to process the `.gitmodules` file once, into a map. Signed-off-by: David Turner <[email protected]>
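The complexity claim in this commit message can be sketched with a toy model. This is not the libgit2 code: `toy_entry` stands in for one parsed `.gitmodules` entry, `toy_map` is a small fixed-size hash table used here in place of the library's string map, and the `visited` counters exist only to make the amount of work observable. The sketch assumes fewer than `TABLE_SIZE` entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: each entry stands for one submodule in .gitmodules. */
typedef struct {
	const char *name;
	const char *path;
} toy_entry;

#define TABLE_SIZE 64  /* power of two; must exceed the number of entries */

typedef struct {
	const toy_entry *slots[TABLE_SIZE];
} toy_map;

static size_t toy_hash(const char *s)
{
	size_t h = 5381;
	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h & (TABLE_SIZE - 1);
}

/* Naive: every lookup rescans the file, so N lookups visit O(N^2) entries. */
const char *path_by_scan(const toy_entry *file, size_t n,
	const char *name, size_t *visited)
{
	size_t i;

	assert(file && name && visited);
	for (i = 0; i < n; i++) {
		(*visited)++;
		if (strcmp(file[i].name, name) == 0)
			return file[i].path;
	}
	return NULL;
}

/* One pass over the file builds the map, visiting each entry exactly once. */
void map_build(toy_map *map, const toy_entry *file, size_t n, size_t *visited)
{
	size_t i, slot;

	assert(map && file && visited && n < TABLE_SIZE);
	for (i = 0; i < TABLE_SIZE; i++)
		map->slots[i] = NULL;

	for (i = 0; i < n; i++) {
		(*visited)++;
		slot = toy_hash(file[i].name);
		while (map->slots[slot] != NULL)       /* linear probing */
			slot = (slot + 1) & (TABLE_SIZE - 1);
		map->slots[slot] = &file[i];
	}
}

/* Lookup by name without touching the file again. */
const char *map_get(const toy_map *map, const char *name)
{
	size_t slot;

	assert(map && name);
	slot = toy_hash(name);
	while (map->slots[slot] != NULL) {
		if (strcmp(map->slots[slot]->name, name) == 0)
			return map->slots[slot]->path;
		slot = (slot + 1) & (TABLE_SIZE - 1);
	}
	return NULL;
}
```

Looking up every name with `path_by_scan` makes the visit counter grow quadratically with the number of entries, while `map_build` followed by `map_get` calls touches each file entry only once.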
`git_submodule_status` is very slow, bottlenecked on `git_repository_head_tree`, which it uses through `submodule_update_head`. If the user has requested submodule caching, assume that they want this status cached too and skip it. Signed-off-by: David Turner <[email protected]>
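A minimal sketch of the trade-off this commit message describes, under stated assumptions: `sketch_repo`, `sketch_submodule`, `sketch_update_head` and `sketch_status` are hypothetical names, and the counter stands in for the cost of the slow HEAD-tree lookup. Only the shape of the check (refresh when no cache is set, skip otherwise) is taken from the discussion above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the libgit2 structures. */
typedef struct {
	void *submodule_cache;   /* non-NULL once the caller has requested caching */
	int head_tree_loads;     /* counts how often the slow operation ran */
} sketch_repo;

typedef struct {
	sketch_repo *repo;
	int head_oid;
} sketch_submodule;

/* Stand-in for the slow refresh of the submodule's HEAD information. */
static int sketch_update_head(sketch_submodule *sm)
{
	sm->repo->head_tree_loads++;
	sm->head_oid = sm->repo->head_tree_loads;  /* pretend we re-read HEAD */
	return 0;
}

int sketch_status(sketch_submodule *sm)
{
	assert(sm && sm->repo);

	/* refresh the HEAD OID only when the caller has not opted into caching */
	if (sm->repo->submodule_cache == NULL) {
		if (sketch_update_head(sm) < 0)
			return -1;
	}

	return 0;
}
```

The cost of this choice is the one raised earlier in the review: once caching is on, the status reflects HEAD as of the last refresh, so the caller is responsible for clearing the cache after anything that moves HEAD.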
4e9aca8 to 673dff8
OK, I think I've addressed all comments; there are no merge conflicts; the tests are green.
@ethomson: it was no trouble at all; thanks for all your help!
The bulk of this work was done by @bpeabody. I just made some minor adjustments. This fixes #3756.