Concurrency fixes for the reference db #3561
Conversation
This finally works fine on my Debian.
Thanks for working on this fix. I can reproduce #1534 quite easily by running … The current workaround seems to be to initialize a new …
if (git_path_exists(full_path.ptr) && p_unlink(full_path.ptr) < 0) {
	if (failed)
		continue;

/* We need to stopy anybody from updating the ref while we try to do a safe delete */
stopy -> stop :)
if (!git_reference_lookup(&ref, g_repo, name)) {
	cl_git_pass(git_reference_delete(ref));
	cl_git_pass(error);
	git_reference_free(ref);
}

if (i == 5) {
This might be nice to have as a constant.
git_reference *ref;
char name[128];
git_repository *repo;

cl_git_pass(git_repository_open(&repo, data->path));

for (i = 0; i < 10; ++i) {
This might be nice to have as a constant.
do {
	error = git_reference_name_to_id(&head, repo, "HEAD");
} while (error == GIT_ELOCKED);
cl_git_pass(error);

for (i = 0; i < 10; ++i) {
This might be nice to have as a constant. (I realize this was already here but it might be a nice cleanup.)
do {
	error = git_reference_create(&ref[i], repo, name, &head, 0, NULL);
} while (error == GIT_ELOCKED);
cl_git_pass(error);

if (i == 5) {
This might be nice to have as a constant.
We say it's going to work if you use a different repository in each thread. Let's do precisely that in our code instead of hoping re-using the refdb is going to work. This test does fail currently, surfacing existing bugs.
We can get useful information like GIT_ELOCKED out of this instead of just -1.
In order not to undo concurrent modifications to references, we must make sure that we only delete a loose reference if it still has the same value as when we packed it. This means we need to lock it and then compare the value with the one we put in the packed file.
We need to save the errno, lest we clobber it in the giterr_set() call. Also add code for reporting that a path component is missing, which is a distinct failure mode.
There might be a few threads or processes working with references concurrently, so fortify the code to ignore errors caused by concurrent access that do not stop us from continuing the work. This includes ignoring an unlinking error: either someone else removed the file, or we leave it around. In the former case the job is done; in the latter, the ref is still in a valid state.
We can reduce the duplication by cleaning up at the beginning of the loop, since it's something we want to do every time we continue.
This allows the caller to know the error was e.g. due to the packed-refs file being already locked, so they can try again later.
The logic simply consists of retrying for as long as the library says the data is locked, but it eventually gets through.
Checking the size before we open the file descriptor can lead to the file being replaced from under us when renames aren't quite atomic, so we can end up reading too little of the file, leading to us thinking the file is corrupted.
It does not help us to check whether the file exists before trying to unlink it since it might be gone by the time unlink is called. Instead try to remove it and handle the resulting error if it did not exist.
At times we may try to delete a reference which a different thread has already taken care of.
On Windows we can find locked files even when reading a reference or the packed-refs file. Bubble up the error in this case as well to allow callers on Windows to retry more intelligently.
There's all sorts of races in there if you run many threads which all want to create and compress references.
We start by removing a useless test which doesn't even follow our own rules for concurrency and fixing a second one to use different objects and actually perform a concurrency test.
A bunch of this is simply to bubble up error codes so we know what we're dealing with, or fixing how we report them.
This also makes the packing logic more robust and safer by ignoring transient errors (and really the non-transient ones but it's more important that we continue working) and only deleting references if they haven't changed since we packed.
We still need to make sure #1534 doesn't happen by locking the packed-refs file before reloading, but the test no longer fails every second run.