
Concurrency fixes for the reference db #3561


Merged
merged 13 commits into master from cmn/refdb-para on Nov 14, 2016

Conversation

carlosmn (Member)

There's all sorts of races in there if you run many threads which all want to create and compress references.

We start by removing a useless test which doesn't even follow our own rules for concurrency, and by fixing a second one to use different objects so it actually performs a concurrency test.

A bunch of this is simply bubbling up error codes so we know what we're dealing with, or fixing how we report them.

This also makes the packing logic more robust and safer by ignoring transient errors (and really the non-transient ones too, since it's more important that we keep working) and by only deleting references if they haven't changed since we packed them.

We still need to make sure #1534 doesn't happen by locking the packed-refs file before reloading, but the test no longer fails every second run.

carlosmn changed the title from "[WIP] Concurrency fixes for the reference db" to "Concurrency fixes for the reference db" on Mar 10, 2016
carlosmn (Member, Author)

This finally works fine on my Debian. I don't know why AppVeyor doesn't like it; I've tried it with VS2015 and it works fine for me. Turns out I was testing the wrong branch, so I can repro.

stanhu (Contributor) commented Aug 27, 2016

Thanks for working on this fix. I can reproduce #1534 quite easily by running git gc continuously on a networked filesystem, pushing updates to a branch, and checking the SHA of that branch with Rugged. At times, it appears the HEAD momentarily goes "back in time"; it's also possible for the branch to disappear momentarily as well. More details here: https://gitlab.com/gitlab-org/gitlab-ce/issues/15392#note_14530450

The current workaround seems to be to initialize a new Rugged::Repository anytime you need to look up the latest SHA of a commit. This does not seem ideal, so I'm curious if there's anything the community can do to help move this PR forward.
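
With these fixes, lock contention surfaces to the caller as GIT_ELOCKED, so a caller can retry the lookup instead of re-opening the repository. A minimal sketch against the public C API; the helper name and the retry bound are illustrative, not part of the patch:

```c
#include <git2.h>

/* Hypothetical helper: retry a HEAD lookup while the refdb reports
 * GIT_ELOCKED, instead of re-initializing the repository. */
static int lookup_head_retrying(git_oid *out, git_repository *repo)
{
	int error, tries = 0;

	do {
		error = git_reference_name_to_id(out, repo, "HEAD");
	} while (error == GIT_ELOCKED && ++tries < 100); /* arbitrary bound */

	return error;
}
```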

```c
if (git_path_exists(full_path.ptr) && p_unlink(full_path.ptr) < 0) {
	if (failed)
		continue;
	/* We need to stopy anybody from updating the ref while we try to do a safe delete */
```

Contributor: stopy -> stop :)


```c
if (!git_reference_lookup(&ref, g_repo, name)) {
	cl_git_pass(git_reference_delete(ref));
	cl_git_pass(error);
	git_reference_free(ref);
}

if (i == 5) {
```

Member: This might be nice to have as a constant.

```c
git_reference *ref;
char name[128];
git_repository *repo;

cl_git_pass(git_repository_open(&repo, data->path));

for (i = 0; i < 10; ++i) {
```

Member: This might be nice to have as a constant.

```c
do {
	error = git_reference_name_to_id(&head, repo, "HEAD");
} while (error == GIT_ELOCKED);
cl_git_pass(error);

for (i = 0; i < 10; ++i) {
```

Member: This might be nice to have as a constant. (I realize this was already here but it might be a nice cleanup.)

```c
do {
	error = git_reference_create(&ref[i], repo, name, &head, 0, NULL);
} while (error == GIT_ELOCKED);
cl_git_pass(error);

if (i == 5) {
```

Member: This might be nice to have as a constant.
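
The cleanup being suggested would look something like this; the constant names are hypothetical, not from the patch:

```c
/* Hypothetical names for the test's magic numbers. */
#define THREAD_REFS 10 /* refs each thread creates and deletes */
#define PACK_AT      5 /* iteration at which the test packs references */
```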

Commits

We say it's going to work if you use a different repository in each thread. Let's do precisely that in our code instead of hoping that re-using the refdb is going to work. This test does currently fail, surfacing existing bugs.
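
A sketch of that per-thread discipline using the public API and pthreads; the worker body is illustrative, not the test's actual code:

```c
#include <git2.h>
#include <pthread.h>

/* Illustrative worker: every thread opens its own repository handle
 * instead of sharing one refdb. Assumes git_libgit2_init() was called
 * once by the process. */
static void *worker(void *payload)
{
	const char *path = payload;
	git_repository *repo;
	git_oid head;

	if (git_repository_open(&repo, path) < 0)
		return NULL;

	/* ... create/delete/pack references through this handle only ... */
	git_reference_name_to_id(&head, repo, "HEAD");

	git_repository_free(repo);
	return NULL;
}
```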

We can get useful information like GIT_ELOCKED out of this instead of just -1.

In order not to undo concurrent modifications to references, we must make sure that we only delete a loose reference if it still has the same value as when we packed it. This means we need to lock it and then compare its value with the one we put in the packed file.
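
A simplified sketch of that compare-before-delete idea in plain POSIX, rather than libgit2's internal lock-file helpers; the ".lock" suffix follows git's usual convention, and error handling is abbreviated:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Delete the loose ref at `path` only if it still contains `packed_val`
 * (the value we wrote into packed-refs). Returns 0 on success, -1 if the
 * ref is locked or has changed since we packed it. */
static int delete_if_unchanged(const char *path, const char *packed_val)
{
	char lockpath[4096], buf[64];
	ssize_t len;
	int fd, ret = -1;

	snprintf(lockpath, sizeof(lockpath), "%s.lock", path);

	/* O_EXCL makes lock acquisition atomic: if another thread holds
	 * the lock, back off and let the caller retry later. */
	if ((fd = open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0666)) < 0)
		return -1;
	close(fd);

	if ((fd = open(path, O_RDONLY)) >= 0) {
		len = read(fd, buf, sizeof(buf) - 1);
		close(fd);
		if (len > 0) {
			buf[len] = '\0';
			/* Only unlink when the on-disk value still matches
			 * what we packed; otherwise someone updated it. */
			if (!strncmp(buf, packed_val, strlen(packed_val)))
				ret = unlink(path);
		}
	} else if (errno == ENOENT) {
		ret = 0; /* someone else already removed it: job done */
	}

	unlink(lockpath);
	return ret;
}
```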

We need to save the errno, lest we clobber it in the giterr_set() call. Also add code for reporting that a path component is missing, which is a distinct failure mode.
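
Roughly this pattern; the reporting helper below is a stand-in for giterr_set(GITERR_OS, ...), which formats strerror(errno) into the message, so anything that runs in between must not clobber errno:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Illustrative stand-in for giterr_set(GITERR_OS, ...). */
static void report_os_error(const char *msg)
{
	fprintf(stderr, "%s: %s\n", msg, strerror(errno));
}

static int remove_ref_file(const char *path)
{
	if (unlink(path) < 0) {
		int saved_errno = errno;

		/* ... cleanup that may itself touch errno ... */

		errno = saved_errno; /* restore before building the message */
		report_os_error("failed to delete reference");
		return -1;
	}
	return 0;
}
```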

There might be a few threads or processes working with references concurrently, so fortify the code to ignore errors that come from concurrent access and that do not stop us from continuing the work. This includes ignoring an unlinking error: either someone else removed the file, or we leave it around. In the former case the job is done, and in the latter the ref is still in a valid state.

We can reduce the duplication by cleaning up at the beginning of the loop, since that is something we want to do every time we continue.

This allows the caller to know that the error was e.g. due to the packed-refs file being already locked, so they can try again later. The logic simply consists of retrying for as long as the library says the data is locked; it eventually gets through.

Checking the size before we open the file descriptor can lead to the file being replaced from under us when renames aren't quite atomic, so we can end up reading too little of the file, leading us to think the file is corrupted.
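
That is the usual TOCTOU fix: open first, then size the file via the descriptor you actually hold, so a concurrent rename can't swap the file between the two steps. A minimal sketch, not the library's actual reader:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read a whole file, sizing it via fstat() on the open descriptor
 * rather than stat() on the path, so a concurrent rename cannot give
 * us the size of one file and the contents of another. */
static char *read_whole_file(const char *path, size_t *out_len)
{
	struct stat st;
	char *buf = NULL;
	int fd;

	if ((fd = open(path, O_RDONLY)) < 0)
		return NULL;

	if (fstat(fd, &st) == 0 &&
	    (buf = malloc(st.st_size + 1)) != NULL &&
	    read(fd, buf, st.st_size) == st.st_size) {
		buf[st.st_size] = '\0';
		*out_len = (size_t)st.st_size;
	} else {
		free(buf);
		buf = NULL;
	}

	close(fd);
	return buf;
}
```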

It does not help us to check whether the file exists before trying to unlink it, since it might be gone by the time unlink is called. Instead, try to remove it and handle the resulting error if it did not exist.
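
That is the standard race-free formulation: attempt the unlink and inspect errno, rather than testing existence first. A sketch:

```c
#include <errno.h>
#include <unistd.h>

/* Race-free delete: don't test for existence first, just unlink and
 * treat "already gone" as success, since another thread or process
 * may have removed the file in between. */
static int remove_if_present(const char *path)
{
	if (unlink(path) < 0 && errno != ENOENT)
		return -1; /* a real error */
	return 0; /* removed, or already gone: either way the job is done */
}
```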

At times we may try to delete a reference which a different thread has already taken care of.

On Windows we can find locked files even when reading a reference or the packed-refs file. Bubble up the error in this case as well, to allow callers on Windows to retry more intelligently.

ethomson merged commit 904e1e7 into master on Nov 14, 2016.
carlosmn deleted the cmn/refdb-para branch on November 15, 2016.