racy-git, the missing link #3226

Merged
merged 8 commits into from
Jun 24, 2015

carlosmn
Member

It turns out that the description in racy-git is deceptively simple, and the implementation for the smudging of racily-clean entries goes as far as performing a diff check, which we were not doing.
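A minimal sketch of the rule described above follows. It assumes whole-second timestamps and passes the content comparison in as a precomputed flag. `struct index_entry`, `entry_is_racy` and `smudge_if_changed` are hypothetical names for illustration, not libgit2's API.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical, simplified index entry (whole-second timestamps only). */
struct index_entry {
	time_t mtime;        /* workdir file mtime recorded when staged */
	uint32_t file_size;  /* recorded size; 0 doubles as the "smudged" marker */
};

/*
 * An entry is racy if its file was modified in the same second as (or after)
 * the index was written: the file may have changed again within that second
 * without its stat data changing, so a stat match proves nothing.
 */
static bool entry_is_racy(const struct index_entry *entry, time_t index_mtime)
{
	return entry->mtime >= index_mtime;
}

/*
 * Called while writing the index. `contents_differ` is the result of the
 * expensive check (filter the workdir file and compare it with the staged
 * blob), which a caller only needs to compute for racy entries. Only a racy
 * entry whose contents really differ gets smudged; the zero size then forces
 * the next status/diff to look at the contents. Returns true if smudged.
 */
static bool smudge_if_changed(struct index_entry *entry, time_t index_mtime,
                              bool contents_differ)
{
	if (!entry_is_racy(entry, index_mtime) || !contents_differ)
		return false;
	entry->file_size = 0;
	return true;
}
```

The point of the two-step structure is that a racy timestamp alone is not a reason to smudge; it only makes the content check necessary.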


I'm not too thrilled with the use of sleep(), but the alternative is changing the timestamps of both the index and the file in unison. We may actually want to do that, but this is good enough for a PoC.

@carlosmn
Member Author

This comes with yet another merge failure. @ethomson, I'm going to leave this in your capable hands.

@arrbee
Member

arrbee commented Jun 17, 2015

I know I've long since given up much of my right to comment on such things, but pulling the entire typedefs of the fs and workdir iterators into iterator.h and then reaching into their internal structures just to pull out a single field doesn't seem like the right design to me. There is precedent for accessor APIs that only work with a particular type of iterator (such as git_iterator_current_workdir_path) that would be less invasive, I think.

@carlosmn
Member Author

It did seem ugly, but at the time I was trying to get it to work. Since we do have precedent, I'll look into adding git_iterator_index(), which should also make the diff quite a bit more elegant.
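The accessor approach can be sketched like this. Everything here is hypothetical: `iterator`, `iterator_kind`, `struct index` and `iterator_index()` are simplified stand-ins, not libgit2's real types or the signature git_iterator_index() ended up with. The header part only declares an opaque type plus the accessor; the layout would live in the implementation file (both parts are in one block so it compiles on its own).

```c
#include <stddef.h>

/* --- what the header would expose --- */
typedef struct iterator iterator;   /* opaque to diff and other callers */
struct index;                       /* hypothetical stand-in index type */
struct index *iterator_index(const iterator *iter);

/* --- private to the iterator implementation file --- */
typedef enum {
	ITERATOR_TREE,
	ITERATOR_INDEX,
	ITERATOR_WORKDIR
} iterator_kind;

struct index {
	size_t entry_count;
};

struct iterator {
	iterator_kind kind;
	struct index *index;   /* set only for index and workdir iterators */
};

/*
 * Accessor in the spirit of git_iterator_current_workdir_path(): it is only
 * meaningful for some iterator kinds and returns NULL for the others, so
 * callers never need the concrete struct layout.
 */
struct index *iterator_index(const iterator *iter)
{
	if (iter == NULL)
		return NULL;
	if (iter->kind != ITERATOR_INDEX && iter->kind != ITERATOR_WORKDIR)
		return NULL;
	return iter->index;
}
```

Returning NULL for unsupported kinds keeps the accessor safe to call from generic diff code, at the cost of a NULL check at the call site.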

@carlosmn carlosmn force-pushed the cmn/racy-diff-again branch 2 times, most recently from ffada05 to 80c9ab5 June 18, 2015 10:55
@carlosmn carlosmn changed the title from "diff: check files with the same or newer timestamps" to "[WIP] racy-git, the missing link" Jun 18, 2015
@carlosmn
Member Author

The history needs some love, but this should behave the same way as git. The check for racily-clean entries does a diff for each candidate. It may be faster to collect them all and do a single diff, but having a large number of racily-clean entries is rather an edge case.

@ethomson
Member

I'll take a look at that merge test shortly. I'm surprised it's so brittle.

In the meantime, we'll probably need some kind of p_sleep wrapper on Windows, since it only offers Sleep(millis).
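Such a wrapper could look roughly like the sketch below. The name p_sleep comes from the comment above, but this body is an assumption, not libgit2's implementation: it maps a whole-second sleep onto Windows' millisecond Sleep() and onto POSIX sleep() elsewhere.

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/*
 * Portable whole-second sleep (sketch). Windows only offers Sleep(), which
 * takes milliseconds; POSIX sleep() takes seconds and returns the number of
 * seconds left if a signal interrupted it. Returns 0 on a full sleep.
 */
static int p_sleep(unsigned int seconds)
{
#ifdef _WIN32
	Sleep((DWORD)seconds * 1000);
	return 0;
#else
	return sleep(seconds) == 0 ? 0 : -1;
#endif
}
```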

@carlosmn
Member Author

I thought we already had it, but it's the status example that has this define. Long-term, though, I think I'd rather have us set the timestamps manually than rely on a second of sleep.
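Setting the timestamps manually could be done along these lines, assuming a POSIX system with utime() and stat() (Windows would need _utime or similar instead). `set_file_times` and `get_file_mtime` are hypothetical test helpers, not existing libgit2 or clar functions.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <utime.h>

/*
 * Set both the access and modification times of `path` to `when`, so a
 * test can make a workdir file (or the index file) look as old as it needs
 * to, instead of sleeping for a second. Returns 0 on success, -1 on error.
 */
static int set_file_times(const char *path, time_t when)
{
	struct utimbuf times;

	times.actime = when;
	times.modtime = when;
	if (utime(path, &times) != 0) {
		perror(path);
		return -1;
	}
	return 0;
}

/* Read back the modification time, e.g. to assert on it in a test. */
static int get_file_mtime(const char *path, time_t *out)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	*out = st.st_mtime;
	return 0;
}
```

Backdating the file and the index entry to "one second ago" makes the entry count as old without the test depending on how long the run takes.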

@@ -17,6 +17,10 @@ struct git_blob {
git_odb_object *odb_object;
};

static git_oid empty_blob = {{ 0xe6, 0x9d, 0xe2, 0x9b, 0xb2, 0xd1, 0xd6, 0x43, 0x4b, 0x8b,
Member

Might be nice to drop this definition in odb.c and have it use this one.

Member Author

I did, it's further down the diff.

Member

Hah. Thanks.

@carlosmn
Member Author

Tracing what differently_filtered is doing, it seems to run afoul of the racy protections fixed here. When we write out the index, we check for racy timestamps, which is in fact what we set the index to have. In that case, we check whether the contents of the entry differ from what we have in the workdir. They do differ here, so the entry gets smudged to keep the next status/diff from believing that the file is unchanged.

I've changed that test so that we mess with the timestamps enough that we don't detect a change, but I don't believe this represents a realistic situation anymore. Even if the first change is sensible, as a way to simulate an implementation that performs a different form of filtering, the second one is stupid, because it's timing-dependent. If we were to see the situation, we would detect it as a difference, because the entry/file/index timestamps would all match, and we need to protect ourselves from someone doing

git add foo
echo lolchange >foo
git status || git merge somebranch || ...

which forces us to look at the change.

Using nanoseconds would mitigate this somewhat on systems where we can trust them, but that's asking too much, I think, as git on Windows does not use them, and IIRC we can't use them on OS X at all.
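To make the trade-off concrete, here is a sketch of a racy check that uses nanoseconds only when they can be trusted. `struct racy_time` and `is_racy_timestamp` are hypothetical; whether nanoseconds are usable on a given platform is exactly the open question above, so it is a parameter here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timestamp with optional sub-second precision. */
struct racy_time {
	int64_t sec;
	int32_t nsec;
};

/*
 * Racy check on (seconds, nanoseconds) pairs. With use_nsec false (for
 * example when the platform or another git implementation writing the index
 * does not record nanoseconds) only whole seconds are compared, so every
 * file modified in the same second as the index counts as racy.
 */
static bool is_racy_timestamp(struct racy_time file_mtime,
                              struct racy_time index_mtime, bool use_nsec)
{
	if (file_mtime.sec != index_mtime.sec)
		return file_mtime.sec > index_mtime.sec;
	if (!use_nsec)
		return true; /* same second: we cannot tell which came first */
	return file_mtime.nsec >= index_mtime.nsec;
}
```

With trustworthy nanoseconds, a file written 400ns before the index within the same second is no longer racy; without them it is, and needs a content check.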

Which is to say that by hacking the test until it passes, it feels like we're mocking the situation so heavily that we're not really testing anything anymore.

@carlosmn carlosmn force-pushed the cmn/racy-diff-again branch from 4c95e31 to 1dd5c55 June 20, 2015 14:27
@carlosmn carlosmn changed the title from "[WIP] racy-git, the missing link" to "racy-git, the missing link" Jun 20, 2015
@carlosmn carlosmn force-pushed the cmn/racy-diff-again branch from 1dd5c55 to a45da80 June 20, 2015 15:24
@carlosmn
Member Author

I've added extra checks so that we avoid diffing and smudging files whose stat data is different, though I wonder whether diff already has something that would tell us a file is different just by looking at the stat data.
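The extra check can be sketched as a three-way classification. This assumes a hypothetical `struct cached_stat` that tracks only mtime, inode and size (real index entries record more fields). Entries whose stat data already differs never need the expensive diff-and-smudge, because the next status/diff will notice the mismatch on its own. Only stat-clean entries with a racy timestamp do.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical subset of the stat data cached in an index entry. */
struct cached_stat {
	time_t mtime;
	uint64_t ino;
	uint32_t file_size;
};

enum entry_status {
	ENTRY_STAT_DIFFERS, /* a later status/diff will notice; no smudge needed */
	ENTRY_STAT_CLEAN,   /* stat matches and the entry is not racy */
	ENTRY_RACY          /* stat matches but cannot be trusted: diff it */
};

static enum entry_status classify_entry(const struct cached_stat *cached,
                                        const struct cached_stat *current,
                                        time_t index_mtime)
{
	if (cached->mtime != current->mtime ||
	    cached->ino != current->ino ||
	    cached->file_size != current->file_size)
		return ENTRY_STAT_DIFFERS;
	if (cached->mtime >= index_mtime)
		return ENTRY_RACY;
	return ENTRY_STAT_CLEAN;
}
```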

@ethomson
Member

If we were to see the situation, we would detect it as a difference,

Well, we should detect it as a difference if the file was written at the same time as the index, and we shouldn't detect it as a difference if the file was not written at the same time. You can certainly envision a case where checkout takes more than one second, so that some files are not eligible for being rescanned due to raciness and some (those written in the same second as the index) are.

The goal of this test is really to ensure that merge treats the files as clean/dirty identically to status, to avoid regressions there.

The scenario is that one user has core.autocrlf=false and makes some change on Windows, and then another user with core.autocrlf=true checks that file out. Now, if that file was not written at the same time as the index, the racy protections don't kick in and the file is clean. (Prior to the raciness fixes, this was always clean.)
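One plausible way to see why a content check disagrees with the stat check here is a deliberately simplified model of the input filter: core.autocrlf=true turns CRLF into LF before the workdir file is compared with the blob. If the blob was committed with CRLF line endings by the autocrlf=false user, an untouched checkout compares as different once filtered, even though its stat data says it is clean. `crlf_to_lf` and `filtered_equal` are hypothetical helpers, and real git conversion has extra rules (lone CRs, binary files, blobs that already contain CR) that this sketch ignores.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified CRLF -> LF conversion. `out` must hold at least `len` bytes.
 * Returns the number of bytes written.
 */
static size_t crlf_to_lf(const char *in, size_t len, char *out)
{
	size_t o = 0;

	for (size_t i = 0; i < len; i++) {
		if (in[i] == '\r' && i + 1 < len && in[i + 1] == '\n')
			continue; /* drop the CR of a CRLF pair */
		out[o++] = in[i];
	}
	return o;
}

/*
 * Compare a workdir file with the staged blob after applying the filter
 * the current user would apply. `scratch` must hold at least `wlen` bytes.
 */
static bool filtered_equal(const char *workdir, size_t wlen,
                           const char *blob, size_t blen, char *scratch)
{
	size_t n = crlf_to_lf(workdir, wlen, scratch);

	return n == blen && memcmp(scratch, blob, n) == 0;
}
```

In this model, an untouched CRLF file whose blob is also CRLF compares as different, which is the kind of disagreement between a content-based merge check and a stat-based status that the test was guarding against.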

At this point, our status and git status will both show this file as clean, but the merge prior to c0b10c2 would show the file as dirty, as it did not correctly take the index into account when looking at the workdir contents.

This test was really to ensure that we never differed in calculating cleanness in merge vs status.

It's possible that, given the additional complexities that raciness throws into the mix, it is no longer viable to maintain this test, and, hey, we probably have it right now! But this bug was exceptionally costly for us, both in terms of support and in terms of mindshare. (See, e.g., http://stackoverflow.com/questions/27458236/visual-studio-2013-does-not-offer-to-do-merge-on-git-pull, though there are hundreds of questions addressing this topic.)

So if it is viable, I would certainly like to keep this test.

@carlosmn
Member Author

What the test is proving right now is that if the index's timestamp is ahead of the files', we won't look at the contents before deciding that there's no difference. This should be the same as what status does, but we still have the problem that it's timing-dependent. If git happens to update the index in some operation after the checkout, then we'd skip the content check as the test expects. But even if an operation takes over a second, some of the files will have the same timestamp as the index, and if we have a different autocrlf setting from the installation that wrote the files, we'd still need to figure out whether each file changed just before or just after the index was written. So we'd have to look at the contents, and we might still fail to merge.

@ethomson
Member

I'm beyond tired of arguing about this. Just delete the test and I'll put some unit tests in VS instead.

@ethomson
Member

This is looking pretty good to me; how are you feeling about it?

@carlosmn
Member Author

I think that "index: don't smudge entries with different stat" is completely redundant, as diff already takes this into account and I'm just re-implementing the logic. That needs to go.

carlosmn added 4 commits June 22, 2015 12:47
When a file on the workdir has the same or a newer timestamp than the
index, we need to perform a full check of the contents, as the update of
the file may have happened just after we wrote the index.

The iterator changes are such that we can reach inside the workdir
iterator from the diff, though it may be better to have an accessor
instead of moving these structs into the header.
When an entry has a racy timestamp, we need to check whether the file
itself has changed since we put its entry in the index. Only then do we
smudge the size field to force a check the next time around.
@carlosmn carlosmn force-pushed the cmn/racy-diff-again branch from c77f3f0 to 905a2f7 June 22, 2015 11:00
@carlosmn
Member Author

It turns out that that commit wasn't quite redundant, but it's an optimisation for an edge case which I believe we can solve much better than by having another copy of the comparison functions. I'll open a PR with a test showing a situation in which we don't need to smudge the entry but currently do (it would be a failing test, and we don't really have a way to express in clar that a failure is expected).

I just remembered another edge case (smudging a file and then quickly making that file empty; I don't think we handle that), which I'll push up after breakfast.

As we attempt to replicate a situation in which an older checkout has
put a file on disk with different filtering settings from us, set the
timestamp on the entry and file to a second before we're performing the
operation so the entry in the index counts as old.

This way we can test that we're not looking at the on-disk file when the
index has the entry and we detect it as clean.
carlosmn added 3 commits June 22, 2015 16:11
They fit there much better: even though we often check by diffing, the
tests are about the behaviour of the index.
Even though the file is empty and thus the size in the entry matches, we
should be able to detect it as a difference.
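The empty-file case can be sketched as follows. A smudged entry records size 0, so an empty workdir file matches it on size and the size check proves nothing. One way to disambiguate, assumed here, is to compare the entry's blob id with the well-known id of the empty blob, e69de29bb2d1d6434b8b29ae775ad8c2e48c5391. `struct staged_entry` and `empty_file_is_clean` are hypothetical names, not the code merged in this PR.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* SHA-1 of the empty blob: e69de29bb2d1d6434b8b29ae775ad8c2e48c5391. */
static const unsigned char empty_blob_id[20] = {
	0xe6, 0x9d, 0xe2, 0x9b, 0xb2, 0xd1, 0xd6, 0x43, 0x4b, 0x8b,
	0x29, 0xae, 0x77, 0x5a, 0xd8, 0xc2, 0xe4, 0x8c, 0x53, 0x91
};

/* Hypothetical, simplified index entry. */
struct staged_entry {
	uint32_t file_size;    /* 0 may mean "really empty" or "smudged" */
	unsigned char id[20];  /* blob id of the staged contents */
};

/* Returns true if an empty workdir file is unchanged relative to `entry`. */
static bool empty_file_is_clean(const struct staged_entry *entry)
{
	/* A non-zero recorded size against an empty file is a change on its own. */
	if (entry->file_size != 0)
		return false;
	/* Size 0 matches, but may be a smudge: only the empty blob is clean. */
	return memcmp(entry->id, empty_blob_id, sizeof empty_blob_id) == 0;
}
```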
@carlosmn carlosmn force-pushed the cmn/racy-diff-again branch from 8d5701f to bb4896f June 22, 2015 14:12
@carlosmn
Member Author

I'm feeling :shipit: about it atm, that edge case is handled, and there's a test for it.

ethomson added a commit that referenced this pull request Jun 24, 2015
@ethomson ethomson merged commit bd670ab into master Jun 24, 2015
@carlosmn carlosmn deleted the cmn/racy-diff-again branch June 26, 2015 16:17
3 participants