racy-git, the missing link #3226
Force-pushed from 9c4ad8d to e8ae100.
This comes with yet another merge failure. @ethomson, I'm going to leave this in your capable hands.
I know I've long since given up much right to comment on such things, but pulling the entire typedef of the fs and workdir iterators into |
It did seem ugly, but at the time I was trying to get it to work. Since we do have precedent, I'll look into having |
Force-pushed from ffada05 to 80c9ab5.
The history needs some love, but this should behave the same way as git. The check for racily clean entries does a diff for each candidate. It may be faster to collect them all and do a single diff, but having a large number of racily-clean entries is rather an edge case.
I'll take a look at that merge test shortly. I'm surprised it's so brittle. In the meantime, we'll probably need a |
I thought we already had it, but it's the status example that has this define. Long-term, though, I think I'd rather have us set the timestamps manually than rely on a second of sleep.
@@ -17,6 +17,10 @@ struct git_blob {
	git_odb_object *odb_object;
};

static git_oid empty_blob = {{ 0xe6, 0x9d, 0xe2, 0x9b, 0xb2, 0xd1, 0xd6, 0x43, 0x4b, 0x8b,
Might be nice to drop this definition in odb.c and have it use this one.
I did, it's further down the diff.
Hah. Thanks.
Tracing this a bit, I've changed that test so that we mess with the timestamps enough that we don't detect a change, but I don't believe that this represents a realistic situation anymore. Even if the first change is sensible in order to simulate an implementation which performs a different form of filtering, the second one is stupid, because it's timing-dependent. If we were to see the situation, we would detect it as a difference, because the entry/file/index timestamps would all match, and we need to protect ourselves from someone doing
which makes us have to look at the change. Using nanoseconds would mitigate this a bit on systems where we can trust them, but that's asking too much, I think, as git on Windows does not use them, and we can't use them on OS X at all IIRC. Which is to say that by hacking the test so much that we pass, it feels like we're mocking the situation so much that we're not really testing anything anymore.
Force-pushed from 4c95e31 to 1dd5c55.
Force-pushed from 1dd5c55 to a45da80.
I've added extra checks so we avoid doing a diff and smudge of files for which the stat data is different, though I wonder if we have something in diff which would tell us already whether we can detect that a file is different just by looking at the stat. |
Well, we should detect it as a difference if the file was written at the same time as the index. And we shouldn't detect it as different if the file was not written at the same time. You can certainly envision a case where checkout takes > 1 second, and some files are not eligible for being rescanned due to raciness, and some (those written in the same second as the index) are. The goal of this test is really to ensure that merge treats the files as clean/dirty identically to status, to avoid regressions there. The scenario is that one user has At this point, our status and This test was really to ensure that we never differed in calculating cleanness in merge vs status. It's possible that, given the additional complexities that raciness throws into the mix, it is no longer viable to maintain this test and - hey - we probably have it right now! But this bug was exceptionally costly for us, both in terms of support and in terms of mindshare. (See, e.g., http://stackoverflow.com/questions/27458236/visual-studio-2013-does-not-offer-to-do-merge-on-git-pull though there are hundreds of questions addressing this topic.) So if it is viable, I would certainly like to keep this test.
What the test is proving right now is that if the index's timestamp is ahead of the files', we won't look at the contents before deciding that there's no difference. This should be the same as what status does, but we still have the problem that this is timing-dependent. If git happens to update the index in some operation after the checkout, then we'd do this. But even if an operation takes over a second, some of the files will have the same timestamp as the index, and if we have a different autocrlf setting from the installation which wrote the files, we'd still need to figure out whether each one changed just before or after the index. So we'd have to look at the contents, and we might still fail to merge.
I'm beyond tired of arguing about this. Just delete the test and I'll put some unit tests in VS instead. |
This is looking pretty good to me; how are you feeling about it? |
I think that "index: don't smudge entries with different stat" is completely redundant, as diff does already take this into account, and I'm just re-implementing the logic. That needs to go. |
When a file on the workdir has the same or a newer timestamp than the index, we need to perform a full check of the contents, as the update of the file may have happened just after we wrote the index. The iterator changes are such that we can reach inside the workdir iterator from the diff, though it may be better to have an accessor instead of moving these structs into the header.
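The timestamp rule in the commit message above can be sketched in a few lines of C. This is only an illustration: `sketch_entry` and `sketch_is_racy` are hypothetical names, not libgit2 types or functions, and the sketch assumes whole-second timestamps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the stat data kept per index entry. */
struct sketch_entry {
	int64_t mtime_sec; /* modification time recorded for the file, in seconds */
	uint64_t size;     /* file size recorded for the entry */
};

/*
 * A file whose timestamp is the same as or newer than the index's may have
 * been modified just after the index was written, so matching stat data
 * cannot be trusted and the contents need a full check.
 */
static bool sketch_is_racy(const struct sketch_entry *entry, int64_t index_mtime_sec)
{
	assert(entry != NULL);
	return entry->mtime_sec >= index_mtime_sec;
}
```

With whole-second resolution, every file written in the same second as the index falls into the racy case, which is why the discussion above keeps coming back to timing.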
When an entry has a racy timestamp, we need to check whether the file itself has changed since we put its entry in the index. Only then do we smudge the size field to force a check the next time around.
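A rough sketch of that smudging decision follows. All names here are hypothetical, the content check is passed in as a callback standing in for the real diff, and smudging is assumed to mean setting the recorded size to zero, as git's racy-git document describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified index entry. */
struct sketch_entry {
	int64_t mtime_sec;
	uint64_t size;
};

/* Stand-in for the expensive content comparison (a diff in the real code). */
typedef bool (*sketch_changed_fn)(const struct sketch_entry *entry, void *payload);

/* Demo callback: reports whatever boolean the payload points to. */
static bool sketch_changed_from_flag(const struct sketch_entry *entry, void *payload)
{
	(void)entry;
	return *(bool *)payload;
}

/*
 * Smudge only when the entry is racy AND the file really changed.
 * Returns true if the entry was smudged.
 */
static bool sketch_maybe_smudge(struct sketch_entry *entry, int64_t index_mtime_sec,
	sketch_changed_fn changed, void *payload)
{
	assert(entry != NULL && changed != NULL);

	if (entry->mtime_sec < index_mtime_sec)
		return false; /* not racy: stat data can be trusted */

	if (!changed(entry, payload))
		return false; /* racy, but contents are identical: leave it alone */

	entry->size = 0; /* force a content check the next time around */
	return true;
}
```

Running the content check only for racy entries keeps the common case cheap, which matches the earlier remark that many racily-clean entries are an edge case.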
Force-pushed from c77f3f0 to 905a2f7.
It turns out that that commit wasn't quite redundant, but it's an optimisation for an edge case which I believe we can solve much better than by having another copy of the comparison functions. I'll open a PR with a test to show a situation in which we don't need to smudge the entry but currently do (as it would be a failing test and we don't really have a way to express that's expected in clar). I just remembered another edge case (smudging a file and then quickly making that file empty, I don't think we handle that) which I'll push up after breakfast.
As we attempt to replicate a situation in which an older checkout has put a file on disk with different filtering settings from us, set the timestamp on the entry and file to a second before we're performing the operation so the entry in the index counts as old. This way we can test that we're not looking at the on-disk file when the index has the entry and we detect it as clean.
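As a sketch of the test setup described above, the following backdates a file and its entry by one second using POSIX `utime()`. The helper names and the `sketch_entry` struct are hypothetical and not part of libgit2 or its test suite. The sketch assumes a POSIX system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <utime.h>

/* Hypothetical, simplified index entry holding only the recorded mtime. */
struct sketch_entry {
	int64_t mtime_sec;
};

/* Write `content` to `path`; returns 0 on success, -1 on failure. */
static int sketch_write_file(const char *path, const char *content)
{
	FILE *fp = fopen(path, "w");
	if (fp == NULL)
		return -1;
	if (fputs(content, fp) == EOF) {
		fclose(fp);
		return -1;
	}
	return fclose(fp) == 0 ? 0 : -1;
}

/* Return the file's mtime in seconds, or -1 if it cannot be stat'ed. */
static int64_t sketch_file_mtime(const char *path)
{
	struct stat st;
	if (stat(path, &st) < 0)
		return -1;
	return (int64_t)st.st_mtime;
}

/*
 * Set both the on-disk mtime and the entry's recorded mtime to one second
 * before `now`, so the entry counts as old relative to an index written at `now`.
 */
static int sketch_backdate(const char *path, struct sketch_entry *entry, time_t now)
{
	struct utimbuf times;

	assert(path != NULL && entry != NULL);
	times.actime = now - 1;
	times.modtime = now - 1;
	if (utime(path, &times) < 0)
		return -1;

	entry->mtime_sec = (int64_t)(now - 1);
	return 0;
}
```

Setting the timestamps explicitly makes the test deterministic, whereas sleeping for a second only makes the intended ordering likely.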
They fit there much better. Even though we often check by diffing, it's about the behaviour of the index.
Even though the file is empty and thus the size in the entry matches, we should be able to detect it as a difference.
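One way to catch this case, sketched below under assumptions, is to treat a zero-size entry as trustworthy only when its recorded id is the empty blob's id. The names are hypothetical. The id is the well-known SHA-1 of the empty blob, e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_OID_SIZE 20

/* Hypothetical, simplified index entry: recorded size and blob id. */
struct sketch_entry {
	uint64_t size;
	unsigned char id[SKETCH_OID_SIZE];
};

/* SHA-1 of the empty blob. */
static const unsigned char sketch_empty_blob_id[SKETCH_OID_SIZE] = {
	0xe6, 0x9d, 0xe2, 0x9b, 0xb2, 0xd1, 0xd6, 0x43, 0x4b, 0x8b,
	0x29, 0xae, 0x77, 0x5a, 0xd8, 0xc2, 0xe4, 0x8c, 0x53, 0x91
};

/*
 * Returns true when the size comparison must not report the entry as clean.
 * A recorded size of zero with a non-empty blob id means the entry was
 * smudged, so an empty file on disk is a real difference even though the
 * sizes agree.
 */
static bool sketch_size_flags_change(const struct sketch_entry *entry, uint64_t disk_size)
{
	assert(entry != NULL);

	if (entry->size == 0 &&
	    memcmp(entry->id, sketch_empty_blob_id, SKETCH_OID_SIZE) != 0)
		return true;

	return entry->size != disk_size;
}
```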
Force-pushed from 8d5701f to bb4896f.
I'm feeling |
It turns out that the description in racy-git is deceptively simple, and the implementation for the smudging of racily-clean entries goes all the way to performing a diff check which we were not doing.
I'm not too thrilled with the use of sleep(), but the alternative is changing the timestamp for both the index and file in unison, which we may actually want to do. For now it's good enough for a PoC.