racy-git, the missing link #3226

carlosmn · 2015-06-17T12:38:57Z

It turns out that the description in racy-git is deceptively simple, and the implementation for the smudging of racily-clean entries goes all the way to performing a diff check which we were not doing.

I'm not too thrilled with the use of sleep(). The alternative is changing the timestamp for both the index and the file in unison, which we may actually want to do, but sleep() is good enough for a PoC.

carlosmn · 2015-06-17T12:54:29Z

This comes with yet another merge failure. @ethomson, I'm going to leave this in your capable hands.

arrbee · 2015-06-17T16:19:51Z

I know I've long since given up much right to comment on such things, but pulling the entire typedef of the fs and workdir iterators into iterator.h and then accessing the internal structure just to pull out a single field doesn't seem like the right design to me. There is precedent for accessor APIs that only work with a particular type of iterator (such as git_iterator_current_workdir_path) that would be less invasive, I think.

carlosmn · 2015-06-17T16:24:06Z

It did seem ugly, but at the time I was trying to get it to work. Since we do have precedent, I'll look into having git_iterator_index() which should also make the diff quite a bit more elegant.
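For illustration, a minimal sketch of the accessor approach being discussed. Every name here (iter, workdir_iter, iter_index, index_data) is hypothetical and does not reflect libgit2's actual iterator types or the eventual signature of git_iterator_index(). The point is only that callers ask for the field through a function that checks the iterator type, instead of seeing the struct layout.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical; they only mirror the shape of the idea. */
typedef enum { ITER_TYPE_TREE, ITER_TYPE_WORKDIR } iter_type;

typedef struct index_data {
	int version;
} index_data;

typedef struct iter {
	iter_type type;
} iter;

typedef struct workdir_iter {
	iter base;          /* must be first so an iter * can be cast back */
	index_data *index;  /* the single field the caller is interested in */
} workdir_iter;

/*
 * Accessor that only works for workdir iterators.
 * Returns 0 and sets *out on success, -1 for any other iterator type.
 */
int iter_index(index_data **out, iter *it)
{
	assert(out != NULL && it != NULL);

	if (it->type != ITER_TYPE_WORKDIR) {
		*out = NULL;
		return -1;
	}

	*out = ((workdir_iter *)it)->index;
	return 0;
}
```

In a real library the struct definitions would presumably stay in the .c file, with the header exposing only the opaque typedef and the accessor.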

carlosmn · 2015-06-18T13:56:06Z

The history needs some love, but this should behave the same way as git. The check for racily clean entries does a diff for each candidate. It may be faster to collect them all and do a single diff, but having a large number of racily-clean entries is rather an edge case.

ethomson · 2015-06-18T14:14:30Z

I'll take a look at that merge test shortly. I'm surprised it's so brittle.

In the meantime, we'll probably need a p_sleep type of creation on Windows, since it only offers Sleep(millis).
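A rough sketch of what such a wrapper could look like, assuming whole-second granularity is all the tests need. The name sleep_seconds is made up for this example and is not an existing libgit2 function.

```c
#ifdef _WIN32
# include <windows.h>
#else
# include <unistd.h>
#endif

/* Hypothetical wrapper: sleep for a number of whole seconds on either platform. */
void sleep_seconds(unsigned int seconds)
{
#ifdef _WIN32
	/* Sleep() takes milliseconds. */
	Sleep((DWORD)seconds * 1000);
#else
	/* sleep() returns the unslept time if a signal interrupts it; retry with that. */
	while (seconds > 0)
		seconds = sleep(seconds);
#endif
}
```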

carlosmn · 2015-06-18T15:59:08Z

I thought we already had it, but it's the status example that has this define. Long-term, I think I'd rather have us set the timestamps manually than rely on a second of sleep, though.
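As a sketch of what setting the timestamps manually could look like in a test, assuming a POSIX system where utime() is available. The helper names (write_file, set_file_mtime, get_file_mtime) are invented for this example and are not libgit2 or clar APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <utime.h>

/* Hypothetical helper: create (or truncate) a file with the given contents. */
int write_file(const char *path, const char *contents)
{
	FILE *fp;

	assert(path != NULL && contents != NULL);

	if ((fp = fopen(path, "w")) == NULL)
		return -1;
	if (fputs(contents, fp) == EOF) {
		fclose(fp);
		return -1;
	}
	return fclose(fp) == 0 ? 0 : -1;
}

/* Hypothetical helper: force a file's mtime (and atime) to `when`. */
int set_file_mtime(const char *path, time_t when)
{
	struct utimbuf times;

	assert(path != NULL);

	times.actime = when;
	times.modtime = when;
	return utime(path, &times);
}

/* Read the mtime back; returns 0 on success, -1 on error. */
int get_file_mtime(time_t *out, const char *path)
{
	struct stat st;

	assert(out != NULL && path != NULL);

	if (stat(path, &st) < 0)
		return -1;
	*out = st.st_mtime;
	return 0;
}
```

A test could then write a file, call set_file_mtime(path, time(NULL) - 1) to make it look a second old, and proceed without sleeping.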

ethomson · 2015-06-18T17:08:05Z

src/blob.h

@@ -17,6 +17,10 @@ struct git_blob {
 	git_odb_object *odb_object;
 };

+static git_oid empty_blob = {{ 0xe6, 0x9d, 0xe2, 0x9b, 0xb2, 0xd1, 0xd6, 0x43, 0x4b, 0x8b,


ethomson · 2015-06-18

Might be nice to drop the definition in odb.c and have that file use this one.

carlosmn · 2015-06-18

I did, it's further down the diff.

ethomson · 2015-06-18

Hah. Thanks.

carlosmn · 2015-06-20T11:23:12Z

Tracing a bit what differently_filtered is doing, it seems to be running afoul of the fixed racy protections done here. When we write out the index, we check for racy timestamps, which we do in fact set the index to have. In this case, we check whether there is a difference between the contents of what we have in the entry and what we have in the workdir. We do have a difference here, so the entry gets smudged to keep the next status/diff from believing that the file is unchanged.

I've changed that test such that we mess with the timestamps so much that we don't detect a change, but I don't believe that this represents a realistic situation anymore. Even if the first change is sensible in order to simulate an implementation which performs a different form of filtering, the second one is stupid, because it's timing-dependent. If we were to see the situation, we would detect it as a difference, because the entry/file/index timestamps would all match, and we need to protect ourselves from someone doing

git add foo
echo lolchange >foo
git status || git merge somebranch || ...

which makes us have to look at the change.
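To make the timing argument concrete, here is a small sketch of the racy check with second-granularity timestamps. The struct and function names are hypothetical stand-ins, not libgit2's index types. If `git add foo` and the later `echo` both land in the same second, the file's mtime still equals the index's timestamp, so stat data alone cannot tell the two states apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, trimmed-down stand-ins for an index and one of its entries. */
struct index_stat {
	time_t written_at;  /* when the index file was last written */
};

struct entry_stat {
	time_t mtime;       /* mtime recorded for the file in the entry */
};

/*
 * An entry is racily clean when its mtime is the same as or newer than the
 * index's timestamp: the file may have been modified after the entry was
 * recorded without the stat data changing.
 */
bool entry_is_racy(const struct index_stat *index, const struct entry_stat *entry)
{
	assert(index != NULL && entry != NULL);
	return entry->mtime >= index->written_at;
}
```

For entries where this returns true, the contents have to be compared; for the rest, the stat data can be trusted.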

Using nanosecs would remediate this a bit on systems where we can trust it, but that's asking too much, I think, as git on Windows does not use this, and we can't use this on OS X at all IIRC.

Which is to say that by hacking the test so much that we pass, it feels like we're mocking the situation so much that we're not really testing anything anymore.

carlosmn · 2015-06-20T15:28:30Z

I've added extra checks so we avoid doing a diff and smudge of files for which the stat data is different, though I wonder if we have something in diff which would tell us already whether we can detect that a file is different just by looking at the stat.

ethomson · 2015-06-20T18:29:11Z

> If we were to see the situation, we would detect it as a difference,

Well, we should detect it as a difference if the file was written at the same time as the index. And we shouldn't detect it as a difference if the file was not written at the same time. You can certainly envision a case where checkout takes > 1 second, and some files are not eligible for being rescanned due to raciness, and some (those written in the same second as the index) are.

The goal of this test is really to ensure that merge treats the files as clean/dirty identically to status, to avoid regressions there.

The scenario is that one user has core.autocrlf=false, makes some change on Windows, another user with core.autocrlf=true checks that file out. Now - if that file was not written at the same time as the index, the racy protections don't kick in and the file is clean. (Prior to the raciness fixes, this was always clean.)

At this point, our status and git status will both show this file as clean, but the merge prior to c0b10c2 would show the file as dirty, as it did not correctly take the index into account when looking at the workdir contents.

This test was really to ensure that we never differed in calculating cleanness in merge vs status.

It's possible that, given the additional complexities that raciness throws into the mix, it is no longer viable to maintain this test and - hey - we probably have it right now! But this bug was exceptionally costly for us, both in terms of support and in terms of mindshare. (See, e.g., http://stackoverflow.com/questions/27458236/visual-studio-2013-does-not-offer-to-do-merge-on-git-pull though there are hundreds of questions addressing this topic.)

So if it is viable, I would certainly like to keep this test.

carlosmn · 2015-06-20T20:36:15Z

What the test is proving right now is that if the index's timestamp is ahead of the files', we won't look at the contents before deciding that there's no difference. This should be the same as what status does, but we still have the problem that this is timing-dependent. If git happens to update the index in some operation after the checkout, then we'd do this. But even if an operation takes over a second, some of the files will have the same timestamp as the index. If we have a different autocrlf setting from the installation which wrote the files, we'd still need to figure out whether those files changed just before or after the index, so we'd have to look at the contents, and we might still fail to merge.

ethomson · 2015-06-20T20:46:52Z

I'm beyond tired of arguing about this. Just delete the test and I'll put some unit tests in VS instead.

ethomson · 2015-06-22T02:54:58Z

This is looking pretty good to me; how are you feeling about it?

carlosmn · 2015-06-22T10:00:38Z

I think that "index: don't smudge entries with different stat" is completely redundant, as diff does already take this into account, and I'm just re-implementing the logic. That needs to go.

When a file on the workdir has the same or a newer timestamp than the index, we need to perform a full check of the contents, as the update of the file may have happened just after we wrote the index. The iterator changes are such that we can reach inside the workdir iterator from the diff, though it may be better to have an accessor instead of moving these structs into the header.

When an entry has a racy timestamp, we need to check whether the file itself has changed since we put its entry in the index. Only then do we smudge the size field to force a check the next time around.
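A simplified sketch of that rule. It assumes that "smudging" means setting the recorded size to 0, as git's racy-git handling does, and it uses a plain string comparison as a stand-in for the real diff between the staged blob and the workdir file. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical entry: just the fields this sketch needs. */
struct entry {
	time_t mtime;        /* mtime recorded when the entry was added */
	uint32_t file_size;  /* size recorded when the entry was added */
	const char *staged;  /* stand-in for the blob contents the entry points at */
};

/*
 * Smudge the entry only if it is racy AND the workdir contents differ from
 * what was staged. Returns true if the entry was smudged.
 */
bool smudge_if_racy_and_changed(
	struct entry *e, time_t index_written_at, const char *workdir_contents)
{
	assert(e != NULL && e->staged != NULL && workdir_contents != NULL);

	/* Older than the index: stat data is trustworthy, nothing to do. */
	if (e->mtime < index_written_at)
		return false;

	/* Racy but unchanged: leave the entry alone. */
	if (strcmp(e->staged, workdir_contents) == 0)
		return false;

	/* Racy and changed: zero the size so the next check cannot match on stat. */
	e->file_size = 0;
	return true;
}
```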

carlosmn · 2015-06-22T11:05:51Z

It turns out that that commit wasn't quite redundant, but it's an optimisation for an edge-case which I believe we can solve much better than by having another copy of the comparison functions. I'll open a PR with a test to show a situation in which we don't need to smudge the entry but currently do (as it would be a failing test and we don't really have a way to express that's expected in clar).

I just remembered another edge case (smudging a file and then quickly making that file empty; I don't think we handle that), which I'll push up after breakfast.
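One way to think about this edge case, sketched under the assumption that a smudged entry has its size set to 0: a recorded size of 0 is only believable when the entry's id is the empty blob's id. Otherwise the entry was smudged, and a matching size of 0 on disk must not be read as "unchanged". The struct and function names are hypothetical, and ids are kept as hex strings purely for readability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* SHA-1 of the empty blob, in hex. */
static const char EMPTY_BLOB_HEX[] = "e69de29bb2d1d6434b8be29ae775ad8c2e48c5391";

/* Hypothetical entry: recorded size plus the hex id of the staged blob. */
struct entry {
	uint32_t file_size;
	const char *id_hex;
};

/*
 * Returns true when the size information says, or cannot rule out, that the
 * workdir file differs from the entry.
 */
bool size_suggests_change(const struct entry *e, uint32_t workdir_size)
{
	assert(e != NULL && e->id_hex != NULL);

	/* Different sizes: the file has definitely changed. */
	if (e->file_size != workdir_size)
		return true;

	/*
	 * Sizes match at 0, but the staged blob is not empty: the entry was
	 * smudged, so the match is meaningless and the file counts as changed.
	 */
	if (e->file_size == 0 && strcmp(e->id_hex, EMPTY_BLOB_HEX) != 0)
		return true;

	return false;
}
```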

As we attempt to replicate a situation in which an older checkout has put a file on disk with different filtering settings from us, set the timestamp on the entry and file to a second before we're performing the operation so the entry in the index counts as old. This way we can test that we're not looking at the on-disk file when the index has the entry and we detect it as clean.

carlosmn · 2015-06-22T16:04:23Z

I'm feeling good about it atm; that edge case is handled, and there's a test for it.


carlosmn mentioned this pull request Jun 17, 2015

repo.Index.Stage() won't notice a change if it occurs within the same second libgit2/libgit2sharp#688

Merged

carlosmn force-pushed the cmn/racy-diff-again branch from 9c4ad8d to e8ae100 Compare June 17, 2015 12:54

carlosmn force-pushed the cmn/racy-diff-again branch 2 times, most recently from ffada05 to 80c9ab5 Compare June 18, 2015 10:55

carlosmn changed the title ~~diff: check files with the same or newer timestamps~~ [WIP] racy-git, the missing link Jun 18, 2015

ethomson reviewed Jun 18, 2015

carlosmn force-pushed the cmn/racy-diff-again branch from 4c95e31 to 1dd5c55 Compare June 20, 2015 14:27

carlosmn changed the title ~~[WIP] racy-git, the missing link~~ racy-git, the missing link Jun 20, 2015

carlosmn force-pushed the cmn/racy-diff-again branch from 1dd5c55 to a45da80 Compare June 20, 2015 15:24

ethomson mentioned this pull request Jun 20, 2015

Dont update index unnecessarily #3234

Merged

carlosmn added 4 commits June 22, 2015 12:47

index: check racily clean entries more thoroughly

7497584

When an entry has a racy timestamp, we need to check whether the file itself has changed since we put its entry in the index. Only then do we smudge the size field to force a check the next time around.

tests: plug leaks in the racy test

6c5eaea

tests: set racy times manually

26432a9

carlosmn force-pushed the cmn/racy-diff-again branch from c77f3f0 to 905a2f7 Compare June 22, 2015 11:00

carlosmn added 3 commits June 22, 2015 16:11

tests: move racy tests to the index

27133ca

They fit there much better, even though we often check by diffing, it's about the behaviour of the index.

index: add a diff test for smudging a file which becomes empty

6e611f7

Even though the file is empty and thus the size in the entry matches, we should be able to detect it as a difference.

Add a note about racy-git in CHANGELOG

bb4896f

carlosmn force-pushed the cmn/racy-diff-again branch from 8d5701f to bb4896f Compare June 22, 2015 14:12

ethomson added a commit that referenced this pull request Jun 24, 2015

Merge pull request #3226 from libgit2/cmn/racy-diff-again

bd670ab

racy-git, the missing link

ethomson merged commit bd670ab into master Jun 24, 2015

carlosmn mentioned this pull request Jun 24, 2015

Regression from zero out racily-clean entries #3231

Closed

carlosmn deleted the cmn/racy-diff-again branch June 26, 2015 16:17
