Improve revision walk preparation logic #3921
@@ -398,81 +398,191 @@ static int revwalk_next_reverse(git_commit_list_node **object_out, git_revwalk *
	return *object_out ? 0 : GIT_ITEROVER;
}

static int interesting(git_pqueue *list)
static int contains(git_pqueue *list, git_commit_list_node *node)
This looks like `git_vector_search`, which should in fact be more efficient. So maybe just add `#define git_pqueue_search git_vector_search` and remove this function altogether?
I ended up removing the whole block since we're no longer using this code.
Mostly minor nits, looks very nice otherwise 👍
I'm seeing some results that differ from git, using the repo in #3916 :
And in libgit2, using this simple program:
We get:
I was testing this pair of commits with Unfortunately it does seem that this combination isn't quite the right one, as |
As an aside, the snippet as given does not have an equivalent git incantation since |
0e613a7 to b3e1dd1
I have ported more git code and now we do agree on With the exception of a single commit, which git shows but we don't. It's probably some edge condition I'm not taking into account, but we're almost there. |
cdd3a3b to 76f4250
This should be good to go. We're not quite as fast as git, but fairly close. We're not as careful with memory allocations which is likely part of the reason. But with this port of the code, we produce the same outputs for the |
cl_git_pass(git_oid_fromstr(&old_id, "8e73b769e97678d684b809b163bebdae2911720f"));
cl_git_pass(git_revwalk_hide(_walk, &old_id));

cl_git_pass(git_revwalk_next(&oid, _walk));
It looks like you used spaces here instead of tabs.
cl_git_pass(git_oid_fromstr(&old_id, "b91e763008b10db366442469339f90a2b8400d0a"));
cl_git_pass(git_revwalk_hide(_walk, &old_id));

cl_git_pass(git_revwalk_next(&oid, _walk));
Also spaces instead of tabs here.
I've discovered that just passing in |
parent->in_degree++;
}
for (list = commits; list; list = list->next) {
printf("%s: commit %s\n", __func__, git_oid_tostr_s(&list->item->oid));
There is still some debugging code left here. 😄
a16df01 to ee82845
Introduce some tests that show some commits, while hiding some commits that have a timestamp older than the common ancestors of these two commits.
We had some home-grown logic to figure out which objects to show during the revision walk, but it was rather inefficient, looking over the same list multiple times to figure out when we had run out of interesting commits. We now use the lists in a smarter way. We also introduce the slop mechanism to determine when to stop looking. When we run out of interesting objects, we continue preparing the walk for another 5 rounds in order to make it less likely that we miss objects in situations with complex graphs.
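As a rough illustration of the slop countdown described above, here is a minimal, self-contained sketch. It is not the code from this PR: `struct node`, `any_interesting()` and `still_interesting()` are hypothetical simplified stand-ins, the pending queue is assumed to be an array sorted newest-first, and `SLOP` is set to 5 only because that is the number of extra rounds the commit message mentions.

```c
#include <assert.h>
#include <stddef.h>

#define SLOP 5 /* extra rounds to keep going, per the commit message */

/* Hypothetical simplified commit node, not libgit2's git_commit_list_node. */
struct node {
	long time;         /* commit timestamp */
	int uninteresting; /* non-zero once the commit has been hidden */
};

/* Return 1 if at least one pending node is still interesting. */
static int any_interesting(const struct node *pending, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (!pending[i].uninteresting)
			return 1;
	return 0;
}

/*
 * Return the slop value to use for the next round. The caller stops
 * preparing the walk once this reaches 0. `pending` is assumed to be
 * sorted newest-first and `last_time` is the timestamp of the commit
 * most recently added to the output list.
 */
static int still_interesting(const struct node *pending, size_t len,
	long last_time, int slop)
{
	assert(slop >= 0);

	/* Nothing left to look at: we are definitely done. */
	if (len == 0)
		return 0;

	/* The queue head is at least as new as our last output: keep going. */
	if (last_time <= pending[0].time)
		return SLOP;

	/* Interesting commits remain in the queue: reset the countdown. */
	if (any_interesting(pending, len))
		return SLOP;

	/* Only uninteresting commits remain: count down, never below zero. */
	return slop > 0 ? slop - 1 : 0;
}
```

A caller would start with `slop = SLOP`, call `still_interesting()` once per commit taken off the queue, and leave the loop when the returned value is 0.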
This is a convenience function to reverse the contents of a vector and a pqueue in-place. The pqueue function is useful in the case where we're treating it as a LIFO queue.
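The in-place reversal can be sketched as a two-index swap over an array of pointers. This is only an illustration under assumptions: `ptr_array_reverse()` is a hypothetical name, and a plain `void **` array stands in for the vector's internal storage.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Reverse an array of pointers in place by swapping elements from
 * both ends until the two indices meet in the middle.
 */
static void ptr_array_reverse(void **contents, size_t length)
{
	size_t a, b;

	assert(contents != NULL || length == 0);

	/* Zero or one element: nothing to swap. */
	if (length < 2)
		return;

	a = 0;
	b = length - 1;

	while (a < b) {
		void *tmp = contents[a];
		contents[a] = contents[b];
		contents[b] = tmp;
		a++;
		b--;
	}
}
```

Guarding the `length < 2` case first avoids computing `length - 1` on an empty array, which would wrap around for an unsigned `size_t`.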
In this case, we simply behave like a vector.
After porting over the commit hiding and selection we were still left with mismatching output due to the topological sort. This ports the topological sorting code to make us match with our equivalent of `--date-order` and `--topo-order` against the output from `rev-list`.
This returns the integer-cast truth value comparing the dates. What we want instead is a (-1, 0, 1) output depending on how they compare.
Change the condition for returning 0 to be more in line with what we write elsewhere in the library.
We've now moved to code that's closer to git and produces the output during the preparation phase, so we no longer process the commits as part of generating the output. This makes a chunk of code redundant, as we're simply short-circuiting it by detecting we've processed the commits already.
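For readers unfamiliar with in-degree based topological sorting, the idea can be sketched as follows. This is a simplified illustration, not the ported code: `struct commit` and `topo_sort()` are hypothetical, every parent is assumed to be part of the input set, and a plain FIFO work queue is used, so the exact order among unrelated commits will not necessarily match `--topo-order` or `--date-order`.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARENTS 2

/* Hypothetical simplified commit, not libgit2's git_commit_list_node. */
struct commit {
	int id;
	size_t nparents;
	struct commit *parents[MAX_PARENTS];
	int in_degree; /* number of children not yet emitted */
};

/*
 * Write the n commits into `out` so that every commit appears before
 * all of its parents. Assumes every parent is itself in `commits`.
 * Returns the number of commits written.
 */
static size_t topo_sort(struct commit **commits, size_t n, struct commit **out)
{
	size_t i, p, head = 0, tail = 0;

	assert(commits != NULL && out != NULL);

	/* Count how many children point at each commit. */
	for (i = 0; i < n; i++)
		commits[i]->in_degree = 0;
	for (i = 0; i < n; i++)
		for (p = 0; p < commits[i]->nparents; p++)
			commits[i]->parents[p]->in_degree++;

	/* Tips (no children) can be emitted right away. */
	for (i = 0; i < n; i++)
		if (commits[i]->in_degree == 0)
			out[tail++] = commits[i];

	/* `out` doubles as the work queue: entries before `head` are done. */
	while (head < tail) {
		struct commit *c = out[head++];

		for (p = 0; p < c->nparents; p++) {
			struct commit *parent = c->parents[p];

			/* A parent is ready once all its children are emitted. */
			if (--parent->in_degree == 0)
				out[tail++] = parent;
		}
	}

	return tail;
}
```

Swapping the FIFO for a stack or a date-ordered priority queue changes which of the ready commits is emitted next, which is roughly the difference between a topological-only and a date-aware ordering.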
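To make the difference concrete, here is a small sketch contrasting the two styles of comparison. The names `struct dated`, `time_cmp_truth()` and `time_cmp()` are hypothetical, and the sign convention (newer commits sort first) is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a commit that carries a timestamp. */
struct dated {
	int64_t time;
};

/*
 * Truth-value comparison: yields only 0 or 1, so "equal" and
 * "a is newer" are indistinguishable to the caller.
 */
static int time_cmp_truth(const struct dated *a, const struct dated *b)
{
	return a->time < b->time;
}

/*
 * Three-way comparison: 1 if a is older, -1 if a is newer, 0 if the
 * timestamps are equal, so newer entries sort first.
 */
static int time_cmp(const struct dated *a, const struct dated *b)
{
	assert(a != 0 && b != 0);

	if (a->time < b->time)
		return 1;
	if (a->time > b->time)
		return -1;
	return 0;
}
```

A sorted container needs the three-way form: with only 0 or 1 coming back, it cannot tell that an element should move ahead of its neighbour.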
…t sorting After `limit_list()` we already have the list in time-sorted order, which is what we want in the "default" case. Enqueueing into the "unsorted" list would just reverse it, and the topological sort will do its own sorting if it needs to.
It changed from implementation-defined to git's default sorting, as there are systems (e.g. rebase) which depend on this order. Also specify more explicitly how you can get git's "date-order".
`git-rebase--merge` does not ask for time sorting, but uses the default. We now produce the same default time-ordered output as git, so make use of that since it's not always the same output as our time sorting.
ee82845 to 3cc5ec9
…ueued When we read from the list which `limit_list()` gives us, we need to check that the commit is still interesting, as it might have become uninteresting after it was added to the list.
Hmm. With the test program above (showing and hiding the same commits) I'm still seeing differences.
Worse, using either |
The equivalent options are:
I think you might have forgotten to change that
so it definitely looks like you're just listing everything starting from those commits.
Oh, yes, duh, that's exactly my problem. I suspected I was doing something dumb but didn't expect it was that dumb. Alas.
This brings us closer to the code that's in git, makes it more efficient and introduces the slop mechanic in order to make it less likely a complex graph will trip us up.
This solves the failing tests presented in #3838 in a much more elegant manner than the commits I pushed to that branch and resolves #3916. The gecho-dev walk in question now runs in 1.5s instead of longer than we care to measure.
Some of these tests now set the sorting since our unsorted iteration is now much less sorted than it used to be.
Chances are we're still doing something silly performance-wise like the way we deal with parents in `mark_uninteresting()` but this produces correct results and solves the immediate performance issue we're facing.