

Improve revision walk preparation logic #3921



Merged
merged 13 commits into from
Oct 7, 2016


carlosmn
Member

@carlosmn carlosmn commented Sep 1, 2016

This brings us closer to the code that's in git, makes it more efficient and introduces the slop mechanic in order to make it less likely a complex graph will trip us up.

This solves the failing tests presented in #3838 in a much more elegant manner than the commits I pushed to that branch, and resolves #3916. The gecko-dev walk in question now runs in 1.5s instead of longer than we care to measure.

Some of these tests now set the sorting since our unsorted iteration is now much less sorted than it used to be.


Chances are we're still doing something silly performance-wise, like the way we deal with parents in mark_uninteresting(), but this produces correct results and solves the immediate performance issue we're facing.
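The propagation that mark_uninteresting() has to perform can be pictured as follows: once a commit is hidden, every ancestor reachable through its parents becomes uninteresting as well. A minimal sketch of an iterative, explicit-stack version, assuming a hypothetical simplified `struct node` (not libgit2's `git_commit_list_node`) and a caller-provided stack; this is an illustration, not the code in this PR:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified commit node; not libgit2's real struct. */
struct node {
	int uninteresting;
	size_t nparents;
	struct node **parents;
};

/*
 * Mark `start` and all of its ancestors uninteresting, using an explicit
 * stack with room for `cap` entries. Returns 0 on success, -1 if the
 * stack would overflow.
 */
static int mark_uninteresting_sketch(struct node *start,
                                     struct node **stack, size_t cap)
{
	size_t top = 0;

	assert(start != NULL && stack != NULL);
	if (start->uninteresting)
		return 0;
	if (cap == 0)
		return -1;
	start->uninteresting = 1;
	stack[top++] = start;

	while (top > 0) {
		struct node *n = stack[--top];
		size_t i;

		for (i = 0; i < n->nparents; i++) {
			struct node *p = n->parents[i];

			if (p->uninteresting)
				continue; /* already marked by this or an earlier call */
			if (top == cap)
				return -1;
			p->uninteresting = 1; /* mark on push so no node is pushed twice */
			stack[top++] = p;
		}
	}
	return 0;
}
```

Marking each node when it is pushed, rather than when it is popped, ensures no commit enters the stack twice, so a stack with room for every commit in the graph is always enough.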

@@ -398,81 +398,191 @@ static int revwalk_next_reverse(git_commit_list_node **object_out, git_revwalk *
return *object_out ? 0 : GIT_ITEROVER;
}


static int interesting(git_pqueue *list)
static int contains(git_pqueue *list, git_commit_list_node *node)
Member

@pks-t pks-t Sep 1, 2016


This looks like `git_vector_search`, which should in fact be more efficient. So maybe just add `#define git_pqueue_search git_vector_search` and remove this function altogether?

Member Author


I ended up removing the whole block since we're no longer using this code.

@pks-t
Member

pks-t commented Sep 1, 2016

Mostly minor nits, looks very nice otherwise 👍

@ethomson
Member

ethomson commented Sep 2, 2016

I'm seeing some results that differ from git, using the repo in #3916:

% git rev-list 0dd403224a5acb0702bdbf7ff405067f5d29c239 ^b7083959a30f2137d8a6e27a8489f8729873950c --date-order |head -10             
0dd403224a5acb0702bdbf7ff405067f5d29c239
a2812fa126be538f73efed589e78d6973f23df2f
21ac721516934679f9d6528eba41364bbb7f6f5d
a2da90fae1c4b5fd0cd33ff1a509d8589f8ce695
7f0262e9054aac9f44ee307ea2d1b9a2f2993da3
f09e8fef1a803905ee29457e12cd68a04af256c4
44a196676794033e6dc0a66b890cdf55e7a3c999
247986c342e2cab0f95b35c8f841ac609aa0882d
d07b49ff57344a58c389d31d1c0235c469c215b0
f9242cf7754ab3f64b2f6650f40b24c7020ac61c

And in libgit2, using this simple program:

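#include <stdio.h>
#include <git2.h>
int main(void)
{
    git_repository *repo;
    git_revwalk *revwalk;
    git_oid head_id, base_id, id;
    git_libgit2_init();
    git_repository_open(&repo, "/Users/ethomson/Temp/gecko-dev");
    git_revwalk_new(&revwalk, repo);
    git_oid_fromstr(&head_id, "0dd403224a5acb0702bdbf7ff405067f5d29c239");
    git_revwalk_push(revwalk, &head_id);
    git_oid_fromstr(&base_id, "b7083959a30f2137d8a6e27a8489f8729873950c");
    /* adds base_id as a second tip; excluding it like rev-list's ^ needs git_revwalk_hide() */
    git_revwalk_push(revwalk, &base_id);
    git_revwalk_sorting(revwalk, GIT_SORT_TIME);
    while (git_revwalk_next(&id, revwalk) == 0) {
        char idstr[GIT_OID_HEXSZ]; /* git_oid_fmt writes 40 hex chars, no NUL */
        git_oid_fmt(idstr, &id);
        printf("%.*s\n", GIT_OID_HEXSZ, idstr);
    }
    git_revwalk_free(revwalk);
    git_repository_free(repo);
    git_libgit2_shutdown();
    return 0;
}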

We get:

% revwalk | head -10
0dd403224a5acb0702bdbf7ff405067f5d29c239
a2812fa126be538f73efed589e78d6973f23df2f
21ac721516934679f9d6528eba41364bbb7f6f5d
a2da90fae1c4b5fd0cd33ff1a509d8589f8ce695
7f0262e9054aac9f44ee307ea2d1b9a2f2993da3
f09e8fef1a803905ee29457e12cd68a04af256c4
44a196676794033e6dc0a66b890cdf55e7a3c999
247986c342e2cab0f95b35c8f841ac609aa0882d
e1f9b3132b193b95d8d70acd0bdc2edc0ac33046
529df92f7d1e13e0ca613af7548509c23d919644

@carlosmn
Member Author

I was testing this pair of commits with rev-list --topo-order which does produce the same result as GIT_SORT_TIME | GIT_SORT_TOPOLOGICAL.

Unfortunately it does seem that this combination isn't quite the right one, as TIME | TOPO is what --date-order describes, rather than --topo-order, which is what we're getting.

@carlosmn
Member Author

As an aside, the snippet as given has no equivalent git incantation, since --date-order also implies a topological sort; the options git offers would be equivalent to TOPO or TIME | TOPO.

@carlosmn carlosmn force-pushed the cmn/walk-limit-enough branch from 0e613a7 to b3e1dd1 September 25, 2016 10:41
@carlosmn
Member Author

I have ported more git code and now we do agree on --topo-order with GIT_SORT_TOPOLOGICAL and --date-order with GIT_SORT_TIME | GIT_SORT_TOPOLOGICAL.

The one exception is a single commit, which git shows but we don't. It's probably some edge condition I'm not taking into account, but we're almost there.

@carlosmn carlosmn force-pushed the cmn/walk-limit-enough branch 2 times, most recently from cdd3a3b to 76f4250 September 27, 2016 14:21
@carlosmn
Member Author

This should be good to go. We're not quite as fast as git, but fairly close. We're not as careful with memory allocations, which is likely part of the reason.

But with this port of the code, we produce the same outputs for the --date-order and --topo-order equivalents.

cl_git_pass(git_oid_fromstr(&old_id, "8e73b769e97678d684b809b163bebdae2911720f"));
cl_git_pass(git_revwalk_hide(_walk, &old_id));

cl_git_pass(git_revwalk_next(&oid, _walk));
Member


It looks like you used spaces here instead of tabs.

cl_git_pass(git_oid_fromstr(&old_id, "b91e763008b10db366442469339f90a2b8400d0a"));
cl_git_pass(git_revwalk_hide(_walk, &old_id));

cl_git_pass(git_revwalk_next(&oid, _walk));
Member


Also spaces instead of tabs here.

@carlosmn
Member Author

carlosmn commented Oct 4, 2016

I've discovered that just passing in REVERSE will in fact not reverse things, but not giving it anything will... so I guess we'll have to fix that before merging.

parent->in_degree++;
}
for (list = commits; list; list = list->next) {
printf("%s: commit %s\n", __func__, git_oid_tostr_s(&list->item->oid));
Copy link
Member


There is still some debugging code left here. 😄

@carlosmn carlosmn force-pushed the cmn/walk-limit-enough branch from a16df01 to ee82845 October 5, 2016 13:36
Edward Thomson and others added 10 commits October 6, 2016 11:04
Introduce some tests that show some commits, while hiding some commits
that have a timestamp older than the common ancestors of these two
commits.
We had some home-grown logic to figure out which objects to show during
the revision walk, but it was rather inefficient, looking over the same
list multiple times to figure out when we had run out of interesting
commits. We now use the lists in a smarter way.

We also introduce the slop mechanism to determine when to stop
looking. When we run out of interesting objects, we continue preparing
the walk for another 5 rounds in order to make it less likely that we
miss objects in situations with complex graphs.
This is a convenience function to reverse the contents of a vector and a pqueue
in-place.

The pqueue function is useful in the case where we're treating it as a
LIFO queue.
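Reversing in place is a two-pointer swap from both ends. A minimal sketch over a hypothetical `struct ptr_vector` (libgit2's real `git_vector` carries more state; the names here are illustrative only):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal vector of pointers. */
struct ptr_vector {
	void **contents;
	size_t length;
};

/* Reverse the vector's elements in place by swapping from both ends. */
static void ptr_vector_reverse(struct ptr_vector *v)
{
	size_t i, j;

	assert(v != NULL);
	if (v->length < 2)
		return;
	for (i = 0, j = v->length - 1; i < j; i++, j--) {
		void *tmp = v->contents[i];
		v->contents[i] = v->contents[j];
		v->contents[j] = tmp;
	}
}
```

Applied to a queue whose backing array is in insertion order, this lets the same storage be consumed from the other end, which is the LIFO use the commit message refers to.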
In this case, we simply behave like a vector.
After porting over the commit hiding and selection we were still left
with mismatching output due to the topological sort.

This ports the topological sorting code to make us match with our
equivalent of `--date-order` and `--topo-order` against the output
from `rev-list`.
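The in-degree approach that git's topological sort uses (and that the `parent->in_degree++` fragment quoted earlier hints at) can be sketched on a toy graph. This is a simplified, hypothetical version: commits live in an array, each has at most two parents, and ready commits are kept on a LIFO stack; git's actual code adds date-ordered queueing for `--date-order` and other details not shown here:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COMMITS 16

/* Hypothetical commit record: parents are indices into the same array. */
struct commit {
	size_t nparents;    /* at most two, for brevity */
	size_t parents[2];
	size_t in_degree;   /* children not yet emitted */
};

/*
 * Emit commit indices into `out` so that every commit precedes all of its
 * parents. Returns the number emitted (fewer than n only on a cycle).
 */
static size_t topo_sort(struct commit *c, size_t n, size_t *out)
{
	size_t stack[MAX_COMMITS];
	size_t top = 0, count = 0, i, k;

	assert(n <= MAX_COMMITS);
	for (i = 0; i < n; i++)
		c[i].in_degree = 0;
	for (i = 0; i < n; i++)             /* count children of each commit */
		for (k = 0; k < c[i].nparents; k++)
			c[c[i].parents[k]].in_degree++;

	for (i = n; i-- > 0;)               /* seed with tips (no children) */
		if (c[i].in_degree == 0)
			stack[top++] = i;

	while (top > 0) {
		size_t cur = stack[--top];

		out[count++] = cur;
		for (k = 0; k < c[cur].nparents; k++) {
			size_t p = c[cur].parents[k];
			if (--c[p].in_degree == 0)  /* all children emitted */
				stack[top++] = p;
		}
	}
	return count;
}
```

For a merge commit 0 whose parents 1 and 2 both descend from root 3, the sketch emits 0, 2, 1, 3: every commit precedes its parents, and the root only becomes ready once both of its children have been emitted.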
This returns the integer-cast truth value comparing the dates. What we
want instead is a (-1, 0, 1) output depending on how they compare.
Change the condition for returning 0 to be more in line with what we write
elsewhere in the library.
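To illustrate the comparator fix above: a truth value such as `a < b` only ever yields 0 or 1, so a sort can never learn that one side is "less". A minimal sketch of a three-way comparator, assuming a hypothetical `struct dated` and newest-first ordering (the real libgit2 comparator and its field names may differ):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record carrying only a commit time. */
struct dated {
	int64_t time;
};

/* Three-way comparator: -1, 0 or 1, with newer commits sorting first. */
static int cmp_time_desc(const void *a, const void *b)
{
	const struct dated *x = a, *y = b;

	if (x->time > y->time)
		return -1;
	if (x->time < y->time)
		return 1;
	return 0;
}

/* Sort commits newest first using the comparator above. */
static void sort_newest_first(struct dated *d, size_t n)
{
	qsort(d, n, sizeof(*d), cmp_time_desc);
}
```

Writing the two explicit comparisons avoids subtracting the timestamps, which could overflow an `int` return value for widely separated dates.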
We've now moved to code that's closer to git and produces the output
during the preparation phase, so we no longer process the commits as
part of generating the output.

This makes a chunk of code redundant, as we're simply short-circuiting
it by detecting we've processed the commits already.
…t sorting

After `limit_list()` we already have the list in time-sorted order, which is
what we want in the "default" case. Enqueueing into the "unsorted" list would
just reverse it, and the topological sort will do its own sorting if it needs
to.
It changed from implementation-defined to git's default sorting, as there are
systems (e.g. rebase) which depend on this order. Also specify more explicitly
how you can get git's "date-order".
`git-rebase--merge` does not ask for time sorting, but uses the default. We now
produce the same default time-ordered output as git, so make use of that since
it's not always the same output as our time sorting.
@carlosmn carlosmn force-pushed the cmn/walk-limit-enough branch from ee82845 to 3cc5ec9 October 6, 2016 09:05
…ueued

When we read from the list which `limit_list()` gives us, we need to check that
the commit is still interesting, as it might have become uninteresting after it
was added to the list.
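The check described above can be sketched as a pop that discards stale entries. This is a hypothetical illustration over a plain singly linked list (`struct item` and `next_interesting()` are invented names, not libgit2's API):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; `uninteresting` may be set after enqueueing. */
struct item {
	int uninteresting;
	struct item *next;
};

/*
 * Pop the next commit that is still interesting from the list at *head,
 * dropping entries that became uninteresting after they were added.
 * Returns NULL when none remain.
 */
static struct item *next_interesting(struct item **head)
{
	assert(head != NULL);
	while (*head) {
		struct item *it = *head;

		*head = it->next;
		if (!it->uninteresting)
			return it;
	}
	return NULL;
}
```

Filtering at read time avoids having to locate and unlink a commit from the list at the moment it is marked uninteresting.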
@ethomson
Member

ethomson commented Oct 6, 2016

Hmm. With the test program above (showing and hiding the same commits) I'm still seeing differences.

GIT_SORT_TIME | GIT_SORT_TOPOLOGICAL (versus --topo-order) gives me the first several commits as being the same, but using only GIT_SORT_TIME (versus --date-order) gives me several differences even in the first few commits.

Worse, using either GIT_SORT_TIME or GIT_SORT_TIME|GIT_SORT_TOPOLOGICAL, we walk 489,028 commits while git walks 8,767.

@carlosmn
Member Author

carlosmn commented Oct 7, 2016

The equivalent options are:

  • --topo-order is GIT_SORT_TOPOLOGICAL
  • --date-order is GIT_SORT_TOPOLOGICAL | GIT_SORT_TIME
  • GIT_SORT_TIME is something libgit2 has without any equivalency to anything anywhere at any time
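The mapping in the list above can be captured in a small lookup. A sketch, assuming local stand-in constants `SORT_TOPOLOGICAL` and `SORT_TIME` in place of libgit2's `GIT_SORT_TOPOLOGICAL` and `GIT_SORT_TIME` so it compiles without `git2.h`; `sort_flags_for()` is a hypothetical helper. An empty string stands for passing no ordering option, i.e. git's default order, which maps to passing no sorting flags. GIT_SORT_TIME on its own has no rev-list counterpart, so nothing in the table produces it:

```c
#include <assert.h>
#include <string.h>

/* Local stand-ins for libgit2's GIT_SORT_* flags. */
enum {
	SORT_NONE = 0,
	SORT_TOPOLOGICAL = 1 << 0,
	SORT_TIME = 1 << 1
};

/* Map a `git rev-list` ordering option to equivalent sorting flags. */
static int sort_flags_for(const char *revlist_option)
{
	assert(revlist_option != NULL);
	if (strcmp(revlist_option, "--topo-order") == 0)
		return SORT_TOPOLOGICAL;
	if (strcmp(revlist_option, "--date-order") == 0)
		return SORT_TOPOLOGICAL | SORT_TIME;
	if (revlist_option[0] == '\0')
		return SORT_NONE;  /* no option: git's default order */
	return -1;             /* not covered by this sketch */
}
```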

I think you might have forgotten to change that _push in your sample program to a _hide since

% git rev-list --count 0dd403224a5acb0702bdbf7ff405067f5d29c239 b7083959a30f2137d8a6e27a8489f8729873950c
489028

so it definitely looks like you're just listing everything starting from those commits.

@ethomson
Member

ethomson commented Oct 7, 2016

I think you might have forgotten to change that _push in your sample program to a _hide ...

Oh, yes, duh, that's exactly my problem. I suspected I was doing something dumb but didn't expect it was that dumb. Alas.

@ethomson ethomson merged commit 45dc219 into master Oct 7, 2016
@carlosmn carlosmn deleted the cmn/walk-limit-enough branch November 15, 2016 11:31
Development

Successfully merging this pull request may close these issues.

Rebase via libgit2 take a long time / forever
4 participants