Parse `git rev-list`-style options #1393

gnprice · 2013-03-06T09:09:09Z

With this series, we support (parts of) the interface for specifying revisions that Git users are familiar with from git rev-list, git log, and other Git commands. This is useful for creating out-of-core command-line programs that browse a Git repo (like tig), and may be useful for an advanced search interface in GUI or web applications.

In this version, we parse all the options we can support with the existing logic in revwalk: basic include/exclude commits, and the ordering flags. More logic will be required to support --grep, --author, the pickaxe -S, etc.

Also included is a simple driver program that can be used like git rev-list.

vmg · 2013-03-06T11:45:27Z

Hey! Thank you so much for this PR!

I'm afraid that the proposed git_revwalk_parseopts function is outside of the scope of the library. You see, adding commandline flags parsing to this specific API is a slippery slope: libgit2's purpose has never been to reproduce Core Git's CLI functionality, as this one is a tangled mess of poorly designed UI choices. We've always stayed at a higher level (pure C APIs) because CLI flags are a very specific implementation detail, only usable if your goal with the library is to create commandline apps that intend to mimic Core Git's original interface -- which, in all fairness, is probably not a good idea.

In this case, the flags for rev-list are pretty straightforward, but we don't want and don't plan to bring in more CLI flag parsing, especially not if it involves "little gems" like git reset and git checkout (the commandline equivalent of C++ function overloading, something we are not particularly fond of).

However, I'd hate for this code to go to waste. If you could move git_revwalk_parseopts into your examples/rev-list.c as a static function, that would make for a delightful example of how to parse commandline flags the Git way and convert them to C calls.

scunz · 2013-03-06T12:01:33Z

@vmg - what about adding the lines 262ff as a new function git_revparse_range? That would be pretty nice to have. Users are familiar with the way to specify ranges in git.

vmg · 2013-03-06T15:47:50Z

@scunz: I was under the impression that .. and ... were already implemented in revparse_range. Isn't that the whole point of it? @ben is this the case?

scunz · 2013-03-06T15:50:38Z

Running find . -type f -name "*.h" | xargs grep revparse in the libgit2 source tree gives me only git_revparse_single. Did I miss something, or am I looking for it in the wrong place?

vmg · 2013-03-06T16:00:53Z

Oh, you meant "as a new function". Sorry, I misread your post. Yes, I certainly agree -- I thought that was on @ben's timeline, but somehow it hasn't happened yet.

ben · 2013-03-06T18:53:45Z

Yeah, it was always my plan to do a git_revparse_range, but it hasn't happened yet. @gnprice, would you mind transmogrifying this PR into just adding that? It looks like you've got a pretty good start.

gnprice · 2013-03-06T19:50:53Z

Thanks for the quick replies!

I 100% agree that such "little gems" as reset and checkout have no place in libgit2. They're poorly thought out (or more historically, nobody ever thought them out as wholes), they're confusing, and they don't provide anything that can't be just as well done with much cleaner abstractions.

I think the rev-list command-line API is in quite a different category. The combination of --author, --grep, -S, -G (usually one or two at a time) with a range or set of ranges and an optional set of files makes a powerful query interface, and the various options interact cleanly, just being ANDed together to filter the results. I use this query interface daily as a Git user, and I miss it when I use some out-of-tree tools like tig, which is a handy Git repository browser (try it! http://jonas.nitro.dk/tig/) but doesn't properly support --author and some others. Sometimes I wish GitHub supported it :-) -- I end up cloning a repo just so I can examine its history effectively.

And it'd be sad for tig and other tools to each end up implementing the rev-list/log query interface separately; besides the duplication of work, that'd inevitably cause little incompatibilities, which are annoying as a user.

There isn't any other part of Git's CLI that I would similarly want to see in other tools; it's all either blandly functional or a mess like reset and checkout. Maybe the --pretty=format: formats, but I use those maybe 2% as often as the rev-list/log query interface and they're kind of a mess anyway.

I like the idea of git_revparse_range. I'll refactor this series to introduce that first.

gnprice · 2013-03-07T09:24:04Z

Revised series pushed. I think by git_revparse_range we all meant something more like git_revwalk_push_range, because if the former name is taken literally I'm not sure how it would work (what kind of output would it return or store?). So that's what I wrote, but if someone really did mean git_revparse_range, please comment.

ben · 2013-03-07T19:42:39Z

You're right, the revwalk API is a much better place for a range operator to live. A git_revparse_range would probably be returning something like a revwalk anyways. 😀

I'm still 👎 on including the argument-parsing stuff in the library, though. The revwalk API already has sorting and ordering options, so there's no need for a DSL to describe them. I'd recommend moving that stuff into your sample code; this will help people figure out how to map what they know from git into revwalk calls.
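
As a rough illustration of what that mapping could look like in sample code, here is a minimal sketch. It assumes the ordering flags `--topo-order`, `--date-order` and `--reverse` each correspond to one sorting bit. The `SKETCH_SORT_*` constants and `apply_order_flag` are hypothetical names that stand in for the library's own sorting flags so the snippet builds without libgit2; the exact mapping is an assumption, not something settled in this thread.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the revwalk sorting flags, redefined
 * locally so this sketch compiles without the library. */
#define SKETCH_SORT_NONE        0u
#define SKETCH_SORT_TOPOLOGICAL (1u << 0)
#define SKETCH_SORT_TIME        (1u << 1)
#define SKETCH_SORT_REVERSE     (1u << 2)

/*
 * Update *sorting for one command-line argument.
 * Returns 1 if the argument was an ordering flag, 0 otherwise.
 */
static int apply_order_flag(const char *arg, unsigned int *sorting)
{
    assert(arg != NULL && sorting != NULL);

    if (strcmp(arg, "--topo-order") == 0) {
        /* Replace the ordering mode but keep any earlier --reverse. */
        *sorting = (*sorting & SKETCH_SORT_REVERSE) | SKETCH_SORT_TOPOLOGICAL;
    } else if (strcmp(arg, "--date-order") == 0) {
        *sorting = (*sorting & SKETCH_SORT_REVERSE) | SKETCH_SORT_TIME;
    } else if (strcmp(arg, "--reverse") == 0) {
        *sorting |= SKETCH_SORT_REVERSE;
    } else {
        return 0; /* not an ordering flag; the caller handles it */
    }
    return 1;
}
```

A driver would call this once per argument and then hand the accumulated value to the revwalk's sorting call, treating any argument that returns 0 as a revision or range.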

vmg · 2013-03-07T19:44:46Z

What @ben said. git_revparse_range looks great, but parseopts needs to go in the example file.

Looking good, otherwise!

carlosmn · 2013-03-08T19:45:17Z

While it's common to want to use a range for the revision walker, the range syntax is also used for diff, and a user may want to record the operation elsewhere (or print it in a different format). How would a tool that wants to print what walk it's doing get this information if it's all done directly inside the revwalk?

ben · 2013-03-09T01:42:58Z

Doesn't a diff tool just need two trees to compare? In that case, you'd look up the two commits using git_revparse_single, retrieve the trees, and diff them.

And the proposed API just modifies an existing revwalk. You can do anything you want to with the commits inside the callback; this would be sufficient to write something like git log. Am I missing something?

gnprice · 2013-03-09T01:59:10Z

git diff accepts range syntax, but git diff A..B is just a synonym
for git diff A B, and the more obscure git diff A...B is a synonym
for git diff $(git merge-base A B) B.

If a libgit2-based program wants to support those syntaxes, it should
be easy to look for .. and hand the endpoints to
git_revparse_single. I'm not sure what value a git_revparse_range
could return to make that job any simpler. Unless I'm missing
something deeper, there isn't really anything conceptually in common
between what A..B means as a range and what it means to git diff.
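
To make "look for .. and hand the endpoints to git_revparse_single" concrete, here is a minimal caller-side sketch in plain C. `struct range_parts` and `split_range` are hypothetical names, not libgit2 API, and the fixed 256-byte buffers are an arbitrary choice for illustration. An empty side is passed through as an empty string; Git itself treats a missing side as HEAD, and the sketch leaves that decision to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch; not part of libgit2. */
struct range_parts {
    char left[256];   /* text before the dots, e.g. "A" */
    char right[256];  /* text after the dots, e.g. "B" */
    int threedots;    /* 1 for "A...B", 0 for "A..B" */
};

/*
 * Split "A..B" or "A...B" at the first "..".
 * Returns 0 on success, -1 if there is no ".." or a side does not fit.
 */
static int split_range(const char *spec, struct range_parts *out)
{
    const char *dots, *rhs;
    size_t left_len, right_len;

    assert(spec != NULL && out != NULL);

    dots = strstr(spec, "..");
    if (dots == NULL)
        return -1;

    /* dots[2] is at worst the terminating NUL, so this read is safe. */
    out->threedots = (dots[2] == '.');
    rhs = dots + (out->threedots ? 3 : 2);

    left_len = (size_t)(dots - spec);
    right_len = strlen(rhs);
    if (left_len >= sizeof(out->left) || right_len >= sizeof(out->right))
        return -1;

    memcpy(out->left, spec, left_len);
    out->left[left_len] = '\0';
    memcpy(out->right, rhs, right_len + 1); /* includes the NUL */
    return 0;
}
```

Each side can then be resolved separately with git_revparse_single, and the caller decides what the two-dot or three-dot form means in its own context.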

carlosmn · 2013-03-09T02:00:27Z

Diff does need two tree-ishes to compare, but you may not get them as two different arguments. If we're going to parse the same way as git, git diff A..B is the same as git diff A B, but the program would only get one argument. Would git_revparse_single be able to handle this somehow?

I don't quite get what callback you're referring to. Letting the revwalk machinery handle the figuring out of positive and negative commits for you would let you implement log, but you wouldn't have any idea what you're showing.

carlosmn · 2013-03-09T02:10:47Z

there isn't really anything conceptually in common
between what A..B means as a range and what it means to git diff.

We're parsing the same thing. There's a left side, which can be the preimage or the negative commit, the right side, which can be postimage or positive commit, and whether we're talking about the merge-base.

Log and diff do different things, but the format is the same. I don't see why it should be so deeply ingrained in the revwalk code when we need to parse the exact same syntax elsewhere. Should we have two pieces of code doing the same thing? Why is this preferable to letting the library user know what the user input was and having it call the appropriate functions?

gnprice · 2013-03-09T02:31:09Z

@carlosmn What API would you propose for this common parser? I still don't see what kind of output it would produce.

We're parsing the same thing. There's a left side, which can be the preimage or the negative commit, the right side, which can be postimage or positive commit, and whether we're talking about the merge-base.

See, this isn't even true. Compare, from the Git documentation:

The following two commands are equivalent:

-----------------------------------------------------------------------
    $ git rev-list A B --not $(git merge-base --all A B)
    $ git rev-list A...B
-----------------------------------------------------------------------

vs.

"git diff A\...B" is equivalent to "git diff $(git-merge-base A B) B"

So in the rev-list case, the left-hand side of a three-dot range is actually a positive commit, not negative; it's symmetric with the right-hand side. The merge-base is involved in both cases, but with a rather different effect. These syntaxes take the same form, but they really are quite different things.
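
The difference can be spelled out with a small sketch that expands an already-split range into the equivalent long-form command for each tool, following the equivalences quoted above plus the standard reading of A..B as "B but not A" for rev-list. The function names `expand_for_revlist` and `expand_for_diff` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expand a parsed range the way `git rev-list` reads it. */
static int expand_for_revlist(char *buf, size_t len, const char *left,
                              const char *right, int threedots)
{
    assert(buf != NULL && left != NULL && right != NULL);

    if (threedots)
        /* Both sides are included; only the merge bases are excluded. */
        return snprintf(buf, len,
                        "git rev-list %s %s --not $(git merge-base --all %s %s)",
                        left, right, left, right);

    /* Two dots: the right side is included, the left side is excluded. */
    return snprintf(buf, len, "git rev-list %s --not %s", right, left);
}

/* Expand the same parsed range the way `git diff` reads it. */
static int expand_for_diff(char *buf, size_t len, const char *left,
                           const char *right, int threedots)
{
    assert(buf != NULL && left != NULL && right != NULL);

    if (threedots)
        /* The left side is replaced by a merge base of the two sides. */
        return snprintf(buf, len, "git diff $(git merge-base %s %s) %s",
                        left, right, right);

    /* Two dots: simply compare the two endpoints. */
    return snprintf(buf, len, "git diff %s %s", left, right);
}
```

Note that in the three-dot case the left side appears as a positive revision for rev-list but disappears into the merge base for diff, which is the asymmetry described above.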

carlosmn · 2013-03-09T03:06:46Z

@carlosmn What API would you propose for this common parser? I still
don't see what kind of output it would produce.

It would give you the three pieces of information. The left side, the
right side and whether three dots were used.

We're parsing the same thing. There's a left side, which can
be the preimage or the negative commit, the right side, which
can be postimage or positive commit, and whether we're talking
about the merge-base.


See, this isn't even true. Compare, from the Git documentation:

Right, when you're dealing with the merge base, the meanings change a
bit, so you adjust. This is semantics, which change depending on what your goal is.

This doesn't change that we're dealing with the same syntax. You can use the information however you want afterwards.

gnprice · 2013-03-09T03:47:48Z

It would give you the three pieces of information. The left side, the right side and whether three dots were used.

So it would have three output arguments, or an output argument that is a struct that exists for this single purpose. I am not convinced this really saves much complexity for the caller.

Do you plan to submit an implementation of the git diff command-line syntax that would use this git_revparse_range? My sense of the discussion above is that others would disagree with that proposal. Certainly I don't think it's in the same category as the rev-list query interface as something that non-git.git tools are likely to want or to have difficulty implementing exactly for themselves if they do.

If you think you have a use case coming up that would take advantage of this parser, I can add it. If not, I think it makes sense to say YAGNI, and if someone eventually (a) adds this particular dark corner of git diff syntax and (b) thinks factoring the parser out is helpful, then we can see how it looks with the code in front of us.

scunz · 2013-03-09T04:00:58Z

As I am writing a graphical git client, I do see various places where I might take advantage of a range-of-commits parser of the kind @carlosmn described here.

Also, I see no reason to duplicate such code.

Just to add a further use case to the above: git also allows cherry-picking a range.

ben · 2013-03-09T05:45:53Z

Let me see if I'm thinking about this correctly:

If you want git diff a..b behavior, you don't actually want a range, since you're really just diffing the a and b trees. We can either provide a git_revparse_range_endpoints api, or let the caller find the endpoints.
If you want git rev-list a..b behavior, you want a revwalk with appropriate starting and ending points.
If you want git rev-list a...b behavior, you want a revwalk with appropriate starting and ending points, and some extra context.
If you want to cherry-pick a range of commits, you want to apply the commits in increasing-date order. A revwalk can help you do this.

What I'm getting out of this is that a revwalk is a really useful thing, and being able to set the boundaries of one from a rev-list spec is also a useful thing. The ability to specify the two sides of a diff using a syntax that's intended to generate a range of things seems quite a bit less useful.

Let's not get too bogged down with this. There doesn't seem to be much disagreement that git_revwalk_push_range is desirable. What we are disagreeing on is whether we should also allow use of a similar syntax in different contexts. Or am I totally off base?

Also, I'd still like to see the arg-parsing stuff moved into the sample code. Libgit2 isn't only (or even primarily) for command-line tools, and parsing command-line options is outside the scope.

carlosmn · 2013-03-09T13:26:50Z

If you think you have a use case coming up that would take advantage of this parser, I can add it. If not, I think it makes sense to say YAGNI,

You've already identified a use case: letting the user pass ranges. Diff is the example of a different part of git that parses the same syntax.

There doesn't seem to be much disagreement that git_revwalk_push_range is desirable. What we are disagreeing on is whether we should also allow use of a similar syntax in different contexts. Or am I totally off base?

It's definitely something useful, but why implement it as a black box inside the revision walker? What does a program do when it wants to know what it's asking the library to do?

ben · 2013-03-09T18:18:25Z

You've already identified a use case: letting the user pass ranges. Diff is the example of a different part of git that parses the same syntax.

Here are the two use cases I see:

Walking a range of commits.
Grabbing the endpoints of a range of commits.

Are there any others that we should design for?

scunz · 2013-03-09T18:31:15Z

There are lots of places where the git CLI expects commit ranges, and I'm not sure which of those two categories each of them falls into. However, after this discussion, I think most of them can probably be reduced to feeding a revwalker.
I don't see a second use case for the diff-style giving of parameters.

carlosmn · 2013-03-09T20:16:14Z

Those are the main use-cases right now if we want to emulate git and those are good starting points. You'll typically end up setting up a revwalker based on that, but it doesn't mean returning a revwalk is a good idea, as you've no idea what's happening at that point or which of the commits the user asked you to show.

ben · 2013-03-11T19:39:45Z

@gnprice, if you're still paying attention, I have a way to get this accepted. Mostly it involves me getting my way. 😄

Let's take the command-line parsing stuff and move it into the sample. The addition to the revwalk API is fantastic, though, I'm looking forward to having it in the API.

scunz · 2013-03-11T20:29:00Z

Let's take the command-line parsing stuff and move it into the sample. The addition to the revwalk API is fantastic, though, I'm looking forward to having it in the API.

Full Ack from me 👍

gnprice · 2013-03-12T08:45:10Z

@gnprice, if you're still paying attention, I have a way to get this
accepted. Mostly it involves me getting my way. 😄

Let's take the command-line parsing stuff and move it into the
sample.

Yep, that's what I concluded from the discussion so far. I plan to do it. It may be a few more days, because there's a deadline at work.

Also because I want to experiment with how to have tests for the examples/ files, so that my tests here don't go to waste. (Do any of them have tests somewhere I'm missing?) Though now that I look a bit at it, maybe that deserves its own pull request. What repository is examples/general.c meant to run against as a demo? It fails for me with

*Raw Object Read*
Error -3 finding object in repository - Object not found - failed to find pack entry (fd6e612585290339ea8bf39c692a7ff6a29cb7c3)

and I can't find that object ID in either libgit2.git or any of the test repos in tests-clar/resources. Seems like it'd be good to have a test to make sure the examples all still run.

ben · 2013-03-12T14:51:53Z

I like the way you think. 😄 You're not blind, the examples actually don't have any tests.

I'm actually not sure what repo the general example was originally meant to be run against. Its function right now is as a readable code snippet. Feel free to update it to work against one of the test repositories if you like.

Signed-off-by: Greg Price <[email protected]>

The purported command output was already inaccurate, as the refs aren't where it shows. In any event, the labels a reader of this file really needs are the indices used in commit_sorting_*, to make it possible to understand them by referring directly from those arrays to the diagram rather than from the index arrays, to commit_ids, to the diagram. Add those. Signed-off-by: Greg Price <[email protected]>

Signed-off-by: Greg Price <[email protected]>

gnprice · 2013-03-31T22:50:46Z

New series pushed. This moves the parsing of the query syntax out into examples/, as requested. It also pulls out a git_revparse_rangelike for parsing the range-like syntax used by git rev-list and git diff.

The last commit adds a test script for the rev-list example program. This is the first test for anything in examples/, so it's pretty direct, without much of a framework.

vmg · 2013-03-31T23:40:47Z

Yeah, this is looking very good. Somehow tests are not passing -- can you force a repush and see what happens?

My only concern is the .sh test for the example. As a rule of thumb, we don't ship tests that cannot run under all platforms.

ben · 2013-04-01T03:34:43Z

I'm with @vmg on the platform stuff, but the examples are problematic at best right now. The general example won't even compile with MSVC; it declares variables in the "wrong" places.

I'm tempted to ignore that for now, and leave Windows compatibility for a future PR.

ben · 2013-04-01T03:35:29Z

include/git2/revparse.h

+ *
+ * @param left the left-hand commit
+ * @param right the right-hand commit
+ * @param threedots 0 if the endpoints are separated by two dots, 1 if by three


Can you make some indication that these three are output parameters? Right now it's not clear unless you check the code.

Thanks, good catch. Done.

ben · 2013-04-01T03:48:52Z

@gnprice, thanks for coming back to this. I like what you're doing here. 😃

Signed-off-by: Greg Price <[email protected]>

All the hard work is already in revparse. Signed-off-by: Greg Price <[email protected]>

This demonstrates parts of the interface for specifying revisions that Git users are familiar with from 'git rev-list', 'git log', and other Git commands. A similar query interface is used in out-of-core command-line programs that browse a Git repo (like 'tig'), and may be useful for an 'advanced search' interface in GUI or web applications. In this version, we parse all the query modifiers we can support with the existing logic in revwalk: basic include/exclude commits, and the ordering flags. More logic will be required to support '--grep', '--author', the pickaxe '-S', etc. Signed-off-by: Greg Price <[email protected]>

This test file could probably be improved by a framework like the one in git.git:t/, or by using a language like Python instead of shell. The other examples would benefit from tests too. Probably best to settle on a framework to write them in, then add more tests. Signed-off-by: Greg Price <[email protected]>

gnprice · 2013-04-07T04:01:27Z

Pushed a new version responding to @ben's comments, and the Travis build failed in the same spot -- I'll see if I can reproduce locally and debug.

I'm all for platform-independence as a criterion for a good long-term test framework for examples/. For now, I think the test is much better than no test, and I agree with Ben that it'd be better to merge as is and make the test better (and test the other examples!) in future work.

gnprice · 2013-04-07T04:23:43Z

Yeah, I can't reproduce. Following exactly the sequence of commands in the failed Travis log (well, I added a --reference to optimize the clone command), all tests pass.

To my surprise, I'm even running the same OS as Travis -- 32-bit Ubuntu Precise. So I don't know what the difference is.

I see the Travis log includes installing valgrind. Is it possible to get more information from valgrind about the segfault inside Travis? Alternatively, @ben or @vmg, are you able to reproduce the failure locally?

vmg · 2013-04-07T05:24:41Z

Yeah, I can definitely reproduce. You were missing a couple NULLs in the test_rangelike helper. I fixed the issue and merged manually.

Thank you again for the PR! Looking forward to more stuff from you. :)

vmg · 2013-04-07T05:26:11Z

Fix is 812e5ae btw.

gnprice · 2013-04-07T07:29:54Z

I see, the issue is that git_object_free() got called on uninitialized values in the failure cases. Thanks for the fix!
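
For anyone reading along, the failure mode looks roughly like the sketch below. It uses plain `malloc`/`free` as stand-ins for the library's object allocation and git_object_free, and `parse_pair`, `copy_n` and `check_spec` are hypothetical names. The actual change in 812e5ae is not reproduced here. The point is only that output pointers must be initialized to NULL when the cleanup runs unconditionally, because the parser does not write to them on failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy the first n bytes of s into a fresh NUL-terminated string. */
static char *copy_n(const char *s, size_t n)
{
    char *p = malloc(n + 1);
    if (p != NULL) {
        memcpy(p, s, n);
        p[n] = '\0';
    }
    return p;
}

/* Hypothetical parser with two output objects. */
static int parse_pair(const char *spec, char **left, char **right)
{
    const char *dots;

    assert(spec != NULL && left != NULL && right != NULL);

    dots = strstr(spec, "..");
    if (dots == NULL)
        return -1; /* failure path: *left and *right are NOT written */

    *left = copy_n(spec, (size_t)(dots - spec));
    *right = copy_n(dots + 2, strlen(dots + 2));
    if (*left == NULL || *right == NULL) {
        free(*left);
        free(*right);
        *left = *right = NULL;
        return -1;
    }
    return 0;
}

/* Returns 1 if spec parsed, 0 otherwise; cleanup is always safe. */
static int check_spec(const char *spec)
{
    char *left = NULL, *right = NULL; /* without this, failure frees garbage */
    int ok = (parse_pair(spec, &left, &right) == 0);

    /* Unconditional cleanup is fine: free(NULL) is a no-op. */
    free(left);
    free(right);
    return ok;
}
```

Dropping the `= NULL` initializers would leave `check_spec("HEAD")` freeing indeterminate pointers, which may crash on one machine and appear to work on another.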

gnprice added 2 commits March 31, 2013 15:33

Fix puzzling doc comment

804c5f5

Signed-off-by: Greg Price <[email protected]>

revwalk: refactor tests a bit

2932c88

Signed-off-by: Greg Price <[email protected]>

ben reviewed Apr 1, 2013

gnprice added 4 commits April 6, 2013 20:51

revparse: Parse range-like syntax

b208d90

Signed-off-by: Greg Price <[email protected]>

revwalk: Parse revision ranges

af079d8

All the hard work is already in revparse. Signed-off-by: Greg Price <[email protected]>

vmg merged commit 2e23328 into libgit2:development Apr 7, 2013

gnprice deleted the revwalk branch April 7, 2013 07:30

ben mentioned this pull request Apr 9, 2013

Unified revparse #1459

Merged

ben mentioned this pull request Sep 2, 2013

Gitteh needs a primary maintainer! libgit2/node-gitteh#68

Open
