Parse git rev-list-style options #1393

Merged
merged 7 commits into from
Apr 7, 2013

gnprice
Contributor

@gnprice gnprice commented Mar 6, 2013

With this series, we support (parts of) the interface for specifying revisions that Git users are familiar with from git rev-list, git log, and other Git commands. This is useful for creating out-of-core command-line programs that browse a Git repo (like tig), and may be useful for an advanced search interface in GUI or web applications.

In this version, we parse all the options we can support with the existing logic in revwalk: basic include/exclude commits, and the ordering flags. More logic will be required to support --grep, --author, the pickaxe -S, etc.
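As a rough illustration of how the ordering flags could be folded into a revwalk sort mode, here is a minimal, self-contained sketch. The function name `apply_order_flag` and the `SORT_*` constants are hypothetical stand-ins: the constants are assumed to mirror the bit values of libgit2's `GIT_SORT_*` flags, and treating `--reverse` as a toggle is a choice made for this sketch, not something the series is known to do.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins assumed to mirror libgit2's GIT_SORT_* bit flags. */
enum {
    SORT_NONE        = 0,
    SORT_TOPOLOGICAL = 1 << 0,
    SORT_TIME        = 1 << 1,
    SORT_REVERSE     = 1 << 2
};

/*
 * Fold one rev-list-style ordering flag into the current sort mode.
 * Returns 1 if arg was an ordering flag, 0 if it was something else
 * (a revision, a pathspec, an unsupported option, ...).
 */
int apply_order_flag(const char *arg, unsigned int *sorting)
{
    assert(arg != NULL && sorting != NULL);

    if (strcmp(arg, "--topo-order") == 0) {
        /* Replace the base ordering but keep any earlier --reverse. */
        *sorting = SORT_TOPOLOGICAL | (*sorting & SORT_REVERSE);
        return 1;
    }
    if (strcmp(arg, "--date-order") == 0) {
        *sorting = SORT_TIME | (*sorting & SORT_REVERSE);
        return 1;
    }
    if (strcmp(arg, "--reverse") == 0) {
        /* Toggle, so passing the flag twice cancels out. */
        *sorting ^= SORT_REVERSE;
        return 1;
    }
    return 0;
}
```

A caller would loop over `argv`, pass each argument through this function, and hand the final value to the revision walker's sorting setter once all flags have been seen.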

Also included is a simple driver program that can be used like git rev-list.

@vmg
Member

vmg commented Mar 6, 2013

Hey! Thank you so much for this PR!

I'm afraid that the proposed git_revwalk_parseopts function is outside of the scope of the library. You see, adding commandline flags parsing to this specific API is a slippery slope: libgit2's purpose has never been to reproduce Core Git's CLI functionality, as this one is a tangled mess of poorly designed UI choices. We've always stayed at a higher level (pure C APIs) because CLI flags are a very specific implementation detail, only usable if your goal with the library is to create commandline apps that intend to mimic Core Git's original interface -- which, in all fairness, is probably not a good idea.

In this case, the flags for rev-list are pretty straightforward, but we don't want and don't plan to bring in more CLI flag parsing, especially not if it involves "little gems" like git reset and git checkout (the commandline equivalent of C++ function overloading, something we are not particularly fond of).

However, I'd hate for this code to go to waste. If you could move git_revwalk_parseopts as a static function into your examples/rev-list.c, that would make for a delightful example of how to parse commandline flags the Git way and convert them to C calls.

@scunz
Contributor

scunz commented Mar 6, 2013

@vmg - what about adding the lines 262ff as a new function git_revparse_range? That would be pretty nice to have. Users are familiar with the way to specify ranges in git.

@vmg
Member

vmg commented Mar 6, 2013

@scunz: I was under the impression that .. and ... were already implemented in revparse_range. Isn't that the whole point of it? @ben is this the case?

@scunz
Contributor

scunz commented Mar 6, 2013

Running find . -type f -name "*.h" | xargs grep revparse in the libgit2 source tree gives me only git_revparse_single. Did I miss something, or am I looking in the wrong place?

@vmg
Member

vmg commented Mar 6, 2013

Oh, you meant "as a new function". Sorry, I misread your post. Yes, I certainly agree -- I thought that was on @ben's timeline, but somehow it hasn't happened yet.

@ben
Member

ben commented Mar 6, 2013

Yeah, it was always my plan to do a git_revparse_range, but it hasn't happened yet. @gnprice, would you mind transmogrifying this PR into just adding that? It looks like you've got a pretty good start.

@gnprice
Contributor Author

gnprice commented Mar 6, 2013

Thanks for the quick replies!

I 100% agree that such "little gems" as reset and checkout have no place in libgit2. They're poorly thought out (or more historically, nobody ever thought them out as wholes), they're confusing, and they don't provide anything that can't be just as well done with much cleaner abstractions.

I think the rev-list command-line API is in quite a different category. The combination of --author, --grep, -S, -G (usually one or two at a time) with a range or set of ranges and an optional set of files makes a powerful query interface, and the various options interact cleanly, just being ANDed together to filter the results. I use this query interface daily as a Git user, and I miss it when I use some out-of-tree tools like tig, which is a handy Git repository browser (try it! http://jonas.nitro.dk/tig/) but doesn't properly support --author and some others. Sometimes I wish GitHub supported it :-) -- I end up cloning a repo just so I can examine its history effectively.

And it'd be sad for tig and other tools to each end up implementing the rev-list/log query interface separately; besides the duplication of work, that'd inevitably cause little incompatibilities, which are annoying as a user.

There isn't any other part of Git's CLI that I would similarly want to see in other tools; it's all either blandly functional or a mess like reset and checkout. Maybe the --pretty=format: formats, but I use those maybe 2% as often as the rev-list/log query interface and they're kind of a mess anyway.

I like the idea of git_revparse_range. I'll refactor this series to introduce that first.

@gnprice
Contributor Author

gnprice commented Mar 7, 2013

Revised series pushed. I think by git_revparse_range we all meant something more like git_revwalk_push_range, because if the former name is taken literally I'm not sure how it would work (what kind of output would it return or store?). So that's what I wrote, but if someone really did mean git_revparse_range, please comment.

@ben
Member

ben commented Mar 7, 2013

You're right, the revwalk API is a much better place for a range operator to live. A git_revparse_range would probably be returning something like a revwalk anyways. 😀

I'm still 👎 on including the argument-parsing stuff in the library, though. The revwalk API already has sorting and ordering options, so there's no need for a DSL to describe them. I'd recommend moving that stuff into your sample code; this will help people figure out how to map what they know from git into revwalk calls.

@vmg
Member

vmg commented Mar 7, 2013

What @ben said. git_revparse_range looks great, but parseopts needs to go in the example file.

Looking good, otherwise!

@carlosmn
Member

carlosmn commented Mar 8, 2013

While it's common to want to use a range for the revision walker, the range syntax is also used for diff, and a user may want to record the operation elsewhere (or print it in a different format). How would a tool that wants to print what walk it's doing get this information if it's all done directly inside the revwalk?

@ben
Member

ben commented Mar 9, 2013

Doesn't a diff tool just need two trees to compare? In that case, you'd look up the two commits using git_revparse_single, retrieve the trees, and diff them.

And the proposed API just modifies an existing revwalk. You can do anything you want to with the commits inside the callback; this would be sufficient to write something like git log. Am I missing something?

@gnprice
Contributor Author

gnprice commented Mar 9, 2013

git diff accepts range syntax, but git diff A..B is just a synonym
for git diff A B, and the more obscure git diff A...B is a synonym
for git diff $(git merge-base A B) B.

If a libgit2-based program wants to support those syntaxes, it should
be easy to look for .. and hand the endpoints to
git_revparse_single. I'm not sure what value a git_revparse_range
could return to make that job any simpler. Unless I'm missing
something deeper, there isn't really anything conceptually in common
between what A..B means as a range and what it means to git diff.
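A minimal sketch of the "look for .. and hand over the endpoints" step described above, using only the C standard library. The function name `split_range` and its buffer-based signature are hypothetical and are not libgit2 API. The sketch also assumes both endpoints are present; Git itself lets either side be omitted, which is not handled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split "A..B" or "A...B" into its two endpoints.
 * left and right are caller-provided buffers of cap bytes each.
 * Returns 0 on success, -1 if spec has no "..", has an empty
 * endpoint, or an endpoint does not fit in the buffer.
 */
int split_range(const char *spec, char *left, char *right,
                size_t cap, int *threedots)
{
    const char *dots, *rhs;
    size_t llen, rlen;

    assert(spec && left && right && threedots);

    dots = strstr(spec, "..");
    if (dots == NULL)
        return -1;

    /* dots[2] is at worst the terminating NUL, so this read is safe. */
    *threedots = (dots[2] == '.');
    rhs = dots + (*threedots ? 3 : 2);

    llen = (size_t)(dots - spec);
    rlen = strlen(rhs);
    if (llen == 0 || rlen == 0 || llen >= cap || rlen >= cap)
        return -1;

    memcpy(left, spec, llen);
    left[llen] = '\0';
    memcpy(right, rhs, rlen + 1); /* includes the terminating NUL */
    return 0;
}
```

Each endpoint string could then be resolved separately with git_revparse_single, and the caller decides what the two commits mean in its own context.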

@carlosmn
Member

carlosmn commented Mar 9, 2013

Diff does need two tree-ishes to compare, but you may not get them as two different arguments. If we're going to parse the same way as git, git diff A..B is the same as git diff A B, but the program would only get one argument. Would git_revparse_single be able to handle this somehow?

I don't quite get what callback you're referring to. Letting the revwalk machinery handle the figuring out of positive and negative commits for you would let you implement log, but you wouldn't have any idea what you're showing.

@carlosmn
Member

carlosmn commented Mar 9, 2013

> there isn't really anything conceptually in common
> between what A..B means as a range and what it means to git diff.

We're parsing the same thing. There's a left side, which can be the preimage or the negative commit, the right side, which can be postimage or positive commit, and whether we're talking about the merge-base.

Log and diff do different things, but the format is the same. I don't see why it should be so deeply ingrained in the revwalk code when we need to parse the exact same syntax elsewhere. Should we have two pieces of code doing the same thing? Why is this preferable to letting the library user know what the user entered and call the appropriate functions itself?

@gnprice
Contributor Author

gnprice commented Mar 9, 2013

@carlosmn What API would you propose for this common parser? I still don't see what kind of output it would produce.

> We're parsing the same thing. There's a left side, which can be the preimage or the negative commit, the right side, which can be postimage or positive commit, and whether we're talking about the merge-base.

See, this isn't even true. Compare, from the Git documentation:

The following two commands are equivalent:

-----------------------------------------------------------------------
    $ git rev-list A B --not $(git merge-base --all A B)
    $ git rev-list A...B
-----------------------------------------------------------------------

vs.

"git diff A\...B" is equivalent to "git diff $(git-merge-base A B) B"

So in the rev-list case, the left-hand side of a three-dot range is actually a positive commit, not negative; it's symmetric with the right-hand side. The merge-base is involved in both cases, but with a rather different effect. These syntaxes take the same form, but they really are quite different things.
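To make the difference concrete, here is a small sketch over a toy history, with commits numbered 0 to 4 and reachability held in a bitmask. The graph, the single-parent restriction, and all names are invented for illustration; nothing here is libgit2 code. Commit 2 plays the role of A, commit 4 plays B, and commit 1 is their merge base.

```c
#include <assert.h>

#define NCOMMITS 5

/*
 * Toy linear-with-one-fork history, one parent per commit (-1 = root):
 *   0 is the root, 1 follows 0,
 *   2 (A) follows 1,
 *   3 follows 1, and 4 (B) follows 3.
 */
static const int parent_of[NCOMMITS] = { -1, 0, 1, 1, 3 };

/* Bitmask of every commit reachable from tip, tip included. */
unsigned int reachable(int tip)
{
    unsigned int set = 0;

    assert(tip >= 0 && tip < NCOMMITS);
    while (tip >= 0) {
        set |= 1u << tip;
        tip = parent_of[tip];
    }
    return set;
}

/* rev-list A..B: reachable from B but not from A (A is negative). */
unsigned int range_two_dot(int a, int b)
{
    return reachable(b) & ~reachable(a);
}

/* rev-list A...B: reachable from exactly one of A and B (both positive). */
unsigned int range_three_dot(int a, int b)
{
    return reachable(a) ^ reachable(b);
}
```

With A = 2 and B = 4, the two-dot range selects commits 3 and 4, while the three-dot range selects 2, 3 and 4: A itself is included, which is the symmetry pointed out above. By contrast, git diff A...B would compare just two trees, those of commit 1 (the merge base) and commit 4.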

@carlosmn
Member

carlosmn commented Mar 9, 2013

> @carlosmn What API would you propose for this common parser? I still
> don't see what kind of output it would produce.

It would give you the three pieces of information. The left side, the
right side and whether three dots were used.

> > We're parsing the same thing. There's a left side, which can
> > be the preimage or the negative commit, the right side, which
> > can be postimage or positive commit, and whether we're talking
> > about the merge-base.
>
> See, this isn't even true. Compare, from the Git documentation:

Right, when you're dealing with the merge base, the meanings change a
bit, so you adjust. This is semantics, which change depending on what your goal is.

This doesn't change that we're dealing with the same syntax. You can use the information however you want afterwards.

@gnprice
Contributor Author

gnprice commented Mar 9, 2013

> It would give you the three pieces of information. The left side, the right side and whether three dots were used.

So it would have three output arguments, or an output argument that is a struct that exists for this single purpose. I am not convinced this really saves much complexity for the caller.

Do you plan to submit an implementation of the git diff command-line syntax that would use this git_revparse_range? My sense of the discussion above is that others would disagree with that proposal. Certainly I don't think it's in the same category as the rev-list query interface as something that non-git.git tools are likely to want or to have difficulty implementing exactly for themselves if they do.

If you think you have a use case coming up that would take advantage of this parser, I can add it. If not, I think it makes sense to say YAGNI, and if someone eventually (a) adds this particular dark corner of git diff syntax and (b) thinks factoring the parser out is helpful, then we can see how it looks with the code in front of us.

@scunz
Contributor

scunz commented Mar 9, 2013

As I am writing a graphical git client, I do see various places where I might take advantage of a range-of-commits parser of the kind @carlosmn described here.

Also, I see no reason to duplicate such code.

Just to add a further use case to the above: git also allows cherry-picking a range.

@ben
Member

ben commented Mar 9, 2013

Let me see if I'm thinking about this correctly:

  1. If you want git diff a..b behavior, you don't actually want a range, since you're really just diffing the a and b trees. We can either provide a git_revparse_range_endpoints api, or let the caller find the endpoints.
  2. If you want git rev-list a..b behavior, you want a revwalk with appropriate starting and ending points.
  3. If you want git rev-list a...b behavior, you want a revwalk with appropriate starting and ending points, and some extra context.
  4. If you want to cherry-pick a range of commits, you want to apply the commits in increasing-date order. A revwalk can help you do this.

What I'm getting out of this is that a revwalk is a really useful thing, and being able to set the boundaries of one from a rev-list spec is also a useful thing. The ability to specify the two sides of a diff using a syntax that's intended to generate a range of things seems quite a bit less useful.

Let's not get too bogged down with this. There doesn't seem to be much disagreement that git_revwalk_push_range is desirable. What we are disagreeing on is whether we should also allow use of a similar syntax in different contexts. Or am I totally off base?

Also, I'd still like to see the arg-parsing stuff moved into the sample code. Libgit2 isn't only (or even primarily) for command-line tools, and parsing command-line options is outside the scope.

@carlosmn
Member

carlosmn commented Mar 9, 2013

> If you think you have a use case coming up that would take advantage of this parser, I can add it. If not, I think it makes sense to say YAGNI,

You've already identified a use-case: letting the user pass ranges. Diff is the example of a different part of git that parses the same syntax.

> There doesn't seem to be much disagreement that git_revwalk_push_range is desirable. What we are disagreeing on is whether we should also allow use of a similar syntax in different contexts. Or am I totally off base?

It's definitely something useful, but why implement it as a black box inside the revision walker? What does a program do when it wants to know what it's asking the library to do?

@ben
Member

ben commented Mar 9, 2013

> You've already identified a use-case: letting the user pass ranges. Diff is the example of a different part of git that parses the same syntax.

Here are the two use cases I see:

  1. Walking a range of commits.
  2. Grabbing the endpoints of a range of commits.

Are there any others that we should design for?

@scunz
Contributor

scunz commented Mar 9, 2013

There are lots of places where the git cli expects commit ranges - and I'm not sure in which of those 2 categories they can be assigned. However, after this discussion, I think most of them can probably be reduced to feeding a revwalker.
I don't see a second use case for the diff-style giving of parameters.

@carlosmn
Member

carlosmn commented Mar 9, 2013

Those are the main use-cases right now if we want to emulate git and those are good starting points. You'll typically end up setting up a revwalker based on that, but it doesn't mean returning a revwalk is a good idea, as you've no idea what's happening at that point or which of the commits the user asked you to show.

@ben
Member

ben commented Mar 11, 2013

@gnprice, if you're still paying attention, I have a way to get this accepted. Mostly it involves me getting my way. 😄

Let's take the command-line parsing stuff and move it into the sample. The addition to the revwalk API is fantastic, though, I'm looking forward to having it in the API.

@scunz
Contributor

scunz commented Mar 11, 2013

> Let's take the command-line parsing stuff and move it into the sample. The addition to the revwalk API is fantastic, though, I'm looking forward to having it in the API.

Full Ack from me 👍

@gnprice
Contributor Author

gnprice commented Mar 12, 2013

> @gnprice, if you're still paying attention, I have a way to get this
> accepted. Mostly it involves me getting my way. 😄
>
> Let's take the command-line parsing stuff and move it into the
> sample.

Yep, that's what I concluded from the discussion so far. I plan to do it. It may be a few more days, because there's a deadline at work.

Also because I want to experiment with how to have tests for the examples/ files, so that my tests here don't go to waste. (Do any of them have tests somewhere I'm missing?) Though now that I look a bit at it, maybe that deserves its own pull request. What repository is examples/general.c meant to run against as a demo? It fails for me with

*Raw Object Read*
Error -3 finding object in repository - Object not found - failed to find pack entry (fd6e612585290339ea8bf39c692a7ff6a29cb7c3)

and I can't find that object ID in either libgit2.git or any of the test repos in tests-clar/resources. Seems like it'd be good to have a test to make sure the examples all still run.

@ben
Member

ben commented Mar 12, 2013

I like the way you think. 😄 You're not blind, the examples actually don't have any tests.

I'm actually not sure what repo the general example was originally meant to be run against. Its function right now is as a readable code snippet. Feel free to update it to work against one of the test repositories if you like.

gnprice added 2 commits March 31, 2013 15:33
Signed-off-by: Greg Price <[email protected]>
The purported command output was already inaccurate, as the refs
aren't where it shows.  In any event, the labels a reader of this
file really needs are the indices used in commit_sorting_*, to make
it possible to understand them by referring directly from those
arrays to the diagram rather than from the index arrays, to commit_ids,
to the diagram.  Add those.

Signed-off-by: Greg Price <[email protected]>
@gnprice
Contributor Author

gnprice commented Mar 31, 2013

New series pushed. This moves the parsing of the query syntax out into examples/, as requested. It also pulls out a git_revparse_rangelike for parsing the range-like syntax used by git rev-list and git diff.

The last commit adds a test script for the rev-list example program. This is the first test for anything in examples/, so it's pretty direct, without much of a framework.

@vmg
Member

vmg commented Mar 31, 2013

Yeah, this is looking very good. Somehow tests are not passing -- can you force a repush and see what happens?

My only concern is the .sh test for the example. As a rule of thumb, we don't ship tests that cannot run under all platforms.

@ben
Member

ben commented Apr 1, 2013

I'm with @vmg on the platform stuff, but the examples are problematic at best right now. The general example won't even compile with MSVC; it declares variables in the "wrong" places.

I'm tempted to ignore that for now, and leave Windows compatibility for a future PR.

*
* @param left the left-hand commit
* @param right the right-hand commit
* @param threedots 0 if the endpoints are separated by two dots, 1 if by three
Member

Can you make some indication that these three are output parameters? Right now it's not clear unless you check the code.

Contributor Author

Thanks, good catch. Done.

@ben
Member

ben commented Apr 1, 2013

@gnprice, thanks for coming back to this. I like what you're doing here. 😃

gnprice added 4 commits April 6, 2013 20:51
All the hard work is already in revparse.

Signed-off-by: Greg Price <[email protected]>
This demonstrates parts of the interface for specifying revisions that
Git users are familiar with from 'git rev-list', 'git log', and other
Git commands.  A similar query interface is used in out-of-core
command-line programs that browse a Git repo (like 'tig'), and may be
useful for an 'advanced search' interface in GUI or web applications.

In this version, we parse all the query modifiers we can support with
the existing logic in revwalk: basic include/exclude commits, and the
ordering flags.  More logic will be required to support '--grep',
'--author', the pickaxe '-S', etc.

Signed-off-by: Greg Price <[email protected]>
This test file could probably be improved by a framework like
the one in git.git:t/, or by using a language like Python instead
of shell.

The other examples would benefit from tests too.  Probably best
to settle on a framework to write them in, then add more tests.

Signed-off-by: Greg Price <[email protected]>
@gnprice
Contributor Author

gnprice commented Apr 7, 2013

Pushed a new version responding to @ben's comments, and the Travis build failed in the same spot -- I'll see if I can reproduce locally and debug.

I'm all for platform-independence as a criterion for a good long-term test framework for examples/. For now, I think the test is much better than no test, and I agree with Ben that it'd be better to merge as is and make the test better (and test the other examples!) in future work.

@gnprice
Contributor Author

gnprice commented Apr 7, 2013

Yeah, I can't reproduce. Following exactly the sequence of commands in the failed Travis log (well, I added a --reference to optimize the clone command), all tests pass.

To my surprise, I'm even running the same OS as Travis -- 32-bit Ubuntu Precise. So I don't know what the difference is.

I see the Travis log includes installing valgrind. Is it possible to get more information from valgrind about the segfault inside Travis? Alternatively, @ben or @vmg, are you able to reproduce the failure locally?

@vmg vmg merged commit 2e23328 into libgit2:development Apr 7, 2013
@vmg
Member

vmg commented Apr 7, 2013

Yeah, I can definitely reproduce. You were missing a couple NULLs in the test_rangelike helper. I fixed the issue and merged manually.

Thank you again for the PR! Looking forward to more stuff from you. :)

@vmg
Member

vmg commented Apr 7, 2013

Fix is 812e5ae btw.

@gnprice
Contributor Author

gnprice commented Apr 7, 2013

I see, the issue is that git_object_free() got called on uninitialized values in the failure cases. Thanks for the fix!
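The general shape of that bug, and of the fix, can be sketched without libgit2. In the sketch below, `free()` stands in for git_object_free(), and `parse_pair` is a hypothetical stand-in for a parser with two output objects; neither the names nor the logic are taken from the actual test helper. The assumption, consistent with the fix described above, is that the release function tolerates NULL, as `free()` does.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical parser with two output objects. On the early failure
 * path it returns -1 WITHOUT writing to *left or *right, so the
 * caller must not assume they were set.
 */
int parse_pair(const char *spec, char **left, char **right)
{
    assert(spec && left && right);

    if (spec[0] == '\0')
        return -1; /* outputs left untouched */

    *left = calloc(1, 1);
    *right = calloc(1, 1);
    return (*left && *right) ? 0 : -1;
}

/*
 * Caller with a single cleanup path. Initialising both pointers to
 * NULL is what makes the unconditional free() calls safe when
 * parse_pair fails before assigning them.
 */
int check_spec(const char *spec)
{
    char *left = NULL, *right = NULL;
    int error = parse_pair(spec, &left, &right);

    free(left);  /* free(NULL) is a no-op */
    free(right);
    return error;
}
```

Without the `= NULL` initialisers, the failure case would pass indeterminate pointers to the release function, which is undefined behaviour and can crash on one machine while appearing to work on another.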

@gnprice gnprice deleted the revwalk branch April 7, 2013 07:30
@ben ben mentioned this pull request Apr 9, 2013
5 participants