Thanks to visit codestin.com
Credit goes to github.com

Skip to content

RFC: add libgit2client and a command-line interface #5507

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 67 commits into from
Closed

Conversation

ethomson
Copy link
Member

This pull request expands the scope of libgit2 from just being the (relatively) low-level library for dealing with repositories to add two new pieces of functionality:

  • libgit2client: a "middleware" library that is focused on providing the functionality that client applications would find useful: for example, a subtransport that uses the ssh command-line, and a filter implementation that runs the commands specified in the attributes.

    Things like LFS and executing ssh have been well outside the scope of libgit2 since its implementation, however we've been forcing every tool that wants to build a client on top of us to deal with these problems themselves. We should provide this functionality for them.

    I propose putting this in a separate library since not everybody finds it valuable and - frankly - if I were running a git hosting provider, I would prefer not to have fork and exec in the codebase at all if I could avoid it.

  • git2_cli: a command-line interface that emulates git itself for testing. This allows us to test our client library, to eat our own dogfood, and - potentially - to start trying to use git's command-line tests against our own CLI to validate compatibility.

  • util: the shared bits of code (buffer manipulation, etc) that all the layers will want to use. This is not provided to end-users, just used by the two shared libraries and the CLI application.

@ethomson ethomson force-pushed the ethomson/cli branch 7 times, most recently from 5677c44 to fb41349 Compare May 11, 2020 13:05
@pks-t
Copy link
Member

pks-t commented May 11, 2020

You finally got around to implementing your dreams 🎉 Quite a lot of changes, so I won't get to it today. I'll first prioritize some of the other PRs that we currently got lingering, but hope to get to it in thenot-too-distant-future.

@ethomson
Copy link
Member Author

You finally got around to implementing your dreams 🎉 Quite a lot of changes, so I won't get to it today. I'll first prioritize some of the other PRs that we currently got lingering, but hope to get to it in thenot-too-distant-future.

Indeed, this is a lot of changes. I think that it might make sense to get some high-level discussion out of the way before actually reviewing the code. Here's what I'd propose as a reviewing strategy:

  1. Let's decide on whether this is a useful concept to add to the project. Do we want to take the project in this direction? I think that the answer is yes but it's a valuable place to start the discussion.

    It's obviously not impossible to do this in a new project, outside of libgit2. However, I think that it's both technically easier, easier for the developers, and more valuable to the users of the project.

    As far as the technical side, I did evaluate some other options. For example, having a new project that was the client and/or CLI pieces that simply used libgit2 instead of it being inside this project. But realistically any C-based CLI will have to do a lot of reimplementation of boilerplate utility code (vectors, string buffers) that is tedious, pointless and errorprone and it's just much easier to use the stuff that we've already written.

    I then thought about using libgit2 as a submodule of another project, and then pulling in just the utility code, but realistically, I think that requires a firmer split between utility code and business logic and a little untangling to be usable. This is a viable strategy, but it requires libgit2 being strict about that separation, and

    So I decided that actually having this as part of libgit2 makes the most sense.

  2. Does this separation of concerns make sense? In other words, does it make sense to have three separate components:

    • libgit2: the low-level git repository handling
    • libgit2client: the generic pieces for client libraries to re-use
    • git2_cli: an implementation using libgit2client

    And a shared utility library (implemented as a cmake object library) for those three pieces. Obviously we could put the libgit2client pieces in libgit2 itself which is easier in many regards, but I actually like having it separated out so that the people who don't need it won't have it.

  3. Based on the decisions in #2, let's decide on whether the split in the source and tests directories makes sense. Again, I think that it does, but there are arguments to be made about how that split occurs and where.

  4. After that, there are a handful of refactorings that drive us to a place where we can support this new project structure. I think many of them are useful changes in their own right, and could be reviewed completely independently of this direction. But I think that point 3 really informs these changes.

  5. Finally, the actually net-new pieces of code.

I think that for actually reviewing the code itself that I should cleave off pieces in reviewable chunks. I'll keep this PR open with the mega branch (mostly for visibility, so that anybody can play with the actual bits itself) but I don't think that we should actually be reviewing this PR as a whole for getting merged.

I'll cleave off the first piece soon-ish? There's some refactoring yet to do to get Windows working.

@ethomson ethomson force-pushed the ethomson/cli branch 5 times, most recently from 0ea2e10 to f10691b Compare May 19, 2020 00:50
@libszen
Copy link

libszen commented May 26, 2020

A reference client library and CLI would indeed be invaluable! Thank you very much for championing this P.R.

A request: can the CLI be licensed with a more permissive license (public domain or Apache V2, ...), while retaining the current license for the client library? It will allow copy-pasting of CLI code without licensing issues by interested parties. This would also be consistent with existing examples being released in the public domain.

@ethomson
Copy link
Member Author

A request: can the CLI be licensed with a more permissive license (public domain or Apache V2, ...), while retaining the current license for the client library?

Huh. I hadn't considered this. It may be that the libgit2 license is not necessarily appropriate for a CLI, but IANAL and I'd want to talk to one before I made any changes here. But I'd like to better understand your motivations around copy/pasting code.

I'd like to have the business logic of dealing with client things in the libgit2client library. The CLI, IMO, would just be a way to parse the command line, and invoke pieces of libgit2 or libgit2client itself. How could we improve these two pieces to allow you to call into them instead of copy/pasting code out of the CLI project?

@ethomson ethomson force-pushed the ethomson/cli branch 5 times, most recently from 30cf040 to de774ce Compare May 29, 2020 09:52
@ethomson ethomson force-pushed the ethomson/cli branch 3 times, most recently from 9815952 to c7c1ab6 Compare May 30, 2020 14:07
@pks-t
Copy link
Member

pks-t commented Jun 1, 2020

This pull request expands the scope of libgit2 from just being the (relatively) low-level library for dealing with repositories to add two new pieces of functionality:

  • libgit2client: a "middleware" library that is focused on providing the functionality that client applications would find useful: for example, a subtransport that uses the ssh command-line, and a filter implementation that runs the commands specified in the attributes.
    Things like LFS and executing ssh have been well outside the scope of libgit2 since its implementation, however we've been forcing every tool that wants to build a client on top of us to deal with these problems themselves. We should provide this functionality for them.
    I propose putting this in a separate library since not everybody finds it valuable and - frankly - if I were running a git hosting provider, I would prefer not to have fork and exec in the codebase at all if I could avoid it.

I think having a higher-level API would be worthwhile to have and is one thing some people had been asking for in the past. I'm not sure I'm a huge fan of having it as a separate library, though, but personally I'd vote for having a build option for this instead.

The most important benefit would be that we can avoid having a separate "util" library and thus don't require things like the git_userbuf split. Also, by having a "util" library, I fear that some implementation details may leak from the libgit2 library into the CLI code. Having them as strictly separated as possible would be a huge win to me personally, as we start dogfeeding our own API and thus see the good, bad and ugly parts of it with more clarity.

One question I have about the middleware is how its interface should look like. I see that right now, it simply accepts an argv array and does the parsing internally. I wonder whether we want to split concerns here and have each command accept an options sturcture, where the struct's members strictly resemble the API options. The parsing logic would then reside in git2_cli instead of inside the middleware. The benefit would be that potential users of the middleware wouldn't need to guess (or read documentation, which is usually out of date anyway) as to which options are supported and which aren't, but can just take a look at the options struct itself.

  • git2_cli: a command-line interface that emulates git itself for testing. This allows us to test our client library, to eat our own dogfood, and - potentially - to start trying to use git's command-line tests against our own CLI to validate compatibility.

Yeah, this is something we've repeatedly discussed in the past and something I'm happy to have. I'd prefer to avoid the underscore and call it git2cli, but that's mostly cosmetics. I wonder whether we want to drop most of the examples in that case, as they mostly do the same thing and implement a subset of supported options of Git. Instead, we'd just keep "real" examples that don't have any use in and of themselves.

  • util: the shared bits of code (buffer manipulation, etc) that all the layers will want to use. This is not provided to end-users, just used by the two shared libraries and the CLI application.

As said above I'm not much of a huge fan of having a shared internal libary, mostly due to my desire to keep a strict boundary between libgit2 the library and libgit2 the command line interface, and sharing some internal utilities blurs the lines. Most processing that's done in the CLI should be trivial command line parsing and execution of commands as the heavy-lifting is performed by the middleware anyway. And as I'm of the opinion that the middleware should live inside of the libgit2 library, there wouldn't be any need for the util library.

@ethomson
Copy link
Member Author

ethomson commented Jun 1, 2020

I think having a higher-level API would be worthwhile to have and is one thing some people had been asking for in the past. I'm not sure I'm a huge fan of having it as a separate library, though, but personally I'd vote for having a build option for this instead.

I had not considered this route... it feels weird to me that you could have two very different versions of the library, one that has a bunch of functions that are more "client" facing functions, and ones that do not. Would these two different varieties of the library have different sonames? If not, that doesn't feel right, since you'd expect ABI compatibility. Or would we have stubs that just throw? That's sort of meh as well.

Also, by having a "util" library, I fear that some implementation details may leak from the libgit2 library into the CLI code. Having them as strictly separated as possible would be a huge win to me personally, as we start dogfeeding our own API and thus see the good, bad and ugly parts of it with more clarity.

I agree. The CLI should not be like libgit2_clar, where it actually has some insights into private functions. However, we would never want to reimplement git_buf or git_vector in two different places. Once is, surely, enough. And I think that even the most minimal CLI would want simple string and array handling.

But, putting that aside for a moment, it sounds like you're describing something a bit different than the direction that I've pursued: a high level API call that matches a git porcelain command. eg, git_client_checkout(). (I pick this example purposefully.)

But is this really useful? git checkout is a mess. It switches branches, creates branches, puts files on disk from the HEAD, or the index, or from a different branch altogether. I think that there's only one client that would want an API that emulated this monstrosity, and it's this CLI.

I concede, though, that this PR doesn't give people a lot to help in building a client off of. And, of course, I could be wrong. And the data structure that we parse command-line options into could just be the input to an API. And if it's useful for people, great, and if it's not, that's fine, too. There's no obligation for them to use it, but it does give people who want a simple mechanism to "run the CLI" that option.

It might be useful to know what the folks at TortoiseGit (@csware), GitKraken (@implausible) would want in a client API.

ethomson added 2 commits June 5, 2020 07:11
`git__getenv` belongs in a class instead of as a top-level function,
move it into the `git_buf` class as `git_buf_getenv`.
Introduce a new user-facing buffer struct that is compatible with
`git_buf`.  This will allow us to keep our `git_buf` implementation
private, to disentangle the notion of public and private types.  But
since it's compatible, it's trivially castable.
ethomson added 20 commits June 5, 2020 07:14
 Introduce a command-line interface for libgit2.  The goal is to be
 git-compatible, so that:

 1. By creating a git client ourselves, we can understand the needs of
    git clients and produce a common "middleware" for commonly-used
    pieces of client functionality.  For example: interacting with
    other command-line tools, like filter drivers or merge drivers.
    This can assist other git clients.

 2. We can benefit from git's unit tests, running their test suite
    against our own CLI to ensure correct behavior.

 3. We can easily benchmark ourselves against git to understand where we
    are poorly performing, by running identical commands between git and
    ourselves.

 4. We can easily A/B test ourselves against git, at least for read-only
    operations, which will ensure that we are producing identical output.

 This commit introduces a simple infrastructure for the CLI.
As we consume parts of the libgit2 utility functions (like `git_buf`),
we will inevitably need to allocate.  Since we re-use the libgit2
allocation functions - but linked into our application - we'll need
to configure our allocation strategy ahead of time.
Our options parsing system can also be used as the basis for displaying
command-line usage.  Add usage information, using knowledge of the
console (if we're attached to one) for wrapping nicely.
Set up a framework for subcommands, and introduce the first, "help".

Help will display the commands available, and information about the
help command itself.

Commands are expected to provide their own usage and help information,
which the help command will proxy to when necessary.
Provide a class that will display progress information to the console.
Initially, it contains callbacks for fetch progress and checkout
progress.
Provide a mechanism to add a signal handler for Unix or Win32.
Provide a helper function to copy a number of strings from the source to
the target.
Provide functions to search through string arrays.
SSH paths come in a variety of formats, either URLs (ssh://user@host/path)
or SCP style (user@host:path).  Provide a mechanism to parse them.
Add a function to add a search/replaced string to a git_buf.
Introduce a helper method to quote a string in a shellsafe manner.  This
wraps the entire buffer in single quotes, escaping single-quotes and
exclamation points.
Provide a mechanism for the client library to report a unique
error class.  This will not be used by libgit2 directly.
@ethomson
Copy link
Member Author

ethomson commented Dec 6, 2021

Closed in favor of #6133

@ethomson ethomson closed this Dec 6, 2021
@ethomson ethomson deleted the ethomson/cli branch April 3, 2024 08:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants