Remove `git_buf` as a public-facing structure #5534
`git__getenv` belongs in a class rather than as a top-level function; move it into the `git_buf` class as `git_buf_getenv`.
Introduce a new user-facing buffer struct that is compatible with `git_buf`. This will allow us to keep our `git_buf` implementation private, to disentangle the notion of public and private types. But since it's compatible, it's trivially castable.
The `git_buf` type is no longer a publicly available structure, and the `git_buf` family of functions is no longer exported. The deprecation layer adds a typedef for `git_buf` (as `git_userbuf`) and macros that define `git_buf` functions as `git_userbuf` functions. This provides API (but not ABI) compatibility with libgit2 1.0's buffer functionality. Within libgit2 itself, we take care to avoid including those deprecated typedefs and macros, since we want to continue using the `git_buf` type and functions unmodified. Therefore, a `GIT_DEPRECATE_BUF` guard now wraps the buffer deprecation layer; libgit2 itself will define it.
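A minimal sketch of how such a deprecation layer could look, not the code from this PR. The struct layout is assumed from libgit2 1.0's `git_buf`; `priv_buf`, `fill_example` and `legacy_caller` are hypothetical names used only for illustration, and the body of `git_userbuf_dispose` is a simplified stand-in.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* User-facing buffer; layout assumed from libgit2 1.0's git_buf. */
typedef struct {
	char *ptr;     /* NUL-terminated data, or NULL when empty */
	size_t asize;  /* bytes allocated; 0 means nothing is owned */
	size_t size;   /* bytes used, excluding the trailing NUL */
} git_userbuf;

/* Hypothetical stand-in for the private type (git_buf inside libgit2). */
typedef struct {
	char *ptr;
	size_t asize;
	size_t size;
} priv_buf;

/* "Compatible" here means identical size and member offsets. */
_Static_assert(sizeof(git_userbuf) == sizeof(priv_buf), "size differs");
_Static_assert(offsetof(git_userbuf, ptr) == offsetof(priv_buf, ptr), "ptr offset differs");
_Static_assert(offsetof(git_userbuf, asize) == offsetof(priv_buf, asize), "asize offset differs");
_Static_assert(offsetof(git_userbuf, size) == offsetof(priv_buf, size), "size offset differs");

/* Simplified stand-in for the exported dispose function. */
void git_userbuf_dispose(git_userbuf *buf)
{
	if (buf == NULL)
		return;
	if (buf->asize > 0)
		free(buf->ptr);
	buf->ptr = NULL;
	buf->asize = 0;
	buf->size = 0;
}

/* Deprecation layer: old names map onto the new ones at compile time only,
 * which is why this gives API but not ABI compatibility. */
#ifndef GIT_DEPRECATE_BUF
typedef git_userbuf git_buf;
# define git_buf_dispose(buf) git_userbuf_dispose(buf)
#endif

/* Hypothetical producer standing in for any function that fills a buffer. */
int fill_example(git_userbuf *out, const char *value)
{
	size_t len = strlen(value);
	char *copy = malloc(len + 1);

	if (copy == NULL)
		return -1;
	memcpy(copy, value, len + 1);
	out->ptr = copy;
	out->asize = len + 1;
	out->size = len;
	return 0;
}

/* Caller written against the 1.0 names; it still compiles unchanged. */
size_t legacy_caller(const char *value)
{
	git_buf buf = { NULL, 0, 0 };
	size_t len = 0;

	if (fill_example(&buf, value) == 0)
		len = buf.size;
	git_buf_dispose(&buf);
	return len;
}
```

Building with `-DGIT_DEPRECATE_BUF` removes the `git_buf` typedef and macro, so `legacy_caller` would no longer compile; that is the situation the library itself wants, since it keeps its own `git_buf`.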
9e200a3 to bab51e2
@ethomson I honestly don't quite get the concern. Isn't the same true for the new
That's not the thing that I'm trying to solve for -- instead, every C program that's slightly more than trivial needs some of the same pieces of functionality: string handling, array handling, etc. For us, that's But you can't simply copy That's because You could, I suppose, As I mentioned, this is very much top of mind to the CLI. Because no matter how trivial this CLI is, it probably needs the moral equivalent of This was the "needless complexity" that I mentioned in the PR description. I don't address this here but I think that this is a good first step towards that. |
To give you an idea of what I would like to do in the long-term, I would like to change any code that takes a Then I would like to change
(moving the
Now we can cast our |
Fair enough. I'm not sure this is really worth the added complexity, though. I agree that having the CLI is a cool thing, but changing our own API to make implementing it easier seems kind of backward to me. While it's true that we should learn from the mistakes that we unearth by dogfooding our own interfaces, I'd still like to remain super cautious when deprecating interfaces.
Yup, that's definitely true and one thing I've thought about, too. The only concern I have is backwards-compatibility. One thing we should keep in mind is that the CLI will most likely be used by others as a reference to implement their own applications that make use of similar structures. And if we start using internals of libgit2 (even if exposed via a separate, internal-only library), people might want to use the same non-external helpers, too. Which is why I really think there should be a hard separation between CLI and libgit2: the CLI will only ever use what's exposed by libgit2's official API and nothing more. It helps others in that they can re-use code, it helps us as we start dogfooding our own code and it doesn't introduce new internal libraries.
Yes! I think this is a good thing. :) I'm not at all trolling. If I'm coding a C application, I want to reach for This is akin to GNOME/glib coming from gimp. I suspect that any sufficiently large C application will create its own utility class and eventually export it. I'm not really proposing that we create a general purpose utility library and encourage a bunch of other people to use it. But I am proposing that we shard out our general purpose utility library so that they could.
OK - serious question then - how do we do string manipulation in the CLI? I think that we can either:
Haha, well, that comes unexpected.
It's definitely a good question and I think there's no one right answer.
Agreed.
So ideally, the planned higher-level interfaces should already allow us to not do much string manipulation anyway. I guess I'm being naive though and that you're right, but I would've thought that most string handling would be to just print it to stdout/stderr. Using the printf family would be perfectly fine for this purpose.
So with your framing of "Maybe we do want to expose these helpers" this doesn't sound too bad to me. As you say, we have them anyway and they've proven to be quite stable, so I could also see us directly exposing them via the normal libgit2 library. So instead of going the way of splitting up To me, this does feel like a small break with our existing "mission". I mostly took libgit2 as the core library that most nobody uses directly anyway because everybody uses bindings instead, and adding such low-level helpers to our interface doesn't help bindings at all. I'd be careful with what we expose, but doing this for |
This is definitely not what I was proposing. 😄 I don't think that we should be exposing these publicly as part of libgit2. I don't want If I'm building a new tool in C and want to use these handy utility classes, I want to just pull them in to my tool and use them directly. I do not want to have to link to libgit2 to use them. If I'm building something that uses libgit2 that might be useful, but if I'm building a totally random tool, then linking to libgit2 to get string manipulation is a non-starter for me. I have in the past taken parts of (In a perfect world, we might have a separate prefix for the utility code, Is this really in our mission? No, I suppose not, but
There's obviously a lot of precedent here, thinking of how glib came from GIMP and libchrome came from chrome. I wouldn't want to take this out of our tree or commit to an API but this is where I'm coming from, to give you some more insight into my thought process. Now, having said all that, what about this issue? Even beyond all that ☝️, I think that it's useful to disentangle our I guess I don't understand yet what you don't like about this change? Is it the overall thinking (not giving users internal types) or is it the implementation? There are obviously many ways to go about the implementation, but I think we need to be aligned on the goals first. |
* @param target_size The desired available size
* @return 0 on success, -1 on allocation failure
*/
GIT_EXTERN(int) git_userbuf_grow(git_userbuf *buffer, size_t target_size);
So if we're going to introduce this type, to me it should really be limited to providing output to the user. As you said, separation of concerns makes sense, and we're forced to hand character arrays to the user in a wrapper as opposed to handing memory ownership over to the caller directly. Otherwise there might be mismatches between malloc/free implementations, etc.
So the scope of `git_userbuf` should be getting information to the user so that he may access it and then properly dispose of it by calling `git_userbuf_dispose`. To me, `git_userbuf_grow` and `git_userbuf_set` don't fit into this scope and should not be provided.
What's missing though are `git_userbuf_len` and `git_userbuf_ptr` functions to access the structure, mostly because I'd prefer the structure to be opaque to the user. I don't think we'll ever be able to actually make it opaque without a major backwards compatibility breakage (which I definitely don't want to pursue), but at least allowing users to treat the structure as opaque would make sense to me. We should probably also provide `git_userbuf_init`, as the macro won't work in all situations.
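The accessors suggested in this review comment could look roughly like the sketch below. These signatures are an assumption for illustration: `git_userbuf_len`, `git_userbuf_ptr` and `git_userbuf_init` are only proposed names here, and the struct layout is assumed from libgit2 1.0's `git_buf`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed layout; callers would treat it as opaque and use the accessors. */
typedef struct {
	char *ptr;
	size_t asize;
	size_t size;
} git_userbuf;

/* Function form of the initializer, usable where a macro initializer is not
 * (heap-allocated structs, struct members, language bindings). */
void git_userbuf_init(git_userbuf *buf)
{
	buf->ptr = NULL;
	buf->asize = 0;
	buf->size = 0;
}

/* Read-only view of the contents; never returns NULL in this sketch. */
const char *git_userbuf_ptr(const git_userbuf *buf)
{
	return buf->ptr != NULL ? buf->ptr : "";
}

/* Number of bytes in the buffer, excluding the trailing NUL. */
size_t git_userbuf_len(const git_userbuf *buf)
{
	return buf->size;
}

/* Frees with the library's allocator, so malloc and free always match. */
void git_userbuf_dispose(git_userbuf *buf)
{
	if (buf->asize > 0)
		free(buf->ptr);
	git_userbuf_init(buf);
}
```

With only these four functions exposed, the user-visible surface matches the scope described above: the library writes, the caller reads and disposes.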
It's really hard to nail down. I think what I don't like about it is that to me, it feels like code duplication with the intent to help others stop duplicating code if they're copying the code from us. I'm sure we're kind of talking past each other and that I'm just misunderstanding, but this motivation feels... weird to me. And it's definitely not all of the motivation you spell out anyway. Anyway, I won't block this change. It's not like I'm a 100% against it, I just still don't quite get the point. How about we just get a third opinion on this? Maybe that'd help us understand each other's motivations better, either if that third party has the same concerns as I do or if that person is able to bring across the point in a way that even I am able to understand it :P I've also commented on the interface to make it easier to find some common ground and agree on the scope of
I get the appeal of using the utility functions as we have done this at work. I don't know how much of an effort the library should make to make that possible. I do agree with the concern of I don't find it that bad that we accept a We should be careful to balance the ease of extraction with the inconvenience and extra work for everyone else utterly uninterested in any of this, particularly now that we're post-1.0 and we've promised stability. Deprecating some of the other functions is still fine by me unless we can figure out why on earth we expose so much random stuff for the buffer. As far as creating this new type... I wonder if it's possible to achieve this extractability i.e. not mixing libgit2's and the extracted functions, differently. IIUC this is just an issue when we want to copy-paste It feels to me like the onus can be more on the extractor's side. After all they're already doing some work to copy and adjust their build system in order to save themselves the work of building an equivalent buffer/string handling library/utility functions. In some cases I suppose it might involve pointing at a libgit2 source dir they're using already. Maybe we can take a page out of klib's solution to potentially being included multiple times. As terrible as it is on many levels, not least reading and writing the code... maybe we should make the prefix configurable, so you define So maybe this is a sensible compromise to the work involved from the different parties (while leaving it open for us to restrict how much
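One way the configurable-prefix idea could be sketched is with preprocessor token pasting. Everything here is hypothetical: `BUF_PREFIX`, `BUF_NAME`, the `mytool_` default and the two functions are illustrative names, not anything libgit2 or klib defines.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* The extractor picks a prefix before including the code (hypothetical knob). */
#ifndef BUF_PREFIX
# define BUF_PREFIX mytool_
#endif

/* Two-level paste so that BUF_PREFIX is expanded before ## is applied. */
#define BUF_PASTE_(a, b) a##b
#define BUF_PASTE(a, b) BUF_PASTE_(a, b)
#define BUF_NAME(name) BUF_PASTE(BUF_PREFIX, name)

/* With the default prefix this declares the type mytool_buf. */
typedef struct {
	char *ptr;
	size_t asize;
	size_t size;
} BUF_NAME(buf);

/* Appends a NUL-terminated string; becomes mytool_buf_puts by default. */
int BUF_NAME(buf_puts)(BUF_NAME(buf) *b, const char *s)
{
	size_t len = strlen(s);
	size_t needed = b->size + len + 1;

	if (needed > b->asize) {
		char *grown = realloc(b->ptr, needed);
		if (grown == NULL)
			return -1;
		b->ptr = grown;
		b->asize = needed;
	}
	memcpy(b->ptr + b->size, s, len + 1);
	b->size += len;
	return 0;
}

/* Releases the storage; becomes mytool_buf_dispose by default. */
void BUF_NAME(buf_dispose)(BUF_NAME(buf) *b)
{
	free(b->ptr);
	b->ptr = NULL;
	b->asize = 0;
	b->size = 0;
}
```

Because every symbol carries the chosen prefix, a copy of this code compiled into another tool would not collide with the symbols of a libgit2 it also links against. The cost is the one noted above: the macro-wrapped names are harder to read and write.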
Yeah, I think that you've identified the two problems I have with
At this point I'm more concerned about the first point than the second, to be honest. This has rotted quite a bit, but I'll turn the crank here on this PR and iterate to separate the concerns. I'll make sure that we're focusing on that first point instead of the second.
We often want to provide callers with a buffer that they can control the lifetime of (for example, several configuration functions will take a `git_buf` that they write the value into). This is a very useful pattern, but `git_buf` is also a utility type that we use internally.
It's useful to separate these concerns, especially if we were to make a more general-purpose set of utility classes/functions. By intermingling `git_buf` as a public type and as a general utility, it becomes difficult to manage the memory. If `git_buf_dispose` is an exported function then it must live in the libgit2 library, meaning that other assemblies can't allocate memory in a `git_buf` and then use `git_buf_dispose`, since on some systems (e.g., Windows) allocators are per-assembly.
Splitting `git_buf` into a true utility class with no exported symbols allows us to keep it internal.
This adds a `git_userbuf` that is used for returning data to users. For API compatibility, we provide a `typedef` and macros that are enabled unless `GIT_DEPRECATE_HARD` is set. `git_userbuf` is the same definition as `git_buf` so that we can simply cast it and work with it using the `git_buf` functions internally.
This also adds a `GIT_DEPRECATE_BUF` option that will remove the deprecated `git_buf` definitions. This is useful for the library itself, which would not want the buffer deprecation layer, since it uses actual `git_buf` functions.
There are a few issues here that I did not address:
There are a few functions that don't give users data in the form of a `git_buf`, but instead take data from them. We should evaluate these carefully, but it's very unlikely to me that this is "correct". We should take data from users in a NUL-terminated `const char *` or a buffer and length pair and then make a copy.
The filter functionality is wacky. I'd forgotten about this "we'll give you a `git_buf` and you give us a `git_buf` and it could be the same for efficiency's sake". That should have been deprecated the moment that we had filter streams which are themselves complex, but not this cannon aimed at your foot.
If we can safely deprecate these (and I think that we can) then we can get `git_userbuf` to a place where it's strictly written by libgit2 and consumed and then freed by users, which would reduce some of the needless complexity. But that's a different pull request.
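The end state described here, with input taken as a pointer and length and copied, and output written by the library and freed by the caller, could be sketched as follows. `example_strip_newline` is a hypothetical function invented for illustration, and the struct layout and dispose body are simplified assumptions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout of the user-facing output buffer. */
typedef struct {
	char *ptr;
	size_t asize;
	size_t size;
} git_userbuf;

/* Lives in the library, so it frees with the allocator that allocated. */
void git_userbuf_dispose(git_userbuf *buf)
{
	if (buf->asize > 0)
		free(buf->ptr);
	buf->ptr = NULL;
	buf->asize = 0;
	buf->size = 0;
}

/* Hypothetical library function: input arrives as pointer plus length and is
 * copied; the result is written into a library-owned allocation in `out`. */
int example_strip_newline(git_userbuf *out, const char *input, size_t input_len)
{
	char *copy;

	/* Drop trailing CR/LF characters from the caller's data. */
	while (input_len > 0 &&
	       (input[input_len - 1] == '\n' || input[input_len - 1] == '\r'))
		input_len--;

	copy = malloc(input_len + 1);
	if (copy == NULL)
		return -1;
	memcpy(copy, input, input_len);
	copy[input_len] = '\0';

	out->ptr = copy;
	out->asize = input_len + 1;
	out->size = input_len;
	return 0;
}
```

The caller never allocates into the buffer and the library never keeps a pointer into caller memory, so neither side has to know which allocator the other uses.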