In-memory configuration #4767
This is really nice! I noticed a few minor stylistic things, but overall it looks pretty solid, so 👍.
src/config_entries.c
Outdated
typedef struct config_entries_iterator {
	git_config_iterator parent;
	config_entry_list *head;
} config_entries_iterator;

struct git_config_entries{
nitpick: missing space before {
static int config_memory_lock(git_config_backend *backend)
{
	GIT_UNUSED(backend);
	return config_error_readonly();
I have a fuzzy recollection that `lock`/`unlock` can be used by transactions, and that `snapshot` can also be about having a consistent view of the configuration at some point in time (even when read-only).
I just want to point that out, though AFAICT this is mostly intended for testing purposes 😉. Support for those can be added later.
I wanted to keep this PR as short as possible while still providing basic functionality. I plan on implementing write support and snapshotting as soon as this is merged, especially so as both things should be rather trivial to implement. I didn't yet take a look at lock/unlock and transactions, but I'll do that at the same point in time, I guess.
It's fine to error out on locking, same as any other read-only configuration. It would be good to support snapshotting as it should be pretty easy and it'd allow for readers to be very sure that they can borrow a string. We can deal with that later, but ideally we'd see that before the next release.
static int config_error_readonly(void)
{
	giterr_set(GITERR_CONFIG, "this backend is read-only");
	return -1;
Still longing for a `GIT_ENOTSUP`…
Hmm, interesting. I'm game for an `ENOTSUPPORTED` or whatever, but I worry that it would get really weird really fast, at least here for config. Since configuration is so low-level, I worry that it would leak out very strangely. For example, if I called `git_merge` and got a `GIT_ENOTSUPPORTED` back, then I would very much not understand what it was that wasn't supported.
I think? Is this actually the case that you had in mind that you do in fact want?
Well, it could mostly be used to tell higher-level layers that the thing they're currently trying to do can't work, which would allow the higher level to preserve the semantics of what happened.
For example, we don't do octopus merges; the error code could be used to signal that to the user (instead of the generic error we're currently returning).
In the config case, the higher level could try other backends and either swallow the error if another one was able to handle the request or error (as currently done), or return `ENOTSUPPORTED` to be more specific about the failure (in case there's only one in-memory backend loaded, which the user is ultimately responsible for). Which makes it pretty close to the `ENOTFOUND` semantics some codepaths have.
This is pretty minor though, it's just that there are some places where it would maybe be useful to report to the user that they're facing a "limitation" of the API, not a hard error.
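The fallback idea described above could look roughly like the following. This is only a sketch under stated assumptions: the error codes, the `sketch_backend` struct and `sketch_config_set` are hypothetical names made up for illustration, not libgit2's actual API or error values.

```c
#include <stddef.h>

/* Hypothetical error codes for this sketch; not libgit2's actual values. */
enum {
	SKETCH_OK = 0,
	SKETCH_ERROR = -1,
	SKETCH_ENOTSUPPORTED = -2
};

/* Hypothetical, stripped-down backend with only a "set" operation. */
typedef struct sketch_backend {
	int (*set)(struct sketch_backend *self, const char *key, const char *value);
} sketch_backend;

/* A read-only backend reports that it cannot perform the operation. */
static int readonly_set(sketch_backend *self, const char *key, const char *value)
{
	(void)self; (void)key; (void)value;
	return SKETCH_ENOTSUPPORTED;
}

/* A writable backend that accepts every write (storage omitted). */
static int writable_set(sketch_backend *self, const char *key, const char *value)
{
	(void)self; (void)key; (void)value;
	return SKETCH_OK;
}

/*
 * Try each backend in order. "Not supported" is swallowed as long as a
 * later backend can still handle the request; any other result is
 * returned immediately. If no backend supports the operation, the caller
 * learns that it hit a limitation rather than a hard failure.
 */
static int sketch_config_set(sketch_backend **backends, size_t n,
	const char *key, const char *value)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int error = backends[i]->set(backends[i], key, value);
		if (error != SKETCH_ENOTSUPPORTED)
			return error;
	}

	return SKETCH_ENOTSUPPORTED;
}
```

With only a read-only backend loaded, the caller gets the "not supported" code back; once a writable backend is also present, the read-only one is skipped silently.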
I agree that `ENOTSUPPORTED` would be nice to have in some locations, including this one. The PR is already big enough as is right now, if you ask me, so I'd love to postpone this to another PR. Especially so as it might require reviewing of callers, as noted by @ethomson.
It was just wishful thinking 😉. Since it seems we find that a worthwhile error code, I'll try to create a PR for it.
I'm skeptical of run-time feature detection like this. Do we have something that isn't detectable at compile time and isn't just a missing feature or a bug?
	return error;

if ((error = snapshot->open(snapshot, bh->level, bh->repo)) < 0)
🎉
Thanks for your feedback. I've fixed the stylistic issue spotted by @tiennou
590c4cf to 839b7d8
Rebased to fix conflicts with #4799
Thanks for taking the time to split the refactoring into the different commits with good commit messages. It made looking through the changes leading up to the new code easy.
I've left a couple of comments but I really like how this ended up. Iteration via snapshot leading to a reparse sounds like an unintended side-effect but the idea there is to duplicate the values into a version we can own and that won't change so that works out to the same.
src/config.c
Outdated

	GIT_UNUSED(new_raw);

-	giterr_set(GITERR_CONFIG, "a file with the same level (%i) has already been added to the config", (int)(*old)->level);
+	giterr_set(GITERR_CONFIG, "a backend with the same level (%i) has already been added to the config", (int)(*old)->level);
I wonder if using "backend" here might be a bit too much in the weeds for a message that a user is bound to see. I don't have a better word right now, but it might be worth giving it a think or two.
What about just using "configuration" instead of backend?
No, that sounds weird when reading the complete sentence. "a configuration with the same level (%i) already exists"?
We could have "configuration at this level (%i) is already set" or "there is already configuration at this level (%i)" which parse better.
src/config_mem.c
Outdated
GIT_UNUSED(line_len);

if (current_section) {
	/* TODO: Once warnings lang, we should likely warn
Typo: s/lang/land/
src/config_mem.c
Outdated
{
	GIT_UNUSED(out);
	GIT_UNUSED(backend);
	return config_error_readonly();
Erroring out because it's read-only is the wrong thing to do semantically. This should be an error about this particular backend not supporting snapshots.
True. I've been too eager to just reuse that nifty function.
Also referring to the previous comment of yours, I intend to create a follow-up PR as soon as this is merged. Snapshotting is trivial to implement, as you said, but I first wanted to get the refactoring as well as basic functionality going. I definitely intend to implement snapshotting and potentially even write support for the next release.
When populating the list of submodule names, we use the submodule configuration entry's name as the key in the map of submodule names. This creates a hidden dependency on the liveness of the configuration that was used to parse the submodule, which is fragile and unexpected. Fix the issue by duplicating the string before writing it into the submodule name map.
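As a minimal sketch of the fix described in this commit message: copy the key before storing it, so the stored string no longer borrows memory owned by the configuration. The `sketch_name_entry` type and its helpers are hypothetical stand-ins, not the actual submodule map code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an entry in the submodule name map. */
typedef struct {
	char *key; /* owned by the entry, not borrowed from the config */
} sketch_name_entry;

/* Copy `name` so the entry no longer depends on the caller's buffer. */
static int sketch_name_entry_init(sketch_name_entry *entry, const char *name)
{
	size_t len = strlen(name) + 1;
	char *copy = malloc(len);

	if (!copy)
		return -1;

	memcpy(copy, name, len);
	entry->key = copy;
	return 0;
}

/* Release the private copy again. */
static void sketch_name_entry_dispose(sketch_name_entry *entry)
{
	free(entry->key);
	entry->key = NULL;
}
```

Because the entry owns its copy, the buffer the name was read from can be changed or freed without affecting the stored key.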
The variables `git_config_escaped` and `git_config_escapes` are both defined as static const character pointers in "config_parse.h". In cases where "config_parse.h" is included but those two variables are not being used, the compiler will thus complain about defined but unused variables. Fix this by declaring them as external and moving the actual initialization to the C file. Note that it is not possible to simply make this a #define, as we are indexing into those arrays.
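The declare-in-header, define-in-source pattern this commit describes can be sketched in a single file as follows. The `sketch_*` names and the concrete escape characters are illustrative assumptions, not copied from "config_parse.h".

```c
#include <stddef.h>
#include <string.h>

/* --- would live in the header: declarations only, no storage --- */
extern const char *sketch_escapes; /* characters that may follow a backslash */
extern const char *sketch_escaped; /* the characters they stand for */

/* --- would live in exactly one C file: the actual definitions --- */
const char *sketch_escapes = "ntb\"\\";
const char *sketch_escaped = "\n\t\b\"\\";

/*
 * Map an escape character to its value by indexing into the two parallel
 * arrays. Returns 0 on success and -1 for an unknown escape character.
 */
int sketch_unescape(char *out, char c)
{
	const char *pos;

	if (c == '\0' || (pos = strchr(sketch_escapes, c)) == NULL)
		return -1;

	*out = sketch_escaped[pos - sketch_escapes];
	return 0;
}
```

A translation unit that includes only the declarations and never uses them produces no unused-variable warning, since no static object is defined there.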
Originally, the `git_config` struct is a collection of all the parsed configuration files from different scopes (system-wide config, user-specific config as well as the repo-specific config files). Historically, we didn't and don't yet have any other configuration backends than the one for files, which is why the field holding the config backends is called `files`. But in fact, nothing dictates that the vector of backends actually holds file backends only, as they are generic and custom backends can be implemented by users. Rename the member to be called `backends` to clarify that there is nothing specific to files here.
Same as with the previous commit, the `file_internal` struct is used to keep track of all the backends that are added to a `git_config` struct. Rename it to `backend_internal` and rename its `file` member to `backend` to make the implementation more backend-agnostic.
Fixed comments noted by @carlosmn. Range-diff:
1d8030a to c265062
As a last step to make variables and structures more backend agnostic for our `git_config` structure, rename local variables to not be called `file` anymore.
The function `git_config_file_normalize_section` is never being used in any file other than "config.c", but it is implemented in "config_file.c". Move it over and make the symbol static.
The header "config_file.h" has a list of inline-functions to access the contents of a config backend without directly messing with the struct's function pointers. While all these functions are called "git_config_file_*", they are in fact completely backend-agnostic and don't care whether it is a file or not. Rename all the functions to instead be backend-agnostic versions called "git_config_backend_*" and rename the header to match.
The implementation for config file snapshots has an unnecessary redirection from `config_snapshot` to `git_config_file__snapshot`. Inline the call to `git_config_file__snapshot` and remove it.
The configuration entry store that is used for configuration files needs to keep track of all entries in two different structures:
- a singly linked list is being used to be able to iterate through configuration files in the order they have been found
- a string map is being used to efficiently look up configuration entries by their key
This store is thus something that may be used by other, future backends as well to abstract away implementation details and iteration over the entries. Pull out the necessary functions from "config_file.c" and move them into their own "config_entries.c" module. For now, this is simply moving over code without any renames and/or refactorings to help reviewing.
The previous commit simply moved all code that is required to handle config entries to a new module without yet adjusting any of the function and structure names to help readability. We now rename things accordingly to have a common "git_config_entries" entries instead of the old "diskfile_entries" one.
The code accessing config entries in the `git_config_entries` structure is still much too intimate with implementation details, directly accessing the maps and handling indices. Provide two new functions to get config entries from the internal map structure to decouple the interfaces and use them in the config file code. The function `git_config_entries_get` will simply look up the entry by name and, in the case of a multi-value, return the last occurrence of that entry. The second function, `git_config_entries_get_unique`, will only return an entry if it is unique and not included via another configuration file. This one is required to properly implement write operations for single entries, as we refuse to write to or delete a single entry if it is not clear which one was meant.
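The difference between the two lookups can be illustrated with a small sketch. It keeps only the singly linked list and replaces the string map with a linear scan to stay short; all `sketch_*` names and the `include_depth` field are assumptions made for illustration, not the real `git_config_entries` implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry; include_depth > 0 means it came from an included file. */
typedef struct sketch_entry {
	const char *name;
	const char *value;
	int include_depth;
	struct sketch_entry *next; /* insertion order, used for iteration */
} sketch_entry;

typedef struct {
	sketch_entry *head;
	sketch_entry *tail;
} sketch_entries;

/* Append an entry, preserving the order in which entries were found. */
static void sketch_entries_append(sketch_entries *entries, sketch_entry *entry)
{
	entry->next = NULL;
	if (entries->tail)
		entries->tail->next = entry;
	else
		entries->head = entry;
	entries->tail = entry;
}

/* Return the last occurrence of `name`, or NULL if there is none. */
static sketch_entry *sketch_entries_get(sketch_entries *entries, const char *name)
{
	sketch_entry *e, *found = NULL;

	for (e = entries->head; e; e = e->next)
		if (strcmp(e->name, name) == 0)
			found = e;

	return found;
}

/* Return the entry only if it occurs exactly once and was not included. */
static sketch_entry *sketch_entries_get_unique(sketch_entries *entries, const char *name)
{
	sketch_entry *e, *found = NULL;

	for (e = entries->head; e; e = e->next) {
		if (strcmp(e->name, name) != 0)
			continue;
		if (found || e->include_depth > 0)
			return NULL;
		found = e;
	}

	return found;
}
```

A write path would use the unique lookup and refuse to proceed when it returns NULL for a name that the plain lookup does find, since it is then unclear which entry was meant.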
The nice thing about our `git_config_iterator` interfaces is that nobody needs to know anything about the implementation details. All that is required is to obtain the iterator via any backend and then use it by executing generic functions. We can thus completely internalize all the implementation details of how to iterate over entries into the config entries store and simply create such an iterator in our config file backend when we want to iterate its entries. This further decouples the config file backend from the config entries store.
Instead of directly calling `git_atomic_inc` in users of the config entries store, provide a `git_config_entries_incref` function to further decouple the interfaces. Convert the refcount to a `git_refcount` structure while at it.
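A rough sketch of hiding the reference count behind functions, so callers never touch the counter themselves. It uses C11 atomics as a stand-in; `sketch_store` and its functions are hypothetical and do not reproduce `git_refcount` or `git_atomic_inc`.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted store; the entries themselves are omitted. */
typedef struct {
	atomic_int refcount;
} sketch_store;

/* Create a store that starts with a single reference. */
static sketch_store *sketch_store_new(void)
{
	sketch_store *store = calloc(1, sizeof(*store));

	if (store)
		atomic_init(&store->refcount, 1);

	return store;
}

/* Take an additional reference. */
static void sketch_store_incref(sketch_store *store)
{
	atomic_fetch_add(&store->refcount, 1);
}

/* Drop one reference; returns 1 if this call freed the store, else 0. */
static int sketch_store_free(sketch_store *store)
{
	if (!store)
		return 0;

	/* atomic_fetch_sub returns the value held before the decrement. */
	if (atomic_fetch_sub(&store->refcount, 1) != 1)
		return 0;

	free(store);
	return 1;
}
```

Each holder pairs one `sketch_store_incref` with one `sketch_store_free`; only the last release actually frees the memory.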
Access to the config entries is now completely done via the module's function interface and no caller messes with the struct's internals. We can thus completely move the structure declarations into the implementation file so that nobody even has a chance to mess with the members.
Right now, the config file code requires us to pass in its backend to the config entry iterator. This is required with the current code, as the config file backend will first create a read-only snapshot which is then passed to the iterator just for that purpose. So after the iterator is getting free'd, the code needs to make sure that the snapshot gets free'd, as well. By now, though, we can easily refactor the code to be more efficient and remove the reverse dependency from iterator to backend. Instead of creating a read-only snapshot (which also requires us to re-parse the complete configuration file), we can simply duplicate the config entries and pass those to the iterator. Like that, the iterator only needs to make sure to free the duplicated config entries, which is trivial to do and clears up memory ownership by a lot.
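The ownership model this commit moves to can be sketched as an iterator that duplicates the values it walks and frees its own copy. The sketch simplifies entries to plain strings, and the `sketch_iterator` type is hypothetical rather than the real `git_config_iterator`.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
	char **values; /* private copy owned by the iterator */
	size_t count;
	size_t pos;
} sketch_iterator;

/* Free the iterator together with its duplicated values. */
static void sketch_iterator_free(sketch_iterator *it)
{
	size_t i;

	if (!it)
		return;

	for (i = 0; i < it->count; i++)
		free(it->values[i]);
	free(it->values);
	free(it);
}

/* Duplicate `values` so the iterator does not depend on the source. */
static sketch_iterator *sketch_iterator_new(const char *const *values, size_t count)
{
	sketch_iterator *it = calloc(1, sizeof(*it));
	size_t i;

	if (!it)
		return NULL;

	if (count && (it->values = calloc(count, sizeof(*it->values))) == NULL) {
		free(it);
		return NULL;
	}

	for (i = 0; i < count; i++) {
		size_t len = strlen(values[i]) + 1;

		if ((it->values[i] = malloc(len)) == NULL) {
			it->count = i; /* free only what was copied so far */
			sketch_iterator_free(it);
			return NULL;
		}
		memcpy(it->values[i], values[i], len);
	}

	it->count = count;
	return it;
}

/* Return the next value, or NULL once the iterator is exhausted. */
static const char *sketch_iterator_next(sketch_iterator *it)
{
	return it->pos < it->count ? it->values[it->pos++] : NULL;
}
```

Since the iterator owns everything it hands out, the source can change while iteration is in progress, and cleanup needs no reference back to the backend.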
Now that we have abstracted away how to store and retrieve config entries, it became trivial to implement a new in-memory backend by making use of this. And thus we do so. This commit implements a new read-only in-memory backend that can parse a chunk of memory into a `git_config_backend` structure.
c265062 to 2be39ce
Amended once again to improve the error message. Seeing that the criticism only revolved around such minor things, can I assume that this is ready to be merged?
So finally I hold true to my promise to refactor our config code (once again) and create an in-memory configuration backend. I think the result is quite pleasant due to the refactorings, and in fact the in-memory backend implementation is only a bit more than a hundred lines of code (excluding the stubs).
This series has three phases:
There are a few more things I'd like to do after this PR is merged. First, I think that we can get rid of the file read-only snapshot implementation and simply use an in-memory backend instead. Second, I'll implement writing to and snapshotting of the in-memory config backend. Both things should be easy to implement, but the series is already long enough and thus I refrained from doing so now.
As always, I tried to make sure that the series "develops" to make a pleasant read. So I highly recommend reading it commit by commit instead of all at once.