Member

@pks-t pks-t commented Aug 16, 2018

I'm finally making good on my promise to refactor our config code (once again) and create an in-memory configuration backend. I think the result is quite pleasant thanks to the refactorings; in fact, the in-memory backend implementation is only a bit more than a hundred lines of code (excluding the stubs).

This series has three phases:

  1. Refactorings to make our config file code more generic with regard to names. Most importantly, many things are renamed from "git_config_file_" to "git_config_backend_" to express that they are actually backend-agnostic.
  2. I pull the "config_entries" store, which abstracts iteration order and memory management for config entries, out of the "git_config_file" backend.
  3. I create the "git_config_mem" backend based on that store, which is rather trivial to do as everything is in place already.

There are a few more things I'd like to do after this PR is merged. First, I think that we can get rid of the file read-only snapshot implementation and simply use an in-memory backend instead. Second, I'll implement writing to and snapshotting of the in-memory config backend. Both things should be easy to implement, but the series is already long enough and thus I refrained from doing so now.

As always, I tried to make sure that the series "develops" so that it makes for a pleasant read. So I highly recommend reading it commit by commit instead of all at once.

Contributor

@tiennou tiennou left a comment

This is really nice! I noticed a few minor stylistic things, but overall it looks pretty solid, so 👍.

typedef struct config_entries_iterator {
git_config_iterator parent;
config_entry_list *head;
} config_entries_iterator;

struct git_config_entries{

Contributor

nitpick: missing space before {

static int config_memory_lock(git_config_backend *backend)
{
GIT_UNUSED(backend);
return config_error_readonly();

Contributor

I have a fuzzy recollection that lock/unlock can be used by transactions, and that snapshot can also be about having a consistent view of the configuration at some point in time (even when read-only).

I just want to point that out, though AFAICT this is mostly intended for testing purposes 😉. Support for those can be added later.

Member Author

I wanted to keep this PR as short as possible while still providing basic functionality. I plan on implementing write support and snapshotting as soon as this is merged, especially so as both things should be rather trivial to implement. I didn't yet take a look at lock/unlock and transactions, but I'll do that at the same point in time, I guess.

Member

It's fine to error out on locking, same as any other read-only configuration. It would be good to support snapshotting as it should be pretty easy and it'd allow for readers to be very sure that they can borrow a string. We can deal with that later, but ideally we'd see that before the next release.

static int config_error_readonly(void)
{
giterr_set(GITERR_CONFIG, "this backend is read-only");
return -1;

Contributor

Still longing for a GIT_ENOTSUP

Member

Hmm, interesting. I'm game for an ENOTSUPPORTED or whatever, but I worry that it would get really weird really fast, at least here for config. Since configuration is so low-level, I worry that it would leak out very strangely. For example, if I called git_merge and got a GIT_ENOTSUPPORTED back, then I would very much not understand what it was that wasn't supported.

I think? Is this actually the case that you had in mind that you do in fact want?

Contributor

Well, it could mostly be used to tell higher-level layers that what they're currently trying to do can't work, which would allow those layers to preserve the semantics of what happened.

For example, we don't do octopus merges; the error code could be used to signal that to the user (instead of the generic error we're currently returning).
In the config case, the higher level could try other backends and either swallow the error if another one was able to handle the request, fail (as is currently done), or return ENOTSUPPORTED to be more specific about the failure (in case there's only one in-memory backend loaded, which the user is ultimately responsible for). That makes it pretty close to the ENOTFOUND semantics some codepaths have.

This is pretty minor though, it's just that there are some places where it would maybe be useful to report to the user that they're facing a "limitation" of the API, not a hard error.
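
To make the fallback idea above a bit more concrete, here is a minimal sketch. Everything in it is hypothetical: the `sketch_` types, the error codes and the single `set` callback are made up for illustration and are not libgit2's API. It only shows how a caller could skip backends that report "not supported" while still aborting on a hard error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes for this sketch; not libgit2's actual values. */
enum {
	SKETCH_OK = 0,
	SKETCH_ERROR = -1,
	SKETCH_ENOTSUPPORTED = -2
};

/* A hypothetical backend exposing a single "set" operation. */
typedef struct sketch_backend {
	int (*set)(struct sketch_backend *self, const char *key, const char *value);
	int writes; /* counts successful writes, for illustration only */
} sketch_backend;

/* A read-only backend reports a limitation rather than a hard error. */
int sketch_set_readonly(sketch_backend *self, const char *key, const char *value)
{
	(void)self; (void)key; (void)value;
	return SKETCH_ENOTSUPPORTED;
}

int sketch_set_writable(sketch_backend *self, const char *key, const char *value)
{
	(void)key; (void)value;
	self->writes++;
	return SKETCH_OK;
}

/*
 * Try each backend in order. A backend that does not support the
 * operation is skipped; any other result (success or hard error) is
 * returned immediately. If no backend could handle the request, the
 * limitation is reported to the caller.
 */
int sketch_config_set(sketch_backend **backends, size_t n, const char *key, const char *value)
{
	size_t i;

	assert(backends || n == 0);

	for (i = 0; i < n; i++) {
		int error = backends[i]->set(backends[i], key, value);

		if (error == SKETCH_ENOTSUPPORTED)
			continue;
		return error;
	}

	return SKETCH_ENOTSUPPORTED;
}
```

With a read-only and a writable backend loaded, the write lands in the writable one; with only the read-only backend, the caller sees `SKETCH_ENOTSUPPORTED` and can decide how to present that.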

Member Author

I agree that ENOTSUPPORTED would be nice to have in some locations, including this one. The PR is already big enough as it is, if you ask me, so I'd love to postpone this to another PR, especially as it might require reviewing callers, as noted by @ethomson.

Contributor

It was just wishful thinking 😉. Since it seems we find that a worthwhile error code, I'll try to create a PR for it.

Member

I'm skeptical of run-time feature detection like this. Do we have something that isn't detectable at compile time and isn't just a missing feature or a bug?

return error;

if ((error = snapshot->open(snapshot, bh->level, bh->repo)) < 0)

Contributor

🎉

Member Author

pks-t commented Aug 24, 2018

Thanks for your feedback. I've fixed the stylistic issue spotted by @tiennou

Member Author

pks-t commented Sep 7, 2018

Rebased to fix conflicts with #4799

Member

@carlosmn carlosmn left a comment

Thanks for taking the time to split the refactoring into the different commits with good commit messages. It made looking through the changes leading up to the new code easy.

I've left a couple of comments but I really like how this ended up. Iteration via snapshot leading to a reparse sounds like an unintended side-effect but the idea there is to duplicate the values into a version we can own and that won't change so that works out to the same.

src/config.c Outdated

GIT_UNUSED(new_raw);

-giterr_set(GITERR_CONFIG, "a file with the same level (%i) has already been added to the config", (int)(*old)->level);
+giterr_set(GITERR_CONFIG, "a backend with the same level (%i) has already been added to the config", (int)(*old)->level);

Member

I wonder if using "backend" here might be a bit too much in the weeds for a message that a user is bound to see. I don't have a better word right now, but it might be worth giving it a think or two.

Member Author

What about just using "configuration" instead of backend?

Member Author

No, that sounds weird when reading the complete sentence. "a configuration with the same level (%i) already exists"?

Member

We could have "configuration at this level (%i) is already set" or "there is already configuration at this level (%i)" which parse better.

src/config_mem.c Outdated
GIT_UNUSED(line_len);

if (current_section) {
/* TODO: Once warnings lang, we should likely warn

Member

Typo: s/lang/land/

src/config_mem.c Outdated
{
GIT_UNUSED(out);
GIT_UNUSED(backend);
return config_error_readonly();

Member

Erroring out because it's read-only is the wrong thing to do semantically. This should be an error about this particular backend not supporting snapshots.

Member Author

True. I've been too eager to just reuse that nifty function.

Also referring to your previous comment, I intend to create a follow-up PR as soon as this one is merged. Snapshotting is trivial to implement, as you said, but I first wanted to get the refactoring as well as basic functionality going. I definitely intend to implement snapshotting and potentially even write support for the next release.

submodule: fix submodule names depending on config-owned memory

When populating the list of submodule names, we use the submodule
configuration entry's name as the key in the map of submodule names.
This creates a hidden dependency on the lifetime of the configuration
that was used to parse the submodule, which is fragile and unexpected.

Fix the issue by duplicating the string before writing it into the
submodule name map.
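
The ownership fix described in this commit message can be sketched roughly as follows. This is not the submodule code itself: `sketch_entry`, `sketch_names` and the one-slot "map" are hypothetical stand-ins, chosen only to show that the map keeps its own copy of the key and so no longer depends on the config entry staying alive.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a config entry whose strings the config owns. */
typedef struct sketch_entry {
	char *name;
} sketch_entry;

/* Hypothetical one-slot "map" that owns its key. */
typedef struct sketch_names {
	char *key;
} sketch_names;

/* Copy `src` into freshly allocated memory; returns NULL on allocation failure. */
char *sketch_strdup(const char *src)
{
	size_t len = strlen(src) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, src, len);
	return copy;
}

/* Store a private copy of the entry name so the map can outlive the entry. */
int sketch_names_insert(sketch_names *names, const sketch_entry *entry)
{
	char *copy;

	assert(names && entry && entry->name);

	if ((copy = sketch_strdup(entry->name)) == NULL)
		return -1;

	free(names->key);
	names->key = copy;
	return 0;
}

/* Release the key owned by the map. */
void sketch_names_clear(sketch_names *names)
{
	free(names->key);
	names->key = NULL;
}
```

Because the key is duplicated on insert, freeing the entry afterwards leaves the map intact; the cost is one extra allocation per name.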

config_parse: avoid unused static declared values

The variables `git_config_escaped` and `git_config_escapes` are both
defined as static const character pointers in "config_parse.h". If
"config_parse.h" is included but those two variables are not being
used, the compiler will thus complain about defined but unused
variables. Fix this by declaring them as external and moving the actual
initialization to the C file.

Note that it is not possible to simply make this a #define, as we are
indexing into those arrays.
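
As a rough illustration of the declaration/definition split, the sketch below keeps both parts in one listing. The `sketch_` names and the exact string contents are assumptions made for this example, not copied from "config_parse.h". The lookup indexes the two strings in parallel and subtracts pointers into the same array, which is the kind of use that needs one real object behind each name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What would live in the header: declarations only, no storage. */
extern const char *sketch_escapes; /* characters that may follow a backslash */
extern const char *sketch_escaped; /* the characters they stand for */

/* What would live in exactly one C file: the actual definitions. */
const char *sketch_escapes = "ntb\"\\";
const char *sketch_escaped = "\n\t\b\"\\";

/*
 * Translate the character following a backslash by finding its position
 * in `sketch_escapes` and reading the same position in `sketch_escaped`.
 * Returns 0 if `c` is not a known escape character.
 */
char sketch_unescape(char c)
{
	const char *pos;

	/* strchr would match the terminating NUL, so reject it up front. */
	if (c == '\0' || (pos = strchr(sketch_escapes, c)) == NULL)
		return 0;

	return sketch_escaped[pos - sketch_escapes];
}
```

Files that include the header but never call the lookup no longer carry unused static definitions of their own.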

config: rename `files` vector to `backends`

Originally, the `git_config` struct is a collection of all the parsed
configuration files from different scopes (system-wide config,
user-specific config as well as the repo-specific config files).
Historically, we didn't and don't yet have any other configuration
backends than the one for files, which is why the field holding the
config backends is called `files`. But in fact, nothing dictates that
the vector of backends actually holds file backends only, as they are
generic and custom backends can be implemented by users.

Rename the member to be called `backends` to clarify that there is
nothing specific to files here.

config: rename `file_internal` and its `file` member

Same as with the previous commit, the `file_internal` struct is used to
keep track of all the backends that are added to a `git_config` struct.
Rename it to `backend_internal` and rename its `file` member to
`backend` to make the implementation more backend-agnostic.

Member Author

pks-t commented Sep 21, 2018

Fixed comments noted by @carlosmn. Range-diff:

 1:  cbeecf478 =  1:  0b9c68b13 submodule: fix submodule names depending on config-owned memory
 2:  4991675af =  2:  b9affa329 config_parse: avoid unused static declared values
 3:  6ee37190b =  3:  633cf40cb config: rename `files` vector to `backends`
 4:  18f2b5013 =  4:  83733aeb0 config: rename `file_internal` and its `file` member
 5:  710dce363 !  5:  f40891447 config: make names backend-agnostic
    @@ -48,7 +48,7 @@
      	if (pos == -1) {
      		giterr_set(GITERR_CONFIG,
     -			"no config file exists for the given level '%i'", (int)level);
    -+			"no config backend exists for the given level '%i'", (int)level);
    ++			"no configuraiton exists for the given level '%i'", (int)level);
      		return GIT_ENOTFOUND;
      	}
      
    @@ -62,7 +62,7 @@
      	GIT_UNUSED(new_raw);
      
     -	giterr_set(GITERR_CONFIG, "a file with the same level (%i) has already been added to the config", (int)(*old)->level);
    -+	giterr_set(GITERR_CONFIG, "a backend with the same level (%i) has already been added to the config", (int)(*old)->level);
    ++	giterr_set(GITERR_CONFIG, "a configuration with the same level (%i) already exists", (int)(*old)->level);
      	return GIT_EEXISTS;
      }
      
 6:  f83f1acad =  6:  7aae03229 config: move function normalizing section names into "config.c"
 7:  0b69f5fc0 =  7:  7faddfac1 config: rename "config_file.h" to "config_backend.h"
 8:  d327db91c =  8:  f8a4d252e config_file: remove unnecessary snapshot indirection
 9:  305f8bac9 =  9:  7ffee7766 config_entries: pull out implementation of entry store
10:  32a7e720f = 10:  f1db78c22 config_entries: rename functions and structure
11:  f849c6748 = 11:  11d0dcb43 config_entries: abstract away retrieval of config entries
12:  57ed569cd = 12:  99dd96cdc config_entries: abstract away iteration over entries
13:  986817bbd = 13:  9f765748d config_entries: abstract away reference counting
14:  7e6920862 = 14:  f4392baa8 config_entries: internalize structure declarations
15:  013395dc1 = 15:  9ffce8fb4 config_entries: refactor entries iterator memory ownership
16:  839b7d887 ! 16:  1d8030a42 config: introduce new read-only in-memory backend
    @@ -83,7 +83,7 @@
     +	GIT_UNUSED(line_len);
     +
     +	if (current_section) {
    -+		/* TODO: Once warnings lang, we should likely warn
    ++		/* TODO: Once warnings land, we should likely warn
     +		 * here. Git appears to warn in most cases if it sees
     +		 * un-namespaced config options.
     +		 */
    @@ -205,7 +205,8 @@
     +{
     +	GIT_UNUSED(out);
     +	GIT_UNUSED(backend);
    -+	return config_error_readonly();
    ++	giterr_set(GITERR_CONFIG, "this backend does not support snapshots");
    ++	return -1;
     +}
     +
     +static void config_memory_free(git_config_backend *_backend)

@pks-t pks-t force-pushed the pks/config-mem branch 2 times, most recently from 1d8030a to c265062 on September 21, 2018 10:25

config: make names backend-agnostic

As a last step to make variables and structures more backend agnostic
for our `git_config` structure, rename local variables to not be called
`file` anymore.

config: move function normalizing section names into "config.c"

The function `git_config_file_normalize_section` is never being used in
any file different than "config.c", but it is implemented in
"config_file.c". Move it over and make the symbol static.

config: rename "config_file.h" to "config_backend.h"

The header "config_file.h" has a list of inline-functions to access the
contents of a config backend without directly messing with the struct's
function pointers. While all these functions are called
"git_config_file_*", they are in fact completely backend-agnostic and
don't care whether it is a file or not. Rename all the functions to
instead be backend-agnostic versions called "git_config_backend_*" and
rename the header to match.

config_file: remove unnecessary snapshot indirection

The implementation for config file snapshots has an unnecessary
redirection from `config_snapshot` to `git_config_file__snapshot`.
Inline the call to `git_config_file__snapshot` and remove it.

config_entries: pull out implementation of entry store

The configuration entry store that is used for configuration files needs
to keep track of all entries in two different structures:

- a singly linked list is being used to be able to iterate through
  configuration files in the order they have been found

- a string map is being used to efficiently look up configuration
  entries by their key

This store is thus something that may be used by other, future backends
as well to abstract away implementation details and iteration over the
entries.

Pull out the necessary functions from "config_file.c" and move them
into their own "config_entries.c" module. For now, this is simply moving
over code without any renames and/or refactorings to help reviewing.
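
The two structures described in this commit message can be sketched as below. This is a simplified illustration under stated assumptions, not the code in "config_entries.c": the `sketch_` names, the fixed bucket count and the borrowed (not copied) strings are all choices made for brevity. Each node sits on an insertion-ordered list for iteration and on a hash-bucket chain for lookup by key.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_BUCKETS 16

/* One entry; `next` preserves insertion order, `chain` links a hash bucket. */
typedef struct sketch_node {
	const char *name;  /* borrowed, not copied, in this sketch */
	const char *value; /* borrowed, not copied, in this sketch */
	struct sketch_node *next;
	struct sketch_node *chain;
} sketch_node;

typedef struct sketch_entries {
	sketch_node *head, *tail;             /* ordered list for iteration */
	sketch_node *buckets[SKETCH_BUCKETS]; /* string map for lookup */
} sketch_entries;

static size_t sketch_hash(const char *s)
{
	size_t h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % SKETCH_BUCKETS;
}

/* Append to the ordered list and push onto the front of the matching bucket. */
int sketch_entries_append(sketch_entries *es, const char *name, const char *value)
{
	sketch_node *node;
	size_t slot;

	assert(es && name && value);

	if ((node = calloc(1, sizeof(*node))) == NULL)
		return -1;
	node->name = name;
	node->value = value;

	if (es->tail)
		es->tail->next = node;
	else
		es->head = node;
	es->tail = node;

	slot = sketch_hash(name);
	node->chain = es->buckets[slot];
	es->buckets[slot] = node;
	return 0;
}

/* Look up by key; the most recently appended match is found first. */
const char *sketch_entries_get(const sketch_entries *es, const char *name)
{
	const sketch_node *node;

	for (node = es->buckets[sketch_hash(name)]; node; node = node->chain)
		if (strcmp(node->name, name) == 0)
			return node->value;
	return NULL;
}

/* Free all nodes by walking the ordered list, then reset the store. */
void sketch_entries_clear(sketch_entries *es)
{
	sketch_node *node = es->head, *next;

	for (; node; node = next) {
		next = node->next;
		free(node);
	}
	memset(es, 0, sizeof(*es));
}
```

Iteration follows `head`/`next` and therefore sees entries in the order they were added, while lookups only touch one bucket.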

config_entries: rename functions and structure

The previous commit simply moved all code that is required to handle
config entries to a new module without yet adjusting any of the function
and structure names to help readability. We now rename things
accordingly to have a common "git_config_entries" entries instead of the
old "diskfile_entries" one.

config_entries: abstract away retrieval of config entries

The code accessing config entries in the `git_config_entries` structure
is still much too intimate with implementation details, directly
accessing the maps and handling indices. Provide two new functions to
get config entries from the internal map structure to decouple the
interfaces and use them in the config file code.

The function `git_config_entries_get` will simply look up the entry by
name and, in the case of a multi-value, return the last occurrence of
that entry. The second function, `git_config_entries_get_unique`, will
only return an entry if it is unique and not included via another
configuration file. This one is required to properly implement write
operations for single entries, as we refuse to write to or delete a
single entry if it is not clear which one was meant.
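
The difference between the two lookups can be sketched over a plain array. The `sketch_` names and the `include_depth` field used to mark entries that came from an included file are assumptions for this example; the real functions operate on the internal map and report errors rather than returning NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct sketch_entry {
	const char *name;
	const char *value;
	unsigned int include_depth; /* 0 when defined in the top-level file */
} sketch_entry;

/* Return the last entry named `name`, or NULL if there is none. */
const sketch_entry *sketch_get(const sketch_entry *entries, size_t n, const char *name)
{
	const sketch_entry *found = NULL;
	size_t i;

	assert(entries || n == 0);

	for (i = 0; i < n; i++)
		if (strcmp(entries[i].name, name) == 0)
			found = &entries[i];
	return found;
}

/*
 * Return the entry only if exactly one entry has this name and it was
 * not pulled in through an include; otherwise return NULL, because it
 * would be unclear which entry a write or delete should affect.
 */
const sketch_entry *sketch_get_unique(const sketch_entry *entries, size_t n, const char *name)
{
	const sketch_entry *found = NULL;
	size_t i, matches = 0;

	assert(entries || n == 0);

	for (i = 0; i < n; i++) {
		if (strcmp(entries[i].name, name) != 0)
			continue;
		found = &entries[i];
		matches++;
	}

	if (matches != 1 || found->include_depth > 0)
		return NULL;
	return found;
}
```

A multi-valued key is readable through `sketch_get` (last value wins) but is refused by `sketch_get_unique`, as is a key that only exists in an included file.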

config_entries: abstract away iteration over entries

The nice thing about our `git_config_iterator` interfaces is that nobody
needs to know anything about the implementation details. All that is
required is to obtain the iterator via any backend and then use it by
executing generic functions. We can thus completely internalize all the
implementation details of how to iterate over entries into the config
entries store and simply create such an iterator in our config file
backend when we want to iterate its entries. This further decouples the
config file backend from the config entries store.
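
A minimal sketch of this kind of iterator interface follows. The `sketch_` names, the array-backed store and the "iteration over" code are hypothetical; the point is only that the caller works through the generic `next` and `free` function pointers, while the store-specific state hides behind a struct that embeds the generic part as its first member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_ITEROVER (-31) /* arbitrary "no more items" code for this sketch */

/* Generic iterator: callers only ever see these two function pointers. */
typedef struct sketch_iterator {
	int (*next)(const char **out, struct sketch_iterator *iter);
	void (*free)(struct sketch_iterator *iter);
} sketch_iterator;

/* Store-specific iterator; the generic part must be the first member. */
typedef struct sketch_array_iterator {
	sketch_iterator parent;
	const char *const *values;
	size_t pos, len;
} sketch_array_iterator;

static int sketch_array_next(const char **out, sketch_iterator *iter)
{
	/* Valid because `parent` is the first member of the outer struct. */
	sketch_array_iterator *it = (sketch_array_iterator *)iter;

	if (it->pos == it->len)
		return SKETCH_ITEROVER;
	*out = it->values[it->pos++];
	return 0;
}

static void sketch_array_free(sketch_iterator *iter)
{
	free(iter);
}

/* Create an iterator over `values`; the array itself is borrowed. */
int sketch_array_iterator_new(sketch_iterator **out, const char *const *values, size_t len)
{
	sketch_array_iterator *it;

	assert(out && (values || len == 0));

	if ((it = calloc(1, sizeof(*it))) == NULL)
		return -1;
	it->parent.next = sketch_array_next;
	it->parent.free = sketch_array_free;
	it->values = values;
	it->len = len;

	*out = &it->parent;
	return 0;
}
```

A backend can hand out such an iterator without exposing how its entries are stored; swapping the array for a linked list would not change any caller.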

config_entries: abstract away reference counting

Instead of directly calling `git_atomic_inc` in users of the config
entries store, provide a `git_config_entries_incref` function to further
decouple the interfaces. Convert the refcount to a `git_refcount`
structure while at it.
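
The decoupling described here might look roughly like the sketch below. It is deliberately simplified: the counter is a plain `int` rather than an atomic, and the `sketch_` names and the `freed` flag (there only so the release can be observed) are assumptions for this example. Callers take and drop references through functions and never touch the counter directly.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sketch_entries {
	int refcount; /* plain int here; shared use across threads needs an atomic */
	int *freed;   /* observation hook: set to 1 when the store is released */
} sketch_entries;

/* Create a store holding one reference, owned by the caller. */
sketch_entries *sketch_entries_new(int *freed)
{
	sketch_entries *es = calloc(1, sizeof(*es));

	if (es) {
		es->refcount = 1;
		es->freed = freed;
	}
	return es;
}

/* Take an additional reference. */
void sketch_entries_incref(sketch_entries *es)
{
	assert(es && es->refcount > 0);
	es->refcount++;
}

/* Drop one reference; the store is released when the last one goes away. */
void sketch_entries_free(sketch_entries *es)
{
	if (!es)
		return;

	assert(es->refcount > 0);
	if (--es->refcount > 0)
		return;

	if (es->freed)
		*es->freed = 1;
	free(es);
}
```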

config_entries: internalize structure declarations

Access to the config entries is now completely done via the module's
function interface and no caller messes with the struct's internals. We
can thus completely move the structure declarations into the
implementation file so that nobody even has a chance to mess with the
members.

config_entries: refactor entries iterator memory ownership

Right now, the config file code requires us to pass in its backend to
the config entry iterator. This is required with the current code, as
the config file backend will first create a read-only snapshot which is
then passed to the iterator just for that purpose. So after the iterator
is getting free'd, the code needs to make sure that the snapshot gets
free'd, as well.

By now, though, we can easily refactor the code to be more efficient and
remove the reverse dependency from iterator to backend. Instead of
creating a read-only snapshot (which also requires us to re-parse the
complete configuration file), we can simply duplicate the config entries
and pass those to the iterator. Like that, the iterator only needs to
make sure to free the duplicated config entries, which is trivial to do
and clears up memory ownership by a lot.

config: introduce new read-only in-memory backend

Now that we have abstracted away how to store and retrieve config
entries, it became trivial to implement a new in-memory backend by
making use of this. And thus we do so.

This commit implements a new read-only in-memory backend that can parse
a chunk of memory into a `git_config_backend` structure.
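
To give a feel for what "parsing a chunk of memory" involves, here is a heavily simplified sketch. It is not the parser used by this PR: the `sketch_` names, the fixed buffer sizes and the callback shape are assumptions, and comments, quoting, escapes, subsections, line continuations and bare keys without `=` are deliberately not handled. It walks a buffer line by line and reports each `key = value` as `section.key`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef int (*sketch_entry_cb)(const char *name, const char *value, void *payload);

/* Example payload: remembers the last entry seen and counts all entries. */
typedef struct sketch_last {
	char name[130];
	char value[128];
	int count;
} sketch_last;

int sketch_remember(const char *name, const char *value, void *payload)
{
	sketch_last *last = payload;

	snprintf(last->name, sizeof(last->name), "%s", name);
	snprintf(last->value, sizeof(last->value), "%s", value);
	last->count++;
	return 0;
}

/* Copy [start, end) into `dst` without surrounding whitespace; -1 if too long. */
static int sketch_copy_trimmed(char *dst, size_t dstlen, const char *start, const char *end)
{
	size_t len;

	while (start < end && isspace((unsigned char)*start))
		start++;
	while (end > start && isspace((unsigned char)end[-1]))
		end--;

	len = (size_t)(end - start);
	if (len >= dstlen)
		return -1;
	memcpy(dst, start, len);
	dst[len] = '\0';
	return 0;
}

/* Parse `len` bytes of "[section]" and "key = value" lines from memory. */
int sketch_parse(const char *buf, size_t len, sketch_entry_cb cb, void *payload)
{
	const char *pos = buf, *end = buf + len;
	char section[64] = "", line[256], key[64], value[128], name[130];
	int error;

	assert(buf && cb);

	while (pos < end) {
		const char *eol = memchr(pos, '\n', (size_t)(end - pos));
		char *sep;

		if (!eol)
			eol = end;
		if (sketch_copy_trimmed(line, sizeof(line), pos, eol) < 0)
			return -1;
		pos = (eol < end) ? eol + 1 : end;

		if (line[0] == '\0')
			continue; /* blank line */

		if (line[0] == '[') {
			/* Section header: remember it for the following keys. */
			if ((sep = strchr(line, ']')) == NULL ||
			    sketch_copy_trimmed(section, sizeof(section), line + 1, sep) < 0)
				return -1;
			continue;
		}

		/* Key/value line: split at the first '=' and trim both halves. */
		if ((sep = strchr(line, '=')) == NULL ||
		    sketch_copy_trimmed(key, sizeof(key), line, sep) < 0 ||
		    sketch_copy_trimmed(value, sizeof(value), sep + 1, line + strlen(line)) < 0)
			return -1;

		/* Keys seen before any section header are reported without a prefix. */
		snprintf(name, sizeof(name), "%s%s%s", section, section[0] ? "." : "", key);
		if ((error = cb(name, value, payload)) != 0)
			return error;
	}

	return 0;
}
```

A real backend would feed the reported entries into an entry store rather than into a callback payload like `sketch_last`.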

Member Author

pks-t commented Sep 28, 2018

Amended once again to improve the error message. Seeing that the criticism only revolved around such minor things, can I assume that this is ready to be merged?
