In-memory configuration #4767

pks-t · 2018-08-16T10:40:37Z

So finally I hold true to my promise to refactor our config code (once again) and create an in-memory configuration backend. I think the result is quite pleasant due to the refactorings, and in fact the in-memory backend implementation is only a bit more than a hundred lines of code (excluding the stubs).

This series has three phases:

1. Refactorings to make our config file more generic with regard to names. Most importantly, many things are renamed from "git_config_file_" to "git_config_backend_" to express that they are actually backend-agnostic.
2. I pull the "config_entries" store, which abstracts iteration order and memory management for config entries, out of the "git_config_file" backend.
3. I create the "git_config_mem" backend based on that store, which is rather trivial to do as everything is in place already.

There are a few more things I'd like to do after this PR is merged. First, I think that we can get rid of the file read-only snapshot implementation and simply use an in-memory backend instead. Second, I'll implement writing to and snapshotting of the in-memory config backend. Both things should be easy to implement, but the series is already long enough and thus I refrained from doing so now.

As always, I tried to make sure that the series "develops" so that it makes for a pleasant read. So I highly recommend reading it commit by commit instead of all at once.

tiennou

This is really nice! I noticed a few minor stylistic things, but overall it looks pretty solid, so 👍.

tiennou · 2018-08-16T20:29:13Z

src/config_entries.c

 typedef struct config_entries_iterator {
 	git_config_iterator parent;
 	config_entry_list *head;
 } config_entries_iterator;

+struct git_config_entries{


nitpick: missing space before {

tiennou · 2018-08-16T20:34:32Z

src/config_mem.c

+static int config_memory_lock(git_config_backend *backend)
+{
+	GIT_UNUSED(backend);
+	return config_error_readonly();


I have a fuzzy recollection that lock/unlock can be used by transactions, and that snapshot can also be about having a consistent view of the configuration at some point in time (even when read-only).

I just want to point that out, though AFAICT this is mostly intended for testing purposes 😉. Support for those can be added later.

I wanted to keep this PR as short as possible while still providing basic functionality. I plan on implementing write support and snapshotting as soon as this is merged, especially so as both things should be rather trivial to implement. I didn't yet take a look at lock/unlock and transactions, but I'll do that at the same point in time, I guess.

tiennou · 2018-08-16T20:36:55Z

src/config_mem.c

+static int config_error_readonly(void)
+{
+	giterr_set(GITERR_CONFIG, "this backend is read-only");
+	return -1;


Still longing for a GIT_ENOTSUP…

Hmm, interesting. I'm game for an ENOTSUPPORTED or whatever, but I worry that it would get really weird really fast, at least here for config. Since configuration is so low-level, I worry that it would leak out very strangely. For example, if I called git_merge and got a GIT_ENOTSUPPORTED back, then I would very much not understand what it was that wasn't supported.

I think? Is this actually the case that you had in mind that you do in fact want?

Well, it could mostly serve as a signal to higher-level layers that the thing they're currently trying to do can't work, which would allow that higher level to preserve the semantics of what happened.

For example, we don't do octopus merges; the error code could be used to signal that to the user (instead of the generic error we're currently returning).
In the config case, the higher level could try other backends and either swallow the error if another one was able to handle the request, or error out (as currently done), or return ENOTSUPPORTED to be more specific about the failure (in case there's only one in-memory backend loaded, which the user is ultimately responsible for). That makes it pretty close to the ENOTFOUND semantics some codepaths have.

This is pretty minor though, it's just that there are some places where it would maybe be useful to report to the user that they're facing a "limitation" of the API, not a hard error.

I agree that ENOTSUPPORTED would be a nice to have in some locations, including this one. The PR is already big enough as is right now, if you ask me, so I'd love to postpone this to another PR. Especially so as it might require reviewing of callers, as noted by @ethomson

It was just wishful thinking 😉. Since it seems we find that a worthwhile error code, I'll try to create a PR for it.

I'm skeptical of run-time feature detection like this. Do we have something that isn't detectable at compile time and isn't just a missing feature or a bug?
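To make the fallback idea from this thread a bit more concrete, here is a minimal, self-contained sketch. Everything in it is hypothetical: the `TOY_*` codes and `toy_*` functions are invented for illustration and are not libgit2 API. It only shows how a higher layer could swallow a "not supported" result when another backend handles the request, and report it when none does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes, not libgit2's. */
enum { TOY_OK = 0, TOY_ERROR = -1, TOY_ENOTSUPPORTED = -2 };

/* Hypothetical "set" entry point of a backend. */
typedef int (*toy_set_fn)(const char *key, const char *value);

/* A read-only backend reports a limitation, not a hard error. */
static int toy_set_readonly(const char *key, const char *value)
{
	(void)key;
	(void)value;
	return TOY_ENOTSUPPORTED;
}

/* A writable backend pretends the write succeeded. */
static int toy_set_writable(const char *key, const char *value)
{
	(void)key;
	(void)value;
	return TOY_OK;
}

/*
 * Try each backend in order. A "not supported" result is swallowed as
 * long as some later backend handles the request; success and hard
 * errors are returned immediately.
 */
static int toy_config_set(const toy_set_fn *backends, size_t n,
	const char *key, const char *value)
{
	size_t i;

	assert(backends != NULL || n == 0);

	for (i = 0; i < n; i++) {
		int error = backends[i](key, value);
		if (error != TOY_ENOTSUPPORTED)
			return error;
	}

	/* Nobody could handle it: surface the limitation to the caller. */
	return TOY_ENOTSUPPORTED;
}
```

With only the read-only backend loaded, the caller sees `TOY_ENOTSUPPORTED`; with a writable backend further down the list, the same call returns `TOY_OK`.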

tiennou · 2018-08-16T20:39:20Z

src/config_file.c

 		return error;

-	if ((error = snapshot->open(snapshot, bh->level, bh->repo)) < 0)


pks-t · 2018-08-24T07:03:28Z

Thanks for your feedback. I've fixed the stylistic issue spotted by @tiennou

pks-t · 2018-09-07T06:48:24Z

Rebased to fix conflicts with #4799

carlosmn

Thanks for taking the time to split the refactoring into the different commits with good commit messages. It made looking through the changes leading up to the new code easy.

I've left a couple of comments but I really like how this ended up. Iteration via snapshot leading to a reparse sounds like an unintended side-effect but the idea there is to duplicate the values into a version we can own and that won't change so that works out to the same.

carlosmn · 2018-09-17T17:13:45Z

src/config.c


 	GIT_UNUSED(new_raw);

-	giterr_set(GITERR_CONFIG, "a file with the same level (%i) has already been added to the config", (int)(*old)->level);
+	giterr_set(GITERR_CONFIG, "a backend with the same level (%i) has already been added to the config", (int)(*old)->level);


I wonder if using "backend" here might be a bit too much in the weeds for a message that a user is bound to see. I don't have a better word right now, but it might be worth giving it a think or two.

What about just using "configuration" instead of backend?

No, that sounds weird when reading the complete sentence. "a configuration with the same level (%i) already exists"?

We could have "configuration at this level (%i) is already set" or "there is already configuration at this level (%i)" which parse better.

carlosmn · 2018-09-17T17:49:27Z

src/config_mem.c

+	GIT_UNUSED(line_len);
+
+	if (current_section) {
+		/* TODO: Once warnings lang, we should likely warn


Typo: s/lang/land/

carlosmn · 2018-09-17T17:51:07Z

src/config_mem.c

+static int config_memory_lock(git_config_backend *backend)
+{
+	GIT_UNUSED(backend);
+	return config_error_readonly();


It's fine to error out on locking, same as any other read-only configuration. It would be good to support snapshotting as it should be pretty easy and it'd allow for readers to be very sure that they can borrow a string. We can deal with that later, but ideally we'd see that before the next release.

carlosmn · 2018-09-17T17:52:09Z

src/config_mem.c

+{
+	GIT_UNUSED(out);
+	GIT_UNUSED(backend);
+	return config_error_readonly();


Erroring out because it's read-only is the wrong thing to do semantically. This should be an error about this particular backend not supporting snapshots.

True. I've been too eager to just reuse that nifty function.

Also, referring to your previous comment, I intend to create a follow-up PR as soon as this one is merged. Snapshotting is trivial to implement, as you said, but I first wanted to get the refactoring as well as the basic functionality going. I definitely intend to implement snapshotting and potentially even write support for the next release.
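The distinction discussed here can be sketched in isolation. The snippet below is a simplified stand-in, not the code from the PR: `toy_error_set` and `toy_last_error` are hypothetical replacements for the library's error reporting, and the `toy_mem_*` functions take no backend argument. Only the two message strings are taken from the diff.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the library's last-error storage. */
static char toy_last_error[128];

static void toy_error_set(const char *msg)
{
	assert(msg != NULL);
	snprintf(toy_last_error, sizeof(toy_last_error), "%s", msg);
}

/* Shared helper for every operation that would modify the backend. */
static int toy_error_readonly(void)
{
	toy_error_set("this backend is read-only");
	return -1;
}

/* Locking and writing fail because the backend is read-only. */
static int toy_mem_lock(void)
{
	return toy_error_readonly();
}

static int toy_mem_set(const char *key, const char *value)
{
	(void)key;
	(void)value;
	return toy_error_readonly();
}

/*
 * Snapshotting does not modify anything, so "read-only" would be the
 * wrong explanation. It gets its own message instead.
 */
static int toy_mem_snapshot(void)
{
	toy_error_set("this backend does not support snapshots");
	return -1;
}
```

All three operations still return -1; only the message the caller can retrieve afterwards differs.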

When populating the list of submodule names, we use the submodule configuration entry's name as the key in the map of submodule names. This creates a hidden dependency on the lifetime of the configuration that was used to parse the submodule, which is fragile and unexpected. Fix the issue by duplicating the string before writing it into the submodule name map.
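The ownership problem described in this commit message can be shown with a small sketch. The names (`toy_name_slot`, `toy_names_put`) are hypothetical, and a single slot stands in for the real string map; the point is only that the stored key is a private copy rather than a pointer into config-owned memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Portable string duplication (strdup is not part of ISO C before C23). */
static char *toy_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);
	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Hypothetical one-slot stand-in for the submodule name map. */
typedef struct {
	char *name; /* owned by the slot */
} toy_name_slot;

/* Store a private copy so the slot outlives the config that named it. */
static int toy_names_put(toy_name_slot *slot, const char *config_owned_name)
{
	assert(slot != NULL && config_owned_name != NULL);
	slot->name = toy_strdup(config_owned_name);
	return slot->name ? 0 : -1;
}

static void toy_names_clear(toy_name_slot *slot)
{
	free(slot->name);
	slot->name = NULL;
}
```

If the buffer that originally held the name is overwritten or freed afterwards, the stored key is unaffected.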

The variables `git_config_escaped` and `git_config_escapes` are both defined as static const character pointers in "config_parse.h". In cases where "config_parse.h" is included but those two variables are not used, the compiler will thus complain about defined but unused variables. Fix this by declaring them as external and moving the actual initialization to the C file. Note that it is not possible to simply make this a #define, as we are indexing into those arrays.
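A single-file sketch of the pattern, with the header and implementation parts marked by comments. The `toy_` names, the table contents, and the `toy_unescape` helper are assumptions made for illustration, not the library's definitions.

```c
#include <assert.h>
#include <string.h>

/* --- what would live in the header: declarations only --- */
extern const char *toy_config_escapes;
extern const char *toy_config_escaped;

/* --- what would live in the C file: the one definition --- */
const char *toy_config_escapes = "ntb\"\\";
const char *toy_config_escaped = "\n\t\b\"\\";

/*
 * Translate the character following a backslash. The two tables are
 * indexed in parallel, which is why a plain #define of a string
 * literal per use would be awkward here.
 */
static int toy_unescape(char c, char *out)
{
	const char *pos;

	assert(out != NULL);

	/* strchr would match the terminator for '\0', so reject it first. */
	if (c == '\0' || (pos = strchr(toy_config_escapes, c)) == NULL)
		return -1;

	*out = toy_config_escaped[pos - toy_config_escapes];
	return 0;
}
```

Because the header now only declares the symbols, a translation unit that includes it without using them no longer triggers an unused-variable warning.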

Originally, the `git_config` struct is a collection of all the parsed configuration files from different scopes (system-wide config, user-specific config as well as the repo-specific config files). Historically, we didn't and don't yet have any other configuration backends than the one for files, which is why the field holding the config backends is called `files`. But in fact, nothing dictates that the vector of backends actually holds file backends only, as they are generic and custom backends can be implemented by users. Rename the member to be called `backends` to clarify that there is nothing specific to files here.

Same as with the previous commit, the `file_internal` struct is used to keep track of all the backends that are added to a `git_config` struct. Rename it to `backend_internal` and rename its `file` member to `backend` to make the implementation more backend-agnostic.

pks-t · 2018-09-21T10:23:49Z

Fixed comments noted by @carlosmn. Range-diff:

 1:  cbeecf478 =  1:  0b9c68b13 submodule: fix submodule names depending on config-owned memory
 2:  4991675af =  2:  b9affa329 config_parse: avoid unused static declared values
 3:  6ee37190b =  3:  633cf40cb config: rename `files` vector to `backends`
 4:  18f2b5013 =  4:  83733aeb0 config: rename `file_internal` and its `file` member
 5:  710dce363 !  5:  f40891447 config: make names backend-agnostic
    @@ -48,7 +48,7 @@
      	if (pos == -1) {
      		giterr_set(GITERR_CONFIG,
     -			"no config file exists for the given level '%i'", (int)level);
    -+			"no config backend exists for the given level '%i'", (int)level);
    ++			"no configuraiton exists for the given level '%i'", (int)level);
      		return GIT_ENOTFOUND;
      	}
      
    @@ -62,7 +62,7 @@
      	GIT_UNUSED(new_raw);
      
     -	giterr_set(GITERR_CONFIG, "a file with the same level (%i) has already been added to the config", (int)(*old)->level);
    -+	giterr_set(GITERR_CONFIG, "a backend with the same level (%i) has already been added to the config", (int)(*old)->level);
    ++	giterr_set(GITERR_CONFIG, "a configuration with the same level (%i) already exists", (int)(*old)->level);
      	return GIT_EEXISTS;
      }
      
 6:  f83f1acad =  6:  7aae03229 config: move function normalizing section names into "config.c"
 7:  0b69f5fc0 =  7:  7faddfac1 config: rename "config_file.h" to "config_backend.h"
 8:  d327db91c =  8:  f8a4d252e config_file: remove unnecessary snapshot indirection
 9:  305f8bac9 =  9:  7ffee7766 config_entries: pull out implementation of entry store
10:  32a7e720f = 10:  f1db78c22 config_entries: rename functions and structure
11:  f849c6748 = 11:  11d0dcb43 config_entries: abstract away retrieval of config entries
12:  57ed569cd = 12:  99dd96cdc config_entries: abstract away iteration over entries
13:  986817bbd = 13:  9f765748d config_entries: abstract away reference counting
14:  7e6920862 = 14:  f4392baa8 config_entries: internalize structure declarations
15:  013395dc1 = 15:  9ffce8fb4 config_entries: refactor entries iterator memory ownership
16:  839b7d887 ! 16:  1d8030a42 config: introduce new read-only in-memory backend
    @@ -83,7 +83,7 @@
     +	GIT_UNUSED(line_len);
     +
     +	if (current_section) {
    -+		/* TODO: Once warnings lang, we should likely warn
    ++		/* TODO: Once warnings land, we should likely warn
     +		 * here. Git appears to warn in most cases if it sees
     +		 * un-namespaced config options.
     +		 */
    @@ -205,7 +205,8 @@
     +{
     +	GIT_UNUSED(out);
     +	GIT_UNUSED(backend);
    -+	return config_error_readonly();
    ++	giterr_set(GITERR_CONFIG, "this backend does not support snapshots");
    ++	return -1;
     +}
     +
     +static void config_memory_free(git_config_backend *_backend)

As a last step to make variables and structures more backend agnostic for our `git_config` structure, rename local variables to not be called `file` anymore.

The function `git_config_file_normalize_section` is never used in any file other than "config.c", but it is implemented in "config_file.c". Move it over and make the symbol static.

The header "config_file.h" has a list of inline functions to access the contents of a config backend without directly messing with the struct's function pointers. While all these functions are called "git_config_file_*", they are in fact completely backend-agnostic and don't care whether it is a file or not. Rename all the functions to instead be backend-agnostic versions called "git_config_backend_*" and rename the header to match.
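As a rough illustration of why such accessors are backend-agnostic, consider the following sketch. The `toy_backend` struct and its two function pointers are hypothetical and far smaller than the real backend interface; the read-only in-memory backend holds a single fixed entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified backend interface. */
typedef struct toy_backend toy_backend;
struct toy_backend {
	int readonly;
	const char *(*get)(toy_backend *self, const char *key);
	int (*set)(toy_backend *self, const char *key, const char *value);
};

/* Accessors: callers never touch the function pointers directly. */
static const char *toy_backend_get(toy_backend *b, const char *key)
{
	assert(b != NULL && b->get != NULL);
	return b->get(b, key);
}

static int toy_backend_set(toy_backend *b, const char *key, const char *value)
{
	assert(b != NULL && b->set != NULL);
	return b->set(b, key, value);
}

/* A read-only in-memory backend holding one borrowed key/value pair. */
typedef struct {
	toy_backend parent; /* must be first so the cast below is valid */
	const char *key;
	const char *value;
} toy_mem_backend;

static const char *mem_get(toy_backend *self, const char *key)
{
	toy_mem_backend *mem = (toy_mem_backend *)self;
	return strcmp(mem->key, key) == 0 ? mem->value : NULL;
}

static int mem_set(toy_backend *self, const char *key, const char *value)
{
	(void)self;
	(void)key;
	(void)value;
	return -1; /* read-only */
}

static void toy_mem_backend_init(toy_mem_backend *mem,
	const char *key, const char *value)
{
	mem->parent.readonly = 1;
	mem->parent.get = mem_get;
	mem->parent.set = mem_set;
	mem->key = key;
	mem->value = value;
}
```

Nothing in `toy_backend_get` or `toy_backend_set` knows whether the backend is a file or a chunk of memory, which is the property the rename is meant to express.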

The implementation for config file snapshots has an unnecessary redirection from `config_snapshot` to `git_config_file__snapshot`. Inline the call to `git_config_file__snapshot` and remove it.

The configuration entry store that is used for configuration files needs to keep track of all entries in two different structures:

- a singly linked list is being used to be able to iterate through configuration entries in the order they have been found
- a string map is being used to efficiently look up configuration entries by their key

This store is thus something that may be used by other, future backends as well to abstract away implementation details and iteration over the entries. Pull out the necessary functions from "config_file.c" and move them into their own "config_entries.c" module. For now, this is simply moving over code without any renames and/or refactorings to help reviewing.
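The two-structure layout can be sketched as follows. This is a toy model with hypothetical names: a fixed-size chained hash table stands in for the string map, strings are borrowed rather than owned, and none of it mirrors the actual "config_entries.c" code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BUCKETS 16

typedef struct toy_entry {
	const char *name;                 /* borrowed, not owned */
	const char *value;                /* borrowed, not owned */
	struct toy_entry *next_in_order;  /* insertion-order list */
	struct toy_entry *next_in_bucket; /* hash chain for lookups */
} toy_entry;

typedef struct {
	toy_entry *head, *tail;           /* iteration order */
	toy_entry *buckets[TOY_BUCKETS];  /* lookup by key */
} toy_entries;

static unsigned toy_hash(const char *s)
{
	unsigned h = 5381;
	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % TOY_BUCKETS;
}

/* Append to the ordered list and register the entry in the map. */
static int toy_entries_append(toy_entries *es, const char *name, const char *value)
{
	toy_entry *e;
	unsigned b;

	assert(es != NULL && name != NULL && value != NULL);

	if ((e = calloc(1, sizeof(*e))) == NULL)
		return -1;
	e->name = name;
	e->value = value;

	if (es->tail)
		es->tail->next_in_order = e;
	else
		es->head = e;
	es->tail = e;

	/* Push to the front of the chain so the newest entry is found first. */
	b = toy_hash(name);
	e->next_in_bucket = es->buckets[b];
	es->buckets[b] = e;
	return 0;
}

/* Look up the most recently added value for a key. */
static const char *toy_entries_get(const toy_entries *es, const char *name)
{
	const toy_entry *e;
	for (e = es->buckets[toy_hash(name)]; e; e = e->next_in_bucket)
		if (strcmp(e->name, name) == 0)
			return e->value;
	return NULL;
}

static void toy_entries_clear(toy_entries *es)
{
	toy_entry *e, *next;
	for (e = es->head; e; e = next) {
		next = e->next_in_order;
		free(e);
	}
	memset(es, 0, sizeof(*es));
}
```

Iteration walks `head` via `next_in_order`, while lookups go through the buckets, so neither operation needs to know about the other structure.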

The previous commit simply moved all code that is required to handle config entries to a new module without yet adjusting any of the function and structure names to help readability. We now rename things accordingly to have a common "git_config_entries" entries instead of the old "diskfile_entries" one.

The code accessing config entries in the `git_config_entries` structure is still much too intimate with implementation details, directly accessing the maps and handling indices. Provide two new functions to get config entries from the internal map structure to decouple the interfaces and use them in the config file code. The function `git_config_entries_get` will simply look up the entry by name and, in the case of a multi-value, return the last occurrence of that entry. The second function, `git_config_entries_get_unique`, will only return an entry if it is unique and not included via another configuration file. This one is required to properly implement write operations for single entries, as we refuse to write to or delete a single entry if it is not clear which one was meant.
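The difference between the two lookups can be illustrated with a flat array instead of the real map. All names, the `include_depth` field, and the error codes below are hypothetical; only the semantics (last occurrence wins versus "unique and not included") follow the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes. */
enum { TOY_OK = 0, TOY_ENOTFOUND = -1, TOY_EREFUSED = -2 };

typedef struct {
	const char *name;
	const char *value;
	int include_depth; /* 0 = defined directly, > 0 = came from an include */
} toy_entry;

/* Plain lookup: for a multi-value, the last occurrence wins. */
static const toy_entry *toy_get(const toy_entry *entries, size_t n,
	const char *name)
{
	const toy_entry *found = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(entries[i].name, name) == 0)
			found = &entries[i];
	return found;
}

/*
 * Strict lookup for write paths: refuse if the key occurs more than
 * once or was pulled in through an included file.
 */
static int toy_get_unique(const toy_entry **out, const toy_entry *entries,
	size_t n, const char *name)
{
	const toy_entry *found = NULL;
	size_t i;

	assert(out != NULL);
	*out = NULL;

	for (i = 0; i < n; i++) {
		if (strcmp(entries[i].name, name) != 0)
			continue;
		if (found)
			return TOY_EREFUSED; /* multi-value: unclear which one is meant */
		found = &entries[i];
	}

	if (!found)
		return TOY_ENOTFOUND;
	if (found->include_depth > 0)
		return TOY_EREFUSED; /* lives in another file */

	*out = found;
	return TOY_OK;
}
```

A reader can live with "last one wins", whereas a writer that wants to replace or delete a single entry needs the stricter variant.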

The nice thing about our `git_config_iterator` interfaces is that nobody needs to know anything about the implementation details. All that is required is to obtain the iterator via any backend and then use it by executing generic functions. We can thus completely internalize all the implementation details of how to iterate over entries into the config entries store and simply create such an iterator in our config file backend when we want to iterate its entries. This further decouples the config file backend from the config entries store.

Instead of directly calling `git_atomic_inc` in users of the config entries store, provide a `git_config_entries_incref` function to further decouple the interfaces. Convert the refcount to a `git_refcount` structure while at it.
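A minimal sketch of hiding the reference count behind functions. The names are hypothetical, and the counter is a plain integer, so unlike the real code this toy version is not safe to share between threads. The return value of `toy_entries_free` exists only so the behaviour is observable.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
	unsigned refcount; /* NOT atomic in this sketch */
} toy_entries;

/* A new store starts with one reference, held by its creator. */
static toy_entries *toy_entries_new(void)
{
	toy_entries *es = calloc(1, sizeof(*es));
	if (es)
		es->refcount = 1;
	return es;
}

/* Callers take a reference through the interface, not the struct. */
static void toy_entries_incref(toy_entries *es)
{
	assert(es != NULL);
	es->refcount++;
}

/* Drop one reference; returns 1 if the store was actually destroyed. */
static int toy_entries_free(toy_entries *es)
{
	if (!es)
		return 0;
	if (--es->refcount > 0)
		return 0;
	free(es);
	return 1;
}
```

Each holder pairs one `toy_entries_incref` with one `toy_entries_free`, and the memory goes away with the last reference.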

Access to the config entries is now completely done via the module's function interface and no caller messes with the struct's internals. We can thus completely move the structure declarations into the implementation file so that nobody even has a chance to mess with the members.

Right now, the config file code requires us to pass in its backend to the config entry iterator. This is required with the current code, as the config file backend will first create a read-only snapshot which is then passed to the iterator just for that purpose. So after the iterator is getting free'd, the code needs to make sure that the snapshot gets free'd, as well. By now, though, we can easily refactor the code to be more efficient and remove the reverse dependency from iterator to backend. Instead of creating a read-only snapshot (which also requires us to re-parse the complete configuration file), we can simply duplicate the config entries and pass those to the iterator. Like that, the iterator only needs to make sure to free the duplicated config entries, which is trivial to do and clears up memory ownership by a lot.
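The ownership model described here can be sketched like this. The types and functions are hypothetical, and the copy is shallow (the strings themselves stay borrowed), so this only demonstrates that the iterator owns its own entry list and frees nothing else.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
	const char *name;
	const char *value;
} toy_entry;

typedef struct {
	toy_entry *entries; /* private copy, owned by the iterator */
	size_t count;
	size_t pos;
} toy_iterator;

/* Duplicate the entry list so later changes to the source are invisible. */
static int toy_iterator_new(toy_iterator **out, const toy_entry *entries,
	size_t count)
{
	toy_iterator *it;

	assert(out != NULL && (entries != NULL || count == 0));
	*out = NULL;

	if ((it = calloc(1, sizeof(*it))) == NULL)
		return -1;

	if (count > 0) {
		if ((it->entries = malloc(count * sizeof(*entries))) == NULL) {
			free(it);
			return -1;
		}
		memcpy(it->entries, entries, count * sizeof(*entries));
	}

	it->count = count;
	*out = it;
	return 0;
}

/* Return the next entry, or NULL once the copy is exhausted. */
static const toy_entry *toy_iterator_next(toy_iterator *it)
{
	assert(it != NULL);
	return it->pos < it->count ? &it->entries[it->pos++] : NULL;
}

/* The iterator only has to release what it duplicated. */
static void toy_iterator_free(toy_iterator *it)
{
	if (!it)
		return;
	free(it->entries);
	free(it);
}
```

There is no pointer back to a backend anywhere in `toy_iterator`, which is the reverse dependency the commit removes.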

Now that we have abstracted away how to store and retrieve config entries, it became trivial to implement a new in-memory backend by making use of this. And thus we do so. This commit implements a new read-only in-memory backend that can parse a chunk of memory into a `git_config_backend` structure.
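To give a feel for what "parsing a chunk of memory" involves, here is a deliberately tiny sketch. It is not the library's parser: the `toy_*` names are hypothetical, and it only understands comments, `[section]` headers, and `key = value` lines. It does no quoting, escaping, subsections, or case normalization, and it rejects entries outside a section as a simplification.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef int (*toy_entry_cb)(const char *name, const char *value, void *payload);

/* Trim whitespace in place and return the first non-space character. */
static char *toy_trim(char *s)
{
	char *end;
	while (isspace((unsigned char)*s))
		s++;
	end = s + strlen(s);
	while (end > s && isspace((unsigned char)end[-1]))
		end--;
	*end = '\0';
	return s;
}

/* Parse `len` bytes of config text, reporting "section.key" entries. */
static int toy_parse(const char *buf, size_t len, toy_entry_cb cb, void *payload)
{
	char section[64] = "", line[256], name[128];
	size_t pos = 0;

	assert(buf != NULL && cb != NULL);

	while (pos < len) {
		size_t n = 0;
		char *text, *eq;

		/* Copy one line out of the (not NUL-terminated) buffer. */
		while (pos < len && buf[pos] != '\n') {
			if (n + 1 >= sizeof(line))
				return -1;
			line[n++] = buf[pos++];
		}
		line[n] = '\0';
		pos++; /* skip the newline */

		text = toy_trim(line);
		if (*text == '\0' || *text == '#' || *text == ';')
			continue;

		if (*text == '[') {
			char *close = strchr(text, ']');
			if (!close || (size_t)(close - text - 1) >= sizeof(section))
				return -1;
			*close = '\0';
			strcpy(section, toy_trim(text + 1));
			continue;
		}

		/* Simplification: reject entries that are not in a section. */
		if (section[0] == '\0' || (eq = strchr(text, '=')) == NULL)
			return -1;
		*eq = '\0';
		if (snprintf(name, sizeof(name), "%s.%s", section, toy_trim(text))
		    >= (int)sizeof(name))
			return -1;
		if (cb(name, toy_trim(eq + 1), payload) != 0)
			return -1;
	}
	return 0;
}

/* Example callback: remember the value of one wanted key. */
typedef struct {
	const char *wanted;
	char value[64];
	int seen;
} toy_lookup;

static int toy_lookup_cb(const char *name, const char *value, void *payload)
{
	toy_lookup *l = payload;
	if (strcmp(name, l->wanted) == 0) {
		snprintf(l->value, sizeof(l->value), "%s", value);
		l->seen++;
	}
	return 0;
}
```

A backend built along these lines would feed each reported entry into its entry store instead of a lookup struct.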

pks-t · 2018-09-28T09:14:58Z

Amended once again to improve the error message. Seeing that the criticism only revolved around such minor things, can I assume that this is ready to be merged?

pks-t force-pushed the pks/config-mem branch from ea18b46 to 1e72055 Compare August 16, 2018 12:15

tiennou approved these changes Aug 16, 2018

pks-t force-pushed the pks/config-mem branch from 2106e3e to 8d90be2 Compare August 24, 2018 07:02

pks-t force-pushed the pks/config-mem branch from 8d90be2 to 590c4cf Compare August 24, 2018 07:03

pks-t mentioned this pull request Aug 24, 2018

Add a fuzzer for config files #4752

Merged

pks-t force-pushed the pks/config-mem branch from 590c4cf to 839b7d8 Compare September 7, 2018 06:47

carlosmn reviewed Sep 17, 2018

pks-t added 4 commits September 21, 2018 12:11

pks-t force-pushed the pks/config-mem branch 2 times, most recently from 1d8030a to c265062 Compare September 21, 2018 10:25

pks-t added 12 commits September 28, 2018 11:14

config: make names backend-agnostic

a556269

As a last step to make variables and structures more backend agnostic for our `git_config` structure, rename local variables to not be called `file` anymore.

config: move function normalizing section names into "config.c"

1aeff5d

The function `git_config_file_normalize_section` is never used in any file other than "config.c", but it is implemented in "config_file.c". Move it over and make the symbol static.

config_file: remove unnecessary snapshot indirection

d75bbea

The implementation for config file snapshots has an unnecessary redirection from `config_snapshot` to `git_config_file__snapshot`. Inline the call to `git_config_file__snapshot` and remove it.

config_entries: abstract away reference counting

123e596

Instead of directly calling `git_atomic_inc` in users of the config entries store, provide a `git_config_entries_incref` function to further decouple the interfaces. Convert the refcount to a `git_refcount` structure while at it.

pks-t force-pushed the pks/config-mem branch from c265062 to 2be39ce Compare September 28, 2018 09:14

carlosmn approved these changes Sep 28, 2018

carlosmn merged commit 0530d7d into libgit2:master Sep 28, 2018

pks-t deleted the pks/config-mem branch October 4, 2018 07:52

stanhu mentioned this pull request Feb 28, 2019

Stale Rugged::Config#each_key values if another process changes .git/config libgit2/rugged#785

Open

snyk-bot mentioned this pull request Feb 23, 2020

[Snyk] Upgrade nodegit from 0.4.1 to 0.26.4 saurabharch/Breezeblocks#1

Open

snyk-bot mentioned this pull request Apr 22, 2020

[Snyk] Upgrade nodegit from 0.24.3 to 0.26.5 aminatakonate000/Graviton-App#4

Open

snyk-bot mentioned this pull request May 5, 2020

[Snyk] Upgrade nodegit from 0.24.3 to 0.26.5 Barnstorm-Online/ngp-openapi-generator#1

Open
