Add GIT_REPOSITORY_OPEN_FROM_ENV flag to respect $GIT_* environment vars #3711

Merged

Conversation

joshtriplett
Contributor

@joshtriplett joshtriplett commented Mar 27, 2016

git_repository_open_ext provides parameters for the start path, whether to search across filesystems, and what ceiling directories to stop at. git commands have standard environment variables and defaults for each of those, as well as various other parameters of the repository. To avoid duplicate environment variable handling in users of libgit2, add a GIT_REPOSITORY_OPEN_FROM_ENV flag, which makes git_repository_open_ext automatically handle the appropriate environment variables. Commands that intend to act just like those built into git itself can use this flag to get the expected default behavior.

git_repository_open_ext with the GIT_REPOSITORY_OPEN_FROM_ENV flag respects $GIT_DIR, $GIT_DISCOVERY_ACROSS_FILESYSTEM, $GIT_CEILING_DIRECTORIES, $GIT_INDEX_FILE, $GIT_NAMESPACE, $GIT_OBJECT_DIRECTORY, and $GIT_ALTERNATE_OBJECT_DIRECTORIES. In the future, when libgit2 gets worktree support, git_repository_open_ext with this flag will also respect $GIT_WORK_TREE and $GIT_COMMON_DIR; until then, it will error out if either $GIT_WORK_TREE or $GIT_COMMON_DIR is set.
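
The "error out if either variable is set" rule can be illustrated without libgit2. This is only a sketch: `check_unsupported_git_env` is a hypothetical name, it reads the environment with plain `getenv`, and it treats a set-but-empty variable as set, which is an assumption rather than something the description above states.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not a libgit2 function): returns 0 when neither
 * $GIT_WORK_TREE nor $GIT_COMMON_DIR is set, and -1 otherwise, mirroring
 * the "error out until worktree support exists" rule described above.
 * Assumption: a variable set to the empty string still counts as set.
 */
int check_unsupported_git_env(void)
{
    static const char *const unsupported[] = {
        "GIT_WORK_TREE",
        "GIT_COMMON_DIR",
    };
    size_t i;

    for (i = 0; i < sizeof(unsupported) / sizeof(unsupported[0]); i++) {
        if (getenv(unsupported[i]) != NULL) {
            fprintf(stderr, "$%s is not supported yet\n", unsupported[i]);
            return -1;
        }
    }
    return 0;
}
```

Failing loudly here, instead of silently ignoring the variables, keeps a tool from operating on a different work tree than the one the user asked for.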

cl_setenv("GIT_CEILING_DIRECTORIES", ceiling_dirs);
cl_git_pass(git_repository_discover_default(&found_path));
cl_setenv("GIT_DIR", NULL);
cl_setenv("GIT_CEILING_DIRECTORIES", NULL);
Member

We should probably move these into the test cleanup code - that way they will get unset properly even if git_repository_discover_default fails.

Contributor Author

Done.

@vmg
Member

vmg commented Mar 28, 2016

This looks good to me. I like the careful error handling. :)

@joshtriplett joshtriplett force-pushed the git_repository_discover_default branch from 42b595b to 31c429e Compare March 29, 2016 01:44
@joshtriplett
Contributor Author

Looks like one of the Travis builds spuriously failed. Retrying...

@joshtriplett joshtriplett force-pushed the git_repository_discover_default branch from 31c429e to 795309e Compare March 29, 2016 02:48
@joshtriplett
Contributor Author

OK, I updated the test to add a cleanup that unsets the environment variables.
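
The value of a cleanup hook is that it runs whether or not the test body passed. A minimal sketch of such a hook follows, using POSIX `unsetenv` directly rather than the test suite's `cl_setenv(name, NULL)`; the function name `clear_git_env` and the variable list are illustrative and not taken from the patch.

```c
#include <stdlib.h>

/* Variables a test in this sketch may set; the list is illustrative. */
static const char *const git_env_vars[] = {
    "GIT_DIR",
    "GIT_CEILING_DIRECTORIES",
};

/*
 * Hypothetical cleanup hook: unset every variable a test may have set.
 * Calling it from the per-test cleanup, rather than at the end of the
 * test body, means a failing assertion cannot leak state into later
 * tests. Returns the number of variables that could not be unset.
 */
int clear_git_env(void)
{
    int failures = 0;
    size_t i;

    for (i = 0; i < sizeof(git_env_vars) / sizeof(git_env_vars[0]); i++) {
        if (unsetenv(git_env_vars[i]) != 0)
            failures++;
    }
    return failures;
}
```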

@carlosmn
Member

This is very different from how the rest of the library operates. This would be the first place where we ourselves act on git environment variables so we need to be careful in how we introduce this kind of thing.

We do act on HOME and XDG_CONFIG_HOME but those are mandated by the git config rules and we only use them if the application hasn't set their own paths to use. This would be a wholly different way of acting.

This does not act on the GIT_DIR rules as I understand them to work. AFAIK when git finds GIT_DIR, it's supposed to set the git-dir path, not a base directory from which to search. This function could end up choosing a repository higher up the path while git would error out.

By defaulting to "." if there is no GIT_DIR, we use the current working directory as the base to search from, which is something we consciously avoid doing anywhere in the library since we can't trust the cwd is relevant for any particular directory.

Why stop at these variables? Why not look at GIT_WORK_TREE, GIT_INDEX_FILE or GIT_OBJECT_DIRECTORY if the aim is to behave as git would? As-is, this feels unfortunately a lot like a function for a specific usage of an application, rather than something that applies generally.

@joshtriplett
Contributor Author

@carlosmn Building a command-line application similar to those provided by git itself seems like one of the primary use cases of libgit2. Such applications should, by default, act on the standard git environment variables the same way git does, so that they fit in. I absolutely do agree that libgit2 should never force the use of these environment variables, as that would break other use cases; however, it seems appropriate to me to provide a function that handles one of the most common use cases correctly, to avoid having to reimplement that functionality (potentially incorrectly) in each client. Right now, it's easier to ignore variables like $GIT_CEILING_DIRECTORIES than to implement support for them; I'd like to make it easier to use them than to ignore them, so that command-line tools built on libgit2 will likely respect them.

Anyone who wants to ignore all of these variables can still call git_repository_open_ext or git_repository_discover directly; those code paths should never look at these environment variables. That's why I added a separate function that explicitly does.

> This does not act on the GIT_DIR rules as I understand them to work. AFAIK when git finds GIT_DIR, it's supposed to set the git-dir path, not a base directory from which to search. This function could end up choosing a repository higher up the path while git would error out.

Good catch; I can easily fix that. I can just pass GIT_REPOSITORY_OPEN_NO_SEARCH if $GIT_DIR was set. I'll update the patch and the tests to fix that.

> By defaulting to "." if there is no GIT_DIR, we use the current working directory as the base to search from, which is something we consciously avoid doing anywhere in the library since we can't trust the cwd is relevant for any particular directory.

This function is explicitly supposed to be "open the default repository that a git command-line tool would open". So, it should match git's behavior of searching from the current directory up to the ceiling dirs. I agree that other functions in the library should not care about the current directory (though it's still possible to pass them relative paths), but in this case...

> Why stop at these variables? Why not look at GIT_WORK_TREE, GIT_INDEX_FILE or GIT_OBJECT_DIRECTORY if the aim is to behave as git would?

I'd like to do so. $GIT_WORK_TREE and $GIT_INDEX_FILE seem trivial to handle, actually. I should change this function to git_repository_open_default instead of making it a variant of discover. Then, I can set the worktree and index before returning the repository. As far as I can tell, libgit2 doesn't provide any support for separated object directories, but if it ever does, $GIT_OBJECT_DIRECTORY should be similarly easy to support.

> As-is, this feels unfortunately a lot like a function for a specific usage of an application, rather than something that applies generally.

This is intended to be the "give me the repository a git command-line tool would open" function; it should be general across all tools that behave like those git itself ships, which seems like a sufficiently common use case for libgit2. To the extent that this isn't sufficiently general, I can and should fix that. :)

@vmg
Member

vmg commented Mar 29, 2016

I agree with @joshtriplett's rationale. As long as libgit2 doesn't take into account the environment variables by default, adding a helper API that mimics Git's convention seems useful. The alternative is making many users of the library re-write the exact same logic. That's shitty for the users.

@ethomson care to weigh in?

@ethomson
Member

I'm mostly in agreement. I think that this is a useful addition to the library, provided it has git-like semantics (and it seems like we're all in agreement that it should).

> This function is explicitly supposed to be "open the default repository that a git command-line tool would open".

I don't like calling this the "default repository", it makes it sound like there is a default directory, when this is really the repository based on the current working directory. This may not be a significant distinction if you're writing a CLI, but the concept of a "default repository" doesn't make much sense to anybody who's not.

git_repository_open_current? git_repository_open_from_cwd?

> As far as I can tell, libgit2 doesn't provide any support for separated object directories, but if it ever does, $GIT_OBJECT_DIRECTORY should be similarly easy to support.

We're very flexible about object databases and this should be quite supportable with the existing code.

@carlosmn
Member

> This is intended to be the "give me the repository a git command-line tool would open" function

Sure, I do agree that's a useful functionality. But that's not what the function does. It only covers one small aspect. If you want to do that, you'd still need to reimplement the logic even with this function available. Which is why I said that it felt like it came out of a specific use-case.

Name-wise, I'd rather go with something like git_repository_open_fromenvironment() or similar, since it's not just the "cwd"; after all, that bit is trivial by passing "." as a path to git_repository_open(), it's all the other stuff.

I would still prefer it if it accepted the path to take as the basis instead of assuming you would always want the cwd (you can accept NULL as defaulting to getcwd()). The purpose of the library is not to replicate what git has done, that tool already exists, but to allow you to be in control of what happens in a git repository. Even if a common use-case is to behave like git, some of the environment variables would apply just as well to something which does want to access multiple repositories.

@joshtriplett
Contributor Author

I'll update the function to take more of the git environment variables into account, as well as the correct $GIT_DIR handling, and rename it to git_repository_open_env.

@joshtriplett joshtriplett force-pushed the git_repository_discover_default branch from 795309e to 7846932 Compare April 3, 2016 07:54
@joshtriplett
Contributor Author

Pushing a WIP version for review; I still need to write test cases. I've added support for all the standard git environment variables: $GIT_DIR, $GIT_DISCOVERY_ACROSS_FILESYSTEM, $GIT_CEILING_DIRECTORIES, $GIT_INDEX_FILE, $GIT_NAMESPACE, $GIT_OBJECT_DIRECTORY, and $GIT_ALTERNATE_OBJECT_DIRECTORIES. In the future, when libgit2 gets worktree support, git_repository_open_env will also respect $GIT_WORK_TREE and $GIT_COMMON_DIR; until then, git_repository_open_env errors out if either of those variables is set, to avoid surprises. I've also added a start_path parameter, with NULL meaning "use $GIT_DIR, or search from "." if it is not set".
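
The `start_path` rule, combined with the earlier point that `$GIT_DIR` names the repository exactly rather than a place to start searching, can be modelled as a small pure function. This is a sketch, not the patch's code: `struct open_plan` and `plan_open` are hypothetical names, and treating an explicit `start_path` as "search upward from here, ignoring `$GIT_DIR`" is an assumption.

```c
#include <stddef.h>

/* Result of deciding where an environment-aware open should begin. */
struct open_plan {
    const char *path; /* directory to open, or to start searching from */
    int search;       /* non-zero: walk up through parent directories */
};

/*
 * Hypothetical model of the rule described above:
 *  - explicit start_path: search upward from it (assumption)
 *  - no start_path, $GIT_DIR set: open exactly that directory, no search
 *  - no start_path, $GIT_DIR unset: search upward from "."
 * `git_dir_env` is the value of $GIT_DIR, or NULL when it is unset.
 */
struct open_plan plan_open(const char *start_path, const char *git_dir_env)
{
    struct open_plan plan;

    if (start_path != NULL) {
        plan.path = start_path;
        plan.search = 1;
    } else if (git_dir_env != NULL) {
        plan.path = git_dir_env;
        plan.search = 0;
    } else {
        plan.path = ".";
        plan.search = 1;
    }
    return plan;
}
```

Passing the environment value in as a parameter keeps the decision testable without touching the process environment.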

@linquize
Contributor

linquize commented Apr 3, 2016

Do you want libgit2 to respect any environment variables?

@joshtriplett
Contributor Author

Updated with extensive testcases, covering all the supported environment variables.

In the course of developing these test cases, I found a bug in the existing ceiling_dirs handling in find_repo, which led to pull request #3727. I based this branch on that one, as the tests need the same fix.

@joshtriplett
Contributor Author

Any thoughts on this approach, either on the API or on this implementation of it?

if (odb)
	git_repository_set_odb(repo, odb);

if (git_buf_is_allocated(&alts_buf)) {
Member

Not sure that I understand this construct. git_buf_size(&alts_buf) > 0 might make more sense? But I think the best option is to move the git__getenv call down here to keep the same pattern as the above variables.

Contributor Author

git_buf_is_allocated checks whether the buffer has been initialized.

In this function, I kept all the git__getenv calls and associated errors together and above the git_repository_open_ext, so that any error obtaining the environment variable would fail early and not have to clean up the open repository. However, setting up alternates requires the repository.

I can move the call, though; the order of error handling doesn't really matter.

Member

Sorry, my point about is_allocated is that it's not quite what you're interested in; it measures whether the library was the one to allocate a git_buf (as opposed to an external caller). It doesn't measure whether a git_buf has contents in it.

Are you preferring this because you want to support environment variables that are an empty string? If so this is effective but feels a bit fragile.

Contributor Author

Supporting empty strings was part of it; I used is_allocated as a "has this been set by git__getenv" check. It returns false for GIT_BUF_INIT and true for the value produced by git__getenv. That doesn't seem fragile.

I think it'll be easiest to just move the git__getenv call.
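
The distinction at stake in this thread is between a variable that is unset and one that is set to the empty string; a length check alone cannot tell them apart. A plain-C analogue, using `getenv` rather than `git__getenv` and `git_buf` (the enum and function names are hypothetical):

```c
#include <stdlib.h>

/* Three states a length check alone would collapse into two. */
enum env_state { ENV_UNSET, ENV_EMPTY, ENV_NONEMPTY };

/*
 * Hypothetical helper: classify an environment variable.
 * getenv returns NULL for an unset variable and a pointer to an empty
 * string for one that is set but empty, so both cases stay visible.
 */
enum env_state classify_env(const char *name)
{
    const char *value = getenv(name);

    if (value == NULL)
        return ENV_UNSET;
    return value[0] == '\0' ? ENV_EMPTY : ENV_NONEMPTY;
}
```

Moving the lookup next to its use, as proposed in the comment above, sidesteps the question: the lookup's own return code says whether the variable was found.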

@ethomson
Member

I'm not in favor of the name git_repository_open_env, it makes it sound like it's opening the environment, rather than opening the repository while respecting the environment.

Something like git_repository_open_with_env or open_respecting_env would be better, but is sort of awkward still.

Why can't we simply make GIT_REPOSITORY_OPEN_RESPECT_ENVIRONMENT (or something perhaps less wordy) an option to open_ext?

@joshtriplett
Contributor Author

> Why can't we simply make GIT_REPOSITORY_OPEN_RESPECT_ENVIRONMENT (or something perhaps less wordy) an option to open_ext?

Something like this?

git_repository_open_ext(&repo, NULL, GIT_REPOSITORY_OPEN_RESPECT_ENV, NULL);

I could do that, but I think it'd complicate git_repository_open_ext with an almost completely independent chunk of logic. And it doesn't make sense to combine this flag with any of the existing three GIT_REPOSITORY_OPEN_* flags. I think I'd end up implementing this by checking that one flag at the top of git_repository_open_ext and calling a separate helper function that looks like the implementation in this commit (including the call to git_repository_open_ext without that flag).

The intention was to make this new function the obvious call to make if you want to open the same repository that the git command-line tool would open.

I can certainly rename the function; git_repository_open_with_env seems reasonable, as does git_repository_open_from_env. I don't really care what color the bikeshed is painted. :)

@ethomson
Member

No, I was suggesting adding GIT_REPOSITORY_OPEN_FROM_ENVIRONMENT as a git_repository_open_flag_t, which would invoke this functionality when set (incompatible flags would be ignored).

This gives us flexibility in the future if we (for example) add some GIT_REPOSITORY_OPEN_... flag that should be used by both open_ext and by this new function. If we have a git_repository_open_from_env then we will have to break its compatibility to add a flags argument to cope with that new flag.

We could of course go ahead and take flags in your new function, but once we do, they have a pretty similar signature, and we could instead have only one function and have it just switch based on that flag.

Just a thought.

@joshtriplett
Contributor Author

@ethomson I had that exact set of three flags in mind when I mentioned that none of them made sense together with this new one. But forward compatibility makes sense, sure. It does make the common case a bit less clean, but it still seems workable.

I'll add the new flag, and document the (non-)interaction between it and the other flags.
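
The design agreed on here, one entry point that switches on a flag and ignores incompatible flags, can be sketched as follows. The constants are local stand-ins for the `git_repository_open_flag_t` values (names and bit positions are illustrative), and dropping all three older flags when the environment flag is set is an assumption about which flags count as incompatible.

```c
/* Local stand-ins for the open flags; values are illustrative only. */
enum {
    OPEN_NO_SEARCH = 1 << 0,
    OPEN_CROSS_FS  = 1 << 1,
    OPEN_BARE      = 1 << 2,
    OPEN_FROM_ENV  = 1 << 3
};

/* Which internal implementation a single public entry point would call. */
enum open_path { OPEN_PATH_EXPLICIT, OPEN_PATH_FROM_ENV };

/* Hypothetical dispatcher: check the one flag at the top and switch on it. */
enum open_path choose_open_path(unsigned int flags)
{
    return (flags & OPEN_FROM_ENV) ? OPEN_PATH_FROM_ENV : OPEN_PATH_EXPLICIT;
}

/*
 * Hypothetical normalisation: when the environment flag is set, the
 * caller's other flags are ignored (assumption: all three older flags
 * are treated as incompatible).
 */
unsigned int effective_flags(unsigned int flags)
{
    if (flags & OPEN_FROM_ENV)
        return OPEN_FROM_ENV;
    return flags;
}
```

Because the flag lives in an existing bitmask argument, a later flag that applies to both code paths can be added without changing any function signature, which is the forward-compatibility argument made above.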

@ethomson
Member

Yeah, it's tough to know whether this is really worthwhile or not (how much are we really going to add to repository_open, after all)?

(I think a totally reasonable thing is just to change the existing function and the new function to be internal and make open_ext a switch to the two. Don't know offhand if that really makes sense or not.)

git only checks ceiling directories when its search ascends to a parent
directory.  A ceiling directory matching the starting directory will not
prevent git from finding a repository in the starting directory or a
parent directory.  libgit2 handled the former case correctly, but
differed from git in the latter case: given a ceiling directory matching
the starting directory, but no repository at the starting directory,
libgit2 would stop the search at that point rather than finding a
repository in a parent directory.

Test case using git command-line tools:

/tmp$ git init x
Initialized empty Git repository in /tmp/x/.git/
/tmp$ cd x/
/tmp/x$ mkdir subdir
/tmp/x$ cd subdir/
/tmp/x/subdir$ GIT_CEILING_DIRECTORIES=/tmp/x git rev-parse --git-dir
fatal: Not a git repository (or any of the parent directories): .git
/tmp/x/subdir$ GIT_CEILING_DIRECTORIES=/tmp/x/subdir git rev-parse --git-dir
/tmp/x/.git

Fix the testsuite to test this case (in one case fixing a test that
depended on the current behavior), and then fix find_repo to handle this
case correctly.
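
The ceiling rule can be modelled on path strings alone. The sketch below is not `find_repo`; it assumes normalized absolute paths without trailing slashes, and `search_reaches` is a hypothetical name. It checks ceilings only when ascending to a parent, so a ceiling equal to the starting directory has no effect.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PATH_MAX 1024

/* Returns 1 if `dir` exactly matches one of the ceiling directories. */
static int is_ceiling(const char *dir, const char *const *ceilings, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(dir, ceilings[i]) == 0)
            return 1;
    }
    return 0;
}

/* Strips the last component in place; returns 0 when already at "/". */
static int to_parent(char *path)
{
    char *slash = strrchr(path, '/');

    if (slash == NULL || path[1] == '\0')
        return 0;
    if (slash == path)
        path[1] = '\0'; /* "/tmp" becomes "/" */
    else
        *slash = '\0';
    return 1;
}

/*
 * Returns 1 if a search starting at `start` would inspect `target`.
 * The starting directory is always inspected; each parent is inspected
 * only if it is not a ceiling directory.
 */
int search_reaches(const char *start, const char *target,
                   const char *const *ceilings, size_t n)
{
    char dir[SKETCH_PATH_MAX];

    if (strlen(start) >= sizeof(dir))
        return 0;
    strcpy(dir, start);

    for (;;) {
        if (strcmp(dir, target) == 0)
            return 1;
        if (!to_parent(dir))
            return 0;
        if (is_ceiling(dir, ceilings, n))
            return 0;
    }
}
```

With the shell session above in mind: starting at `/tmp/x/subdir`, a ceiling of `/tmp/x` keeps the search from reaching `/tmp/x`, while a ceiling of `/tmp/x/subdir` does not.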

In the process, simplify and document the logic in find_repo():
- Separate the concepts of "currently checking a .git directory" and
  "number of iterations left before going further counts as a search"
  into two separate variables, in_dot_git and min_iterations.
- Move the logic to handle in_dot_git and append /.git to the top of the
  loop.
- Only search ceiling_dirs and find ceiling_offset after running out of
  min_iterations; since ceiling_offset only tracks the longest matching
  ceiling directory, if ceiling_dirs contained both the current
  directory and a parent directory, this change makes find_repo stop the
  search at the parent directory.

GIT_REPOSITORY_OPEN_NO_SEARCH does not search up through parent
directories, but still tries the specified path both directly and with
/.git appended.  GIT_REPOSITORY_OPEN_BARE avoids appending /.git, but
opens the repository in bare mode even if it has a working directory.
To support the semantics git uses when given $GIT_DIR in the
environment, provide a new GIT_REPOSITORY_OPEN_NO_DOTGIT flag to not try
appending /.git.
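
The difference between the two flags can be shown as a plan of which paths get tried. This is a simplified model under stated assumptions: the `SK_*` constants, `struct probe_plan`, and `make_probe_plan` are hypothetical, the bare-mode flag is left out, and the two flags are treated as independent.

```c
#include <stdio.h>
#include <string.h>

/* Local stand-ins for the two flags discussed above; values illustrative. */
#define SK_NO_SEARCH (1u << 0)
#define SK_NO_DOTGIT (1u << 1)

/* What a lookup would try for the starting path. */
struct probe_plan {
    char direct[256]; /* the path exactly as given */
    char dotgit[256]; /* path + "/.git", or empty when not tried */
    int ascend;       /* non-zero: continue into parent directories */
};

/* Fills `plan` for `path` under `flags`; returns -1 if a path is too long. */
int make_probe_plan(const char *path, unsigned int flags,
                    struct probe_plan *plan)
{
    int n;

    n = snprintf(plan->direct, sizeof(plan->direct), "%s", path);
    if (n < 0 || (size_t)n >= sizeof(plan->direct))
        return -1;

    plan->dotgit[0] = '\0';
    if (!(flags & SK_NO_DOTGIT)) {
        n = snprintf(plan->dotgit, sizeof(plan->dotgit), "%s/.git", path);
        if (n < 0 || (size_t)n >= sizeof(plan->dotgit))
            return -1;
    }

    plan->ascend = !(flags & SK_NO_SEARCH);
    return 0;
}
```

Setting both flags gives the `$GIT_DIR` behaviour described above: the given path is tried exactly as written, with no `/.git` suffix and no walk into parent directories.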
@joshtriplett joshtriplett force-pushed the git_repository_discover_default branch from 86ee304 to 0dd98b6 Compare June 24, 2016 19:29
@joshtriplett joshtriplett changed the title API to discover repository respecting standard environment variables Add GIT_REPOSITORY_OPEN_FROM_ENV flag to respect $GIT_* environment vars Jun 24, 2016
@ethomson
Member

I think this makes sense - @carlosmn do you like the way this turned out?

@ethomson
Member

Okay, one last request - would you mind documenting this in the CHANGELOG?

@joshtriplett
Contributor Author

@ethomson Will do.

@joshtriplett
Contributor Author

@ethomson Done.

@ethomson ethomson merged commit ebeb56f into libgit2:master Jul 1, 2016
@joshtriplett joshtriplett deleted the git_repository_discover_default branch July 1, 2016 22:52
max630 pushed a commit to max630/libgit2 that referenced this pull request Nov 9, 2016
…over_default

Add GIT_REPOSITORY_OPEN_FROM_ENV flag to respect $GIT_* environment vars
(cherry picked from commit ebeb56f)
5 participants