Correctly return matched pathspec when passing "*" or "." #1367
Hmm. I really don't want to call To do what I'm describing, we would leave in the is_interesting check, but the only uninteresting pathspecs would be cases where the pathspec is NULL or consists only of NULL or empty strings. All other cases would be considered interesting, but when we parse "*" and "." we would mark them as uninteresting to match and would immediately consider them a match when testing. One concern I have with the behavior you are creating in libgit2sharp is that using a pathspec like |
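A minimal sketch of the idea described in this comment, under stated assumptions: only NULL or empty strings count as uninteresting, while "*" and "." are kept in the list and flagged at parse time so that they match immediately. The names `parsed_spec`, `spec_is_empty`, `spec_parse`, and `spec_first_match` are hypothetical and are not libgit2 APIs; a literal comparison stands in for the real wildcard matching.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed pathspec entry; not the libgit2 internal type. */
typedef struct {
    const char *pattern; /* original string as supplied by the caller */
    bool match_all;      /* true for "*" and ".": no pattern matching needed */
} parsed_spec;

/* In this sketch only NULL or empty strings are uninteresting. */
bool spec_is_empty(const char *s)
{
    return s == NULL || s[0] == '\0';
}

/* Parse one non-NULL pathspec; "*" and "." are flagged rather than dropped. */
parsed_spec spec_parse(const char *s)
{
    parsed_spec p;
    p.pattern = s;
    p.match_all = (strcmp(s, "*") == 0 || strcmp(s, ".") == 0);
    return p;
}

/*
 * Return the first pathspec that matches `path`, or NULL if none does.
 * A flagged entry matches at once, so the caller still learns which
 * pathspec was responsible for the match.
 */
const char *spec_first_match(const parsed_spec *specs, size_t n, const char *path)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (specs[i].match_all || strcmp(specs[i].pattern, path) == 0)
            return specs[i].pattern;
    }
    return NULL;
}
```

With this shape the fast path for "*" and "." is preserved, and the matched pathspec can still be handed back to the caller.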
Thanks for your proposal, really elegant as always! I will have a go at implementing it in the next few days.
Yes, it does! Actually, this is a behaviour I already identified, and which is covered by a (currently) failing test. My initial naive proposal would have been to modify the However, I now realize that it might have an undesirable side-effect on performance. Would it be OK to add this behaviour, but deactivated by default, and have a flag to activate it (something like |
Not to put too fine a point on it, but in my mind, the need to do something like that points to this being a misfeature. I don't have much knowledge of the C# API design esthetic, so I've tried to stay out of this, but the idea that you would run a potentially expensive fnmatch call over every single pathspec entry for every single file just to raise an exception because one might not be used feels to me like taking this too far. I'd love to go back to the rationale that spawned this feature. I think the column "Ignore Unmatched Pathspec" is intertwining the cases of pathspecs with wildcards and pathspecs without wildcards. If you provide a list of 10 filenames to be staged and one doesn't match, then it seems reasonable to me that that could be an error, but as soon as you start injecting wildcard matches into the list, I think you are getting onto fairly shaky ground. Interestingly, you can fix some cases of this problem inside libgit2sharp without adding the "notify all matched" behavior by sorting the pathspec from most specific to least specific (i.e. put items with no wildcards first, items with a mix next, and items that are all wildcards at the end). There are two problems with that:
Would you consider going back to the original requirements for raising an exception and distinguishing between the wildcards vs no wildcards cases? Or maybe three cases: no wildcards, one wildcard (i.e. cannot have a conflicting match so simply putting the wildcard item at the end as a catchall will guarantee consistency), and multiple wildcards, where the third case would not enforce that all pathspecs must have a match. /cc @jamill @nulltoken |
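The "most specific to least specific" ordering mentioned in this comment could be sketched roughly as follows. This is an illustration only: the helper names are hypothetical, and counting `*`, `?`, and `[` is a crude proxy for wildcard detection that ignores escaping.

```c
#include <stdlib.h>
#include <string.h>

/* Count glob metacharacters in a pathspec (rough proxy, no escape handling). */
size_t count_wildcards(const char *spec)
{
    size_t n = 0;
    for (; *spec; spec++) {
        if (*spec == '*' || *spec == '?' || *spec == '[')
            n++;
    }
    return n;
}

/* Rank: 0 = no wildcards, 1 = literal text mixed with wildcards, 2 = only wildcards. */
int specificity_rank(const char *spec)
{
    size_t wild = count_wildcards(spec);
    if (wild == 0)
        return 0;
    return wild == strlen(spec) ? 2 : 1;
}

/* qsort comparator over an array of C strings, ordering by rank. */
int compare_specs(const void *a, const void *b)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    return specificity_rank(sa) - specificity_rank(sb);
}

/* Sort pathspecs so explicit names come first and catch-alls come last. */
void sort_pathspecs(const char **specs, size_t n)
{
    qsort(specs, n, sizeof(*specs), compare_specs);
}
```

Note that `qsort` is not a stable sort, so entries of equal rank may be reordered relative to each other.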
@arrbee - I agree with this paragraph. I think it is reasonable to treat wildcards differently than explicitly named files, especially if it is expensive. IIRC, the original case was that the consumer was attempting to stage / unstage a file, the call succeeded, but the file was not staged. At least in that case, we could detect that the caller wanted to act on a specific file (but there was no matching file). |
Hey @yorah - I just wanted to say that I certainly didn't mean my comment as any critique of your work. All the code you've been writing, etc., has been of fine quality! I just wanted to steer the conversation back to why we were heading in this direction. I hope it didn't come across too negatively! |
@arrbee don't worry, this is actually the opposite! I didn't answer yet because as usual, your comments made me think about new aspects of the situation that I wasn't seeing before. I have the feeling like you are always Again, thanks a lot for your comments (on this PR and others), and for your time and patience! |
Thanks @jamill @arrbee for your comments. As you both said, the original requirement was to be able to distinguish specific named files vs wildcard pathspecs. If I understood correctly, this is exactly what you said:
The end result is that it only makes sense raising an exception (and notifying of matched/unmatched pathspecs) when the client passes non-pathspecs to the Thus, my proposal is to say that when the user passes a That way, notification and exception throwing will only be used with explicit file names. Examples If the index contains the following files: readme.txt, readmectxt
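One way to picture the distinction drawn in this comment is a mode switch: in an explicit-paths mode each entry is compared literally and an unmatched entry is reported, while in ordinary pathspec mode wildcards are expanded and nothing is reported. The sketch below is an assumption-laden illustration, not the libgit2sharp implementation: the option name is elided in the thread, so `match_mode` and both helpers are hypothetical, and POSIX `fnmatch` stands in for the library's own matcher.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mode switch; the real option name is not shown in the thread. */
typedef enum { MODE_PATHSPEC, MODE_EXPLICIT_PATHS } match_mode;

/* Literal comparison in explicit mode, glob matching otherwise. */
bool path_matches(match_mode mode, const char *spec, const char *file)
{
    if (mode == MODE_EXPLICIT_PATHS)
        return strcmp(spec, file) == 0;
    return fnmatch(spec, file, 0) == 0;
}

/* Only explicit paths can be reported as unmatched. */
bool should_report_unmatched(match_mode mode, const char *spec,
                             const char *const *files, size_t n)
{
    size_t i;
    if (mode != MODE_EXPLICIT_PATHS)
        return false;
    for (i = 0; i < n; i++) {
        if (path_matches(mode, spec, files[i]))
            return false;
    }
    return true;
}
```

With the two files named above, `readme?txt` matches both in pathspec mode, but in explicit mode it is just a file name that does not exist.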
So, what about this issue? Well, if you (@jamill @nulltoken @arrbee) think the behavior described above makes sense, then we obviously don't need anything new in libgit2. However, for the sake of having a "more consistent" behavior, maybe it would be a good idea to implement your proposal (adding a flag to the Please tell me if you want me to implement it or if you prefer to leave the existing code as is! |
Late to this thread, but it strikes me as off to have pathspec behavior change completely based on superficially unrelated parameters. I have no idea if this is feasible, but could the failure and/or unmatched callback handling first check if the offending pathspec matches any of the files already found? If so, consider it redundant and move along. From a trivial test, that seems to be how |
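The check suggested here, deciding whether an apparently unmatched pathspec is merely redundant, might look like the sketch below. It assumes the caller has already collected the paths that other pathspecs matched; `spec_is_redundant` is a hypothetical helper, and POSIX `fnmatch` is used in place of libgit2's internal matcher.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * True if `spec` matches at least one path that was already found through
 * another pathspec, in which case it can be treated as redundant rather
 * than unmatched.
 */
bool spec_is_redundant(const char *spec, const char *const *matched_files, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (fnmatch(spec, matched_files[i], 0) == 0)
            return true;
    }
    return false;
}
```

The cost is one extra pass over the matched files per unmatched pathspec, which is the kind of overhead raised earlier in the thread.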
To add another datapoint, |
@jamill that's interesting - I think that's msysgit-specific behavior. On Unix - where your shell expands wildcards - your shell would give you an error if
If it didn't,
So, what happens if I try to stage a file with a |
Whee! |
Maybe this is the point you are trying to make, but this sounds like a libgit2sharp design issue. Right now, the core library will tell you for each file in a diff the first pathspec that it found in the list that matched that file. The behavior is stable.
I'm open to exposing the pathspec checking facility as a utility API in libgit2 if it will help create the behavior you want in libgit2sharp. If you want to iterate over the unmatched pathspec collection and for each item iterate over every file in the diff, you can do that. However, I still don't think that will give you the behavior you want... Now that I think about it, this points out a fundamental flaw in the implementation so far. If you give a pathspec Oh, maybe you are always using |
At a higher level, I don't believe libgit2 should be spending time replicating shell functionality to the extent that we can avoid it. For one thing, different shells behave differently, and for another, it muddies the separation of concerns. For example, the diff APIs don't take strings to specify the trees to diff (with an implicit rev-parse), they take the tree objects. If you want to parse specs and pass them in, we have an explicit function for that. Just so, I don't think diff should try to recreate various shell error conditions for pathspec expansion. I started to write more about ideas that I have, but I think there is a lot for you folks to discuss still about the behavior you want to achieve. For example, if the caller tries to diff with "*.baz" and yet "file.baz" is either untracked or ignored, is that an error? A shell match will find those files (and hence no error even if you are in a shell that does error for no matches - which mine does not) but there will be no entry in the diff list (again, unless you are always using INCLUDE_UNTRACKED, INCLUDE_UNMODIFIED, and INCLUDE_IGNORED, which feels like a pretty expensive choice to make on your user's behalf to me). |
The implementation proposed in libgit2/libgit2sharp#343 is two-fold:
It means that when we don't notify/throw, we "only" have the overhead of untracked/ignored files. Not sure yet about @dahlbyk's proposal. I thought of a case where it was not working earlier, but I didn't write it down, and can't find it now... Will try to find it again. @dahlbyk maybe the |
I just had an interesting idea that might solve this problem... When I was working on the file similarity metric, I ended up writing a pluggable similarity metric API so callers could experiment with alternative ways of comparing files (because it's an interesting problem). What do you think about implementing a pluggable file filtering algorithm for diff (and eventually for status, etc). It would look something like:

    typedef struct {
        /* process options and create a filter object */
        int (*filter_setup)(void **filter, const void *options_struct);
        /* release the filter object */
        int (*filter_free)(void *filter);
        /* given a filter object, suggest start and end paths for iteration */
        int (*filter_suggest_bounds)(char **iterstart, char **iterend, void *filter, void *payload);
        /* given a filter and file info, return 0 if matches, > 1 if no match, < 0 error/stop */
        int (*filter_match)(void *filter, const git_index_entry *file, void *payload);
        void *payload;
    } git_diff_filter;

    typedef struct {
        ...
        git_strarray pathspec;
        git_diff_filter *filter; /* if NULL, internal fnmatch filter will be used */
    } git_diff_options;

Now, if you pass the pluggable filter as NULL, we will just use the internal implementation which will look at the flags and the pathspec from the What I like about this is that you can take it further, if you want, and implement filtering just for small files or just for files with executable bits set or whatever rule you want to narrow your diff to a particular set of files. This would not supplant the notifications that already exist because those are a post-match operation that lets you incrementally monitor what files are going into a diff list. By the way, the reason that I wrote the API to take the |
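To make the "filtering just for small files" idea concrete, here is a self-contained sketch of what a custom match callback could look like. Everything in it is an assumption for illustration: `sketch_entry` stands in for `git_index_entry`, `sketch_filter` is a reduced version of the proposed callback table (setup, free, and bounds callbacks omitted), and the code treats any positive return value as "no match".

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for git_index_entry so the sketch compiles on its own. */
typedef struct {
    const char *path;
    uint64_t file_size;
} sketch_entry;

/* Reduced, hypothetical version of the proposed callback table. */
typedef struct {
    int (*filter_match)(void *filter, const sketch_entry *file, void *payload);
    void *payload;
} sketch_filter;

/* Match only files no larger than the limit that `payload` points to. */
int small_file_match(void *filter, const sketch_entry *file, void *payload)
{
    const uint64_t *limit = payload;
    (void)filter; /* this example keeps no per-filter state */
    if (limit == NULL)
        return -1; /* negative: error/stop */
    return file->file_size <= *limit ? 0 : 1; /* 0: match, positive: no match */
}

/* Count entries accepted by the filter; returns -1 if the filter reports an error. */
int count_matches(const sketch_filter *f, const sketch_entry *entries, size_t n)
{
    size_t i;
    int count = 0;
    for (i = 0; i < n; i++) {
        int rc = f->filter_match(NULL, &entries[i], f->payload);
        if (rc < 0)
            return -1;
        if (rc == 0)
            count++;
    }
    return count;
}
```

A caller would fill in `sketch_filter` with `small_file_match` and a pointer to the size limit, then let the iteration code consult the callback for each candidate file.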
@arrbee Sorry for not answering before, I actually got caught up playing with the pluggable file filtering idea that you had (and looking at the pluggable similarity API that you did)! There are still some things in your proposals that I don't understand clearly (
I don't like this part, as libgit2sharp is supposed to be cross-platform. There is no out-of-the-box Anyway, I will get a push ready with what I have in the next few days. |
Sorry, that probably could have used some more explanation... The internal iterators take a start of range and end of range string prefix so that they don't have to iterate over the entire hierarchy when you are using a narrow pathspec. For example, if you give a pathspec of Regarding exposing fnmatch, I suppose we could do so. It is a slippery slope, I guess, between exposing fnmatch to exposing the current pathspec internal API (where there is a pre-match spec parsing phase separate from the actual match operation). At some point, you're just "plugging in" for the purpose of creating an Observer wrapper to the process, at which point we may as well expose the default plugin implementation and support that behavior directly without necessarily exposing the component APIs. Did that last paragraph make sense? I may need more caffeine. I'm worried that I didn't state things clearly... If you like this direction, let me know. I'd be happy to take a stab at encapsulating the current behavior in such a pluggable API, if you like, and then you could take it and see if it extended naturally to cover the problem you want to solve (or you could just write the whole thing, but I don't want you to feel like you have to do that all by yourself if you don't want to / don't have time). |
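As an illustration of the range-prefix idea, one plausible way to derive an iteration bound is to take the literal part of a pathspec that precedes its first wildcard. This is only a sketch under that assumption: `pathspec_literal_prefix` is a hypothetical helper, not the internal libgit2 logic, and it does not handle escaped metacharacters.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the literal prefix of `spec` (everything before the first glob
 * metacharacter) into `out`. Returns the prefix length, or (size_t)-1 if
 * `out` is too small to hold the prefix and its terminator.
 */
size_t pathspec_literal_prefix(const char *spec, char *out, size_t outlen)
{
    size_t len = strcspn(spec, "*?[");
    if (len + 1 > outlen)
        return (size_t)-1;
    memcpy(out, spec, len);
    out[len] = '\0';
    return len;
}
```

An empty prefix means the pathspec begins with a wildcard, so no narrowing is possible and the whole hierarchy has to be walked.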
Here is a first spike of the plugin file filtering thingie. Basically, the plugin infrastructure is there, and the existing behaviour has been encapsulated into it. All existing tests seem to pass. To be honest, this is mainly due to sheer luck. This is far from being finished:
Well, to sum it up, this is really just a spike, to keep you updated about my progress (and so you can tell me if I'm going in completely the wrong direction ;). If you don't have time to take a peek, this is also all right, I will push an update in 1-2 days. |
Mmm, I also added 2 failing tests related to passing "." as pathspecs. I'm not sure yet what to do with them, and if we should do anything at all. |
Agreed. I will spark some more discussion on the libgit2sharp issues to see what can be considered a "stable" behaviour, and if we can find a way so that we don't need it. Let's forget about that for now.. |
I think I'm ready for a first review, whenever someone has time! Since last comments: no more leaks, works with status/checkout (no tests on that yet), a few tests showing how to implement a custom filter. |
Ok, it's time for some update on this PR. Since the merge of libgit2/libgit2sharp#343, we finally won't be needing to report all matched pathspecs (because we only care about reporting unmatched explicit paths). It means that for libgit2sharp at least, we don't need the pluggable filter mechanism. What does it mean for this PR:
That was fast, but there are a couple warnings and a phat segfault. :) |
Let's see if it's better now :) Edit: Ack, still some warnings... |
Much better now. |
Looking great. @arrbee: pluggable filter, yay or nay? |
I agree with @vmg that this is looking great, @yorah! I lean towards dropping a29aa9f (i.e. the filtering plugin) if we don't know of a use for it. It seems like added complexity with unclear immediate benefit. Regarding the failing tests, I'm wondering if the pathspec matcher should special case "." and know to skip over "./" before initiating a match. However, let's track that as a separate issue and get this PR merged! |
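The special case floated here, skipping over "./" before a match is attempted, could be as small as the following. It is a hypothetical helper shown only to illustrate the suggestion, and it leaves a bare "." and any "../" prefix untouched.

```c
/*
 * Return a pointer past any leading "./" components (and any extra slashes
 * that follow them). The input string is not modified.
 */
const char *skip_dot_slash(const char *spec)
{
    while (spec[0] == '.' && spec[1] == '/') {
        spec += 2;
        while (*spec == '/')
            spec++;
    }
    return spec;
}
```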
Yeah, I'm having a hard time picturing practical usage for the filtering API. Let's drop that last commit then and I'll merge the PR. |
Cool, I will drop the two last commits tomorrow then, and open up a separate issue for the failing test! |
I also moved all tests related to notifying in their own file.
Removed last 2 commits, and rebased on top of vNext 😃 |
Thank you! Looking great! |
Correctly return matched pathspec when passing "*" or "."
This issue was detected during implementation of libgit2/libgit2sharp/pull/343
Context
git_pathspec_init is the function used to initialize a git_vector of pathspecs from a list of pathspecs passed from the client. However, if the list of pathspecs passed by the client is not deemed interesting (through the use of git_pathspec_is_interesting), then an empty vector is initialized.
What is considered an uninteresting list of pathspec?
(!str || !str[0] || (!str[1] && (str[0] == '*' || str[0] == '.')))
Later on, when the vector of pathspecs is used in git_pathspec_match_path, one of the early exit branches is taken. This means that we consider the pathspec a match, but we don't return which pathspec matched (it could be either '*' or '.') in the matched_pathspec out parameter.
What does it mean for libgit2sharp (and other clients?)
When passing just '*' or '.', the diff process does not correctly notify which pathspec matched the diffed file (notifying was introduced in #1249).
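The effect described in this section can be reproduced with a small model. The predicate below is the per-string condition quoted above; `match_with_early_exit` is a simplified, hypothetical stand-in for the matching routine (literal comparison only), written on the assumption that the early exit fires when the vector is empty.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The condition quoted above: NULL, empty, "*" or "." is uninteresting. */
bool spec_is_uninteresting(const char *str)
{
    return !str || !str[0] || (!str[1] && (str[0] == '*' || str[0] == '.'));
}

/*
 * Simplified model of matching against a pathspec vector. An empty vector
 * short-circuits to "match" and leaves *matched as NULL, which is the
 * behaviour this issue is about.
 */
bool match_with_early_exit(const char *const *specs, size_t n,
                           const char *path, const char **matched)
{
    size_t i;
    *matched = NULL;
    if (n == 0)
        return true; /* early exit: a match is reported without a pathspec */
    for (i = 0; i < n; i++) {
        if (strcmp(specs[i], path) == 0) {
            *matched = specs[i];
            return true;
        }
    }
    return false;
}
```

A client that passed "*" therefore sees every file match but receives no pathspec to attribute the match to.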
Ways to fix it
If it can help, I'm willing to tackle this issue, but I would like to check first with you if this is a valid issue (we could also say it's up to the clients to handle those cases, even if it's still a bit painful IMO), and if yes, what is the preferred way to solve it.
Here are the ways I can think of:
Remove the git_pathspec_is_interesting shortcut. It means removing an existing optimization, but I don't think this will have an impact on performance (no numbers to back up this claim, tell me if you want/need some)