diff: Add a callback to notify of diffed files #1249

Merged
merged 2 commits into from
Feb 8, 2013
Conversation

yorah
Contributor

@yorah yorah commented Jan 16, 2013

Should help implement the optional "Ignore unmatched pathspec" behavior in libgit2/libgit2sharp#274

@yorah
Contributor Author

yorah commented Jan 16, 2013

@arrbee Current implementation relies on adding a bool in attr_file.h:git_attr_fnmatch. It solves the issue, but maybe there is a better way to do it?

@arrbee
Member

arrbee commented Jan 16, 2013

Hmm, I am definitely not a fan of passing back data in the git_attr_fnmatch structure. I feel like from a code structure standpoint, those are intended to be constant structs after the initial parsing. That being said, I definitely want to be pragmatic about this.

Reading through libgit2/libgit2sharp#274 I'm not completely clear on what you're trying to accomplish here. This is diff and so if you are just giving a list of filenames to diff, afterwards it should be pretty easy to see which ones are in the diff and which ones are not. If you are really using the pattern matching feature, I don't yet understand why you would need to know that ".c" matched and ".h" didn't match. I'm sorry if I'm being thick headed, but can you help me understand what this will be useful for?

@nulltoken
Member

@arrbee this is supposed to be used with the LibGit2Sharp Stage()/Unstage()/... methods. Those accept a list of pathspecs. Behind the scenes, they perform a diff against the index. The issue is that when passing a list of pathspecs to them, some may match something, some may not, and the current diff implementation "swallows" unmatched pathspecs.

I think it might be useful to inform the caller that some of the pathspecs haven't been matched, letting him/her decide if this should be considered as a fatal issue or not.

For instance, this feature could also be leveraged by higher level functions such as checkout or the proposed enhancement of reset in #1190.

@vmg
Member

vmg commented Jan 16, 2013

Sorry, I don't see the use-case either. Can you elaborate on what you're trying to accomplish? Why would unmatched pathspecs be meaningful, or a reason for fatal failure?

@arrbee
Member

arrbee commented Jan 16, 2013

@nulltoken When you say that we eat pathspecs that don't match, are you thinking of exact matches (i.e. you're using the pathspec as a list of file names) or are you actually using more complicated match patterns?

Also, what do you folks think about the notification callback changes that I made to the checkout code? I only bring that up, because I'm very open to extending diff with support for a similar type of notification / observation callback that notifies about files & trees being examined, and could potentially include the specific pathspec entry that did match (i.e. the positive of pathspec usage, as opposed to the negative).

@nulltoken
Member

@vmg Stage() in LibGit2Sharp is in charge of promoting to the index the addition/modification/removal of files that happened in the workdir. There's already a diff based proof of concept of it in nulltoken/topic/diff-based-staging branch.

The process performs a diff between the index and the workdir, and for each delta, depending on the status, adds/replaces an entry in the index or removes it.

Currently, when passed exact paths or match patterns that don't fit anything in either the workdir or the index, the diffing process just ignores those pathspecs.

From what I've seen:

  • git add cringes when being passed a pathspec that doesn't exist.
  • git rm exposes an --ignore-unmatch parameter

Provided the diffing mechanism was able to provide information about unmatched pathspecs, the idea would be to combine those two options and let the consumer define the expected behavior through:

  • a boolean should_fail_on_unmatched_pathspec: When set to true, the process would stop before messing with the index. When set to false, the method would stage as many files as possible/encountered.
  • an optional callback unmatched_pathspecs which would enumerate every pathspec the diff hasn't been able to resolve.

This would allow the caller to define how "safe", "lax" or "blind" he'd want the method to operate. The callback payload could be used for logging purposes, for notifying the user, or even to build an exception with a list of wrong pathspecs.
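As a rough illustration of the two knobs described above, here is a minimal C sketch. Every name in it (`stage_options`, `check_unmatched`, `count_unmatched`) is hypothetical and exists only for this example; it is not LibGit2Sharp's or libgit2's API, and it assumes the list of unmatched pathspecs has already been computed somewhere else.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: invoked once per pathspec that matched nothing. */
typedef void (*unmatched_pathspec_cb)(const char *pathspec, void *payload);

/* Hypothetical options mirroring the two knobs proposed in the comment. */
typedef struct {
    int should_fail_on_unmatched_pathspec;      /* non-zero: stop before touching the index */
    unmatched_pathspec_cb unmatched_pathspecs;  /* optional, may be NULL */
    void *payload;                              /* passed through to the callback */
} stage_options;

/*
 * Report every unmatched pathspec through the optional callback, then
 * decide whether staging may proceed.
 * Returns 0 when staging may go ahead, -1 when it must stop.
 */
int check_unmatched(const stage_options *opts,
                    const char *const *unmatched, size_t count)
{
    size_t i;

    assert(opts != NULL);

    if (opts->unmatched_pathspecs != NULL)
        for (i = 0; i < count; i++)
            opts->unmatched_pathspecs(unmatched[i], opts->payload);

    return (count > 0 && opts->should_fail_on_unmatched_pathspec) ? -1 : 0;
}

/* Example callback: counts the reports in an int supplied as payload. */
void count_unmatched(const char *pathspec, void *payload)
{
    (void)pathspec;
    ++*(int *)payload;
}
```

In this sketch the callback runs in both modes, so a "lax" caller can still log what was skipped while a "safe" caller gets a failure before anything is staged.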

For what I foresee, Unstage(), Remove() and Index.Reset() would expose similar parameters. In my dreams, ultimately, I'd even like Checkout() to join this league.

In my mind, this would allow the consumer, depending on the kind of usage he makes of LibGit2Sharp, to rely on a unified and as flexible as possible API when performing workdir/index interactions.

In most of the LibGit2Sharp tests the pathspecs are exact matches, but this also could be a mix of exact and match patterns being passed in the same batch.

I hope I've been able to describe as clearly as possible what I'm after and why this kind of negative callback mechanism might be interesting. If that's not the case, please just let me know and I'll do my best to better explain myself.

@yorah
Contributor Author

yorah commented Jan 17, 2013

and could potentially include the specific pathspec entry that did match (i.e. the positive of pathspec usage, as opposed to the negative).

Works for me, we would have enough information in LibGit2Sharp to determine which pathspecs were not matched.
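To make the "positive" approach concrete, here is a small sketch of how a caller might derive the unmatched pathspecs from match notifications alone. The `match_tracker` type and its helpers are hypothetical names invented for this example; it assumes the notifier reports the caller's pathspec string (or NULL) for each examined file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bookkeeping kept by the caller while the diff runs. */
typedef struct {
    const char *const *pathspecs; /* caller-provided pathspecs */
    size_t count;                 /* number of entries in pathspecs and seen */
    bool *seen;                   /* seen[i] becomes true once pathspecs[i] is reported */
} match_tracker;

/* Would be called from the notification callback for each examined file. */
void tracker_record(match_tracker *t, const char *matched_pathspec)
{
    size_t i;

    assert(t != NULL);

    if (matched_pathspec == NULL) /* nothing specific matched */
        return;

    for (i = 0; i < t->count; i++)
        if (strcmp(t->pathspecs[i], matched_pathspec) == 0)
            t->seen[i] = true;
}

/* After the diff: copy the never-reported pathspecs into out, return how many. */
size_t tracker_unmatched(const match_tracker *t, const char **out)
{
    size_t i, n = 0;

    for (i = 0; i < t->count; i++)
        if (!t->seen[i])
            out[n++] = t->pathspecs[i];

    return n;
}
```

One caveat of this sketch: if a file matches several pathspecs but only one of them is reported, the others stay marked as unmatched unless some other file triggers them.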

Also, what do you folks think about the notification callback changes that I made to the checkout code?

It looks prettier than what I did 😄
If you are OK with me trying, I will update this PR with your proposal. From what I can already see, I will have to modify the git_pathspec_match_path signature to be able to know which pathspec matched the passed path (as we pass a git_vector of pathspecs to this method).

Something strange just crossed my mind: if we push the notification thing a little bit, won't we end up building the git_diff_list (in git_diff__from_iterators and sub-functions) in a "streamed" way? I mean that the deltas property of the git_diff_list could be returned on a one-by-one basis to the client?
I'm not sure this is a good thing, but what I am saying is that between notifying "hey, I'm inspecting this file", and "hey, this file was inspected, and it's GIT_DELTA_ADDED", the line is thin.

@nulltoken
Member

and could potentially include the specific pathspec entry that did match

Duh. I overlooked this 😊
@yorah's right. This may indeed work.

@arrbee
Member

arrbee commented Jan 17, 2013

@yorah If you're willing to take this on, I'd be happy to see this PR evolve into a notifier-based approach!

Regarding the new signature for git_pathspec_match_path, it seems to me like you could change the return value from bool to int and return the index of the matched pathspec or -1 if there was no match. It won't be a large change to the underlying code and then the places that use this will just check for < 0 to test a failed match.
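The suggested "index or -1" return convention could look roughly like the sketch below. It is a simplified stand-in, not the real git_pathspec_match_path: the function name and signature are made up for illustration, and plain POSIX fnmatch() is assumed in place of libgit2's own pathspec matching rules.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the matcher: returns the index of the first
 * pattern that matches path, or -1 when none does.
 */
int pathspec_match_index(const char *const *patterns, size_t count,
                         const char *path)
{
    size_t i;

    assert(path != NULL);

    for (i = 0; i < count; i++)
        if (fnmatch(patterns[i], path, 0) == 0) /* fnmatch returns 0 on a match */
            return (int)i;

    return -1;
}
```

Callers that only care about success would then test for a negative result, while callers that want the pattern can use the index to look it up.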

The key will probably be getting the correct signature for the notifier callback function. I'm not sure what it should be, but I look forward to your take on it.

It is interesting that you could potentially use the notifications to get the diff data incrementally instead of all at the end. Of course, that prevents later steps such as rename detection, but if those aren't relevant to you, then maybe we could even at some future point allow you to pass a NULL git_diff_list ** and only get callbacks without allocating any memory for the diff list. That may allow a future optimization for certain submodule functions that merely need to iterate until the first modified file is found and could then abort without having allocated any memory.

@yorah
Contributor Author

yorah commented Jan 28, 2013

Ready for review!

One thing I'm not sure about, is that there is currently no notification for this part of the code (where we recurse in the next directory). Do you think there is a use case I'm not seeing for having one there?

Also, I didn't implement the "streaming" of the diff data in this PR, but if you think it would make sense, I can add it fairly easily (by returning a git_diff_delta_t in the callback, and passing a null git_diff_list to git_diff__from_iterators).

If you had more in mind for this notify thing, please tell me.

@arrbee
Member

arrbee commented Jan 29, 2013

Hmm. The code you have here is fine, but I feel if we are going to add a notification callback for diffs as the deltas are being built, then we ought to pass the git_diff_delta * into the callback. As is, almost the only thing this could be used for is figuring out which pathspecs have been matched and which have not.

I appreciate keeping things simple and not building functionality that you have no use for, so I'm totally open to discussion about this, but since this is part of the public API that will (shortly) become harder to change quite so easily, I'd like to think about the function signature of the notification callback a little more.

In this case, I'm thinking something more like:

typedef int (*git_diff_notify_cb)(
    const git_diff_delta *delta,
    const char *matched_pathspec,
    void *payload);

Also, we should isolate the representation of the pathspec from the diff code, removing the knowledge from inside the diff_notify helper function. I think this can be done either by adding a const char *git_pathspec_get_pattern(git_vector *vspec, size_t index); helper function to look up the pattern or by modifying the git_pathspec_match_path() API to return exactly the information you need to make the notification callback.

I lean towards the second route: make the return value of git_pathspec_match_path() be a const char * of the pattern that matched (and NULL if no match) in place of the current bool return value. It seems more concise and returns exactly what you care about with minimal modification to existing APIs.

In that case, the code in diff_delta__from_one would look something like:

const char *match;

/* ... */

if ((match = git_pathspec_match_path(
        &diff->pathspec, entry->path,
        (diff->opts.flags & GIT_DIFF_DISABLE_PATHSPEC_MATCH) != 0,
        (diff->opts.flags & GIT_DIFF_DELTAS_ARE_ICASE))) == NULL)
    return 0;

/* ... */

if (diff->opts.notify_cb != NULL &&
    diff->opts.notify_cb(delta, match, diff->opts.notify_payload) != 0)
    return GIT_EUSER;

Correspondingly, diff_delta__from_two would only need a single extra const char * parameter of the match value. (On a related note, the opts parameter you have added to these two functions is not necessary since the git_diff_list already contains a pointer to the opts structure.)

What do you think?

@yorah
Contributor Author

yorah commented Jan 30, 2013

Thanks for your feedback!

then we ought to pass the git_diff_delta * into the callback

Agreed.

typedef int (*git_diff_notify_cb)(

I see that you also made it so that the notify callback returns an int (instead of the void I used). Does it mean you would like the client to be able to abort the diff process (by returning an int != 0, as you did for the checkout notifier)?

I lean towards the second route: make the return value of git_pathspec_match_path() be a const char * of the pattern that matched (and NULL if no match) in place of the current bool return value.

Actually, this was one of the implementations I tried, but I ran into the same problem as when I returned the index instead of the bool: in the current code of git_pathspec_match_path, there is a short-circuit test at the beginning

if (!vspec || !vspec->length)
    return true;

Which means that the function returns true either when a match is found or (and this is the goal of the aforementioned check) when no pathspecs have been passed to it => I think it was added there to avoid code duplication in the callers.

I see three options to solve this problem:

  • move the check to the callers
  • use a "magic string value" to represent this case. For example we could say that an empty const char * is returned when no pathspec is passed to the function
  • keep the bool, and have the const char * as an out parameter (same logic as what I did with the out int *index)

Do you have a preference for one of those options, or for one I'm not seeing?
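For reference, the third option (keep the bool, add an out parameter) might be sketched as follows. This is an illustration only: `pathspec_match` is a hypothetical name, and POSIX fnmatch() is assumed as a stand-in for the real matching logic. The point is that "empty pathspec" and "matched a pattern" both return true but remain distinguishable through the out parameter.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical matcher: returns true when path is accepted.
 * *matched receives the pattern that matched, or NULL when the path was
 * accepted only because no pathspecs were given (or when nothing matched).
 */
bool pathspec_match(const char *const *patterns, size_t count,
                    const char *path, const char **matched)
{
    size_t i;

    assert(matched != NULL);
    *matched = NULL;

    /* Short-circuit: an empty pathspec list accepts every path. */
    if (patterns == NULL || count == 0)
        return true;

    for (i = 0; i < count; i++) {
        if (fnmatch(patterns[i], path, 0) == 0) {
            *matched = patterns[i];
            return true;
        }
    }

    return false;
}
```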

(On a related note, the opts parameter you have added to these two functions is not necessary since the git_diff_list already contains a pointer to the opts structure.)

Ouch, Thanks for spotting this! Now I feel silly :)

@arrbee
Member

arrbee commented Jan 30, 2013

Oh, I completely overlooked the "empty pathspec matches everything" case! Sorry about that...

Hmm, given this, I prefer your existing implementation with the int index return value added to the match. The only change I would make is that the meaning of the value should be an index into the original git_strarray pathspec instead of an index into the git_vector. Obviously there is a one-to-one correspondence right now, but this will allow us to disconnect that in the future. When you pass the string to the callback function, let's pass something like (idx < 0) ? NULL : diff->opts.pathspec.strings[idx] (i.e. going back to the caller-provided data) instead of using the internal pattern from the vector.
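The index-to-string mapping described here is small enough to sketch in isolation. The `sketch_strarray` type below is a hypothetical stand-in with the same two fields the expression relies on (strings and a count), and the extra upper-bound check is an addition of this sketch rather than something stated in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the caller-provided string array. */
typedef struct {
    char **strings;
    size_t count;
} sketch_strarray;

/*
 * Map a match index back to the caller-provided pathspec string.
 * A negative index means "no specific pathspec matched" and yields NULL.
 */
const char *pathspec_for_index(const sketch_strarray *specs, int idx)
{
    assert(specs != NULL);

    if (idx < 0 || (size_t)idx >= specs->count)
        return NULL;

    return specs->strings[idx];
}
```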

Also, yes, I do prefer it for the notifier to return an int value and to abort the diff generation on a non-zero return value with a GIT_EUSER return code. This allows, for example, a caller to stop the diff as soon as any modifications are found or other cases where the entire diff might not be required.

Actually to take that one step further, I'd be tempted to make the callback signature into:

typedef int (*git_diff_notify_cb)(
    git_diff_list *diff_so_far,
    const git_diff_delta *delta_to_add,
    const char *matched_pathspec,
    void *payload);

and make the callback immediately prior to the git_vector_insert() calls. This would allow more context to be used by the notifier to make decisions about what to do.

For now, I think a zero return value would mean "continue" and a non-zero would be "stop and return GIT_EUSER". It would not be hard to go one step further and have < 0 mean "stop with GIT_EUSER", 0 mean "continue" and > 0 mean "skip this delta" so the notifier could even act like a delta filtering function.

Is that just added complexity that no one would ever use? Probably.

Let's add the diff_so_far parameter anyhow, just because it feels like the right thing to do, but only do the zero versus non-zero return codes. How does that sound?

@yorah
Contributor Author

yorah commented Jan 30, 2013

How does that sound?

Awesome, will do that tomorrow!
Thanks a lot for your review.

@arrbee
Member

arrbee commented Jan 30, 2013

Thanks @yorah for all your good work!

@yorah
Contributor Author

yorah commented Jan 31, 2013

Not ready for review yet.

I'm still missing getting the index for the caller-provided data, from the git_pathspec_match_path function. I won't be able to finish it today, but I will get back to it as soon as possible (I don't know how yet, but I feel that I will need to do some mapping in the callers => I need to look at the impact of strcmp/strncmp comparisons).

On a separate note, I couldn't resist making the callback act as a delta-filtering function, but it's easy to roll back this part if we decide not to keep it.

@yorah
Contributor Author

yorah commented Feb 4, 2013

I think it's ready for review when you have time!

What I did:

  • changed the callback signature to your proposal (adding diff_so_far and delta_to_add)
  • used the return value of the callback to either abort the diff or skip the delta
  • changed the way the pathspec is sent to the callback (no need to map the index from the git_vector and the git_strarray anymore) by getting it directly as an out parameter from git_pathspec_match_path

However, it means that I'm passing back the "internal pattern from the vector", and not the "caller-provided" data. The only way I see I could do that would be to add to the git_attr_fnmatch struct, the index of the git_strarray entry it was built from. It felt really weird when I did it, as this information does not make sense for all use cases of git_attr_fnmatch (for instance in attr_file.git_attr_file__parse_buffer), so I ended up rolling it back. I still have it somewhere in my reflog if you tell me it is the way to go.

@arrbee
Member

arrbee commented Feb 4, 2013

Hey @yorah - to my cursory glance this is looking good! I'll go over it a little more closely tomorrow and get it merged. ⚡

git_strarray pathspec; /**< defaults to include all paths */
git_off_t max_size; /**< defaults to 512MB */
} git_diff_options;

#define GIT_DIFF_OPTIONS_VERSION 1
#define GIT_DIFF_OPTIONS_INIT {GIT_DIFF_OPTIONS_VERSION}

Member

Super-minor thing, but could you move the GIT_DIFF_OPTIONS_VERSION and GIT_DIFF_OPTIONS_INIT to right below the definition of the git_diff_options structure? I think it's better to keep them right next to each other (particularly so we can remember to bump the version when we make structure changes post-1.0 release).

Contributor Author

Sure, makes more sense.

@arrbee
Member

arrbee commented Feb 5, 2013

So, made a couple of small comments; nothing too big. I'm liking how this turned out! 🌟

@yorah
Contributor Author

yorah commented Feb 7, 2013

Thanks a lot for the review @arrbee, I just updated the PR and rebased on top of development.

I also:

  • rewrote the comment of git_pathspec_match_path in pathspec.h.
  • fixed a warning that I had introduced in workdir.c (thanks Travis!)

By the way, would you prefer me to squash the two commits together, or to keep them apart as it is now?

    const git_diff_delta *delta,
    const char *matched_pathspec)
{
    if (!&diff->opts || !diff->opts.notify_cb)
Member

Oops, just noticed a typo-ish here. There is no need to test !&diff->opts - it will always be false. Just test that the notify_cb is non-NULL.

@arrbee
Member

arrbee commented Feb 7, 2013

🤘 It looks great. Do you want to fix that one little typo and then we'll merge? Thanks for all the followup!

yorah added 2 commits February 7, 2013 20:44
Instead of returning directly the pattern as the return value, I used an
out parameter, because the function also tests if the passed pathspecs
vector is empty. If yes, it considers that the path "matches", but in
that case there is no matched pattern per se.
The callback will be called for each file, just before the `git_delta_t` gets inserted into the diff list.

When the callback:
- returns < 0, the diff process will be aborted
- returns > 0, the delta will not be inserted into the diff list, but the diff process continues
- returns 0, the delta is inserted into the diff list, and the diff process continues
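
The three return-code cases listed in the commit message can be illustrated with a toy diff loop. Everything below is a stand-in written for this example (`sketch_delta`, `sketch_notify_cb`, `sketch_build_diff`, `SKETCH_ABORTED`), not libgit2 code; the callback here also omits the diff_so_far argument and always receives NULL as the matched pathspec to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ABORTED (-1) /* stand-in for the "aborted by user" error code */

/* Toy delta: a path plus a flag saying whether the file changed. */
typedef struct {
    const char *path;
    int modified;
} sketch_delta;

typedef int (*sketch_notify_cb)(const sketch_delta *delta_to_add,
                                const char *matched_pathspec,
                                void *payload);

/*
 * Toy diff builder: calls cb just before each delta would be inserted.
 * Returns the number of deltas kept in out, or SKETCH_ABORTED.
 */
int sketch_build_diff(const sketch_delta *candidates, size_t count,
                      sketch_notify_cb cb, void *payload,
                      const sketch_delta **out)
{
    size_t i;
    int kept = 0;

    assert(out != NULL);

    for (i = 0; i < count; i++) {
        int rc = (cb != NULL) ? cb(&candidates[i], NULL, payload) : 0;

        if (rc < 0)   /* abort the whole diff */
            return SKETCH_ABORTED;
        if (rc > 0)   /* skip this delta, keep going */
            continue;
        out[kept++] = &candidates[i]; /* rc == 0: insert and continue */
    }

    return kept;
}

/* Filtering callback: keep only modified files. */
int skip_unmodified(const sketch_delta *d, const char *spec, void *payload)
{
    (void)spec;
    (void)payload;
    return d->modified ? 0 : 1;
}

/* Early-exit callback: remember the first modified path, then abort. */
int stop_at_first_modified(const sketch_delta *d, const char *spec, void *payload)
{
    (void)spec;
    if (d->modified) {
        *(const char **)payload = d->path;
        return -1;
    }
    return 1;
}
```

The second callback mirrors the use case mentioned earlier in the thread: stopping as soon as any modification is found, without needing the complete diff.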
@yorah
Contributor Author

yorah commented Feb 7, 2013

Typo fixed! 😀

@arrbee
Member

arrbee commented Feb 8, 2013

✨ Awesome, thank you!

arrbee added a commit that referenced this pull request Feb 8, 2013
…pecs

diff: Add a callback to notify of diffed files
@arrbee arrbee merged commit f3e4921 into libgit2:development Feb 8, 2013
@yorah yorah deleted the topic/diff-notify-unmatched-pathspecs branch February 8, 2013 18:23
@yorah
Contributor Author

yorah commented Feb 8, 2013

Thanks!

phatblat pushed a commit to phatblat/libgit2 that referenced this pull request Sep 13, 2014
…d-pathspecs

diff: Add a callback to notify of diffed files