diff: Add a callback to notify of diffed files #1249
@arrbee The current implementation relies on adding a …
Hmm, I am definitely not a fan of passing back data in the … Reading through libgit2/libgit2sharp#274, I'm not completely clear on what you're trying to accomplish here. This is diff, so if you are just giving a list of filenames to diff, it should be pretty easy afterwards to see which ones are in the diff and which ones are not. If you are really using the pattern matching feature, I don't yet understand why you would need to know that ".c" matched and ".h" didn't. I'm sorry if I'm being thick-headed, but can you help me understand what this will be useful for?
@arrbee This is meant to be used with the LibGit2Sharp Stage()/Unstage()/... methods. These accept a list of pathspecs and, behind the scenes, perform a diff against the index. The issue is that when a list of pathspecs is passed to them, some may match something and some may not, and the current diff implementation silently "swallows" the unmatched ones. I think it would be useful to inform the caller that some of the pathspecs haven't been matched, letting them decide whether this should be considered a fatal issue. For instance, this feature could also be leveraged by higher-level functions such as checkout, or by the proposed enhancement of reset in #1190.
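To make the use case concrete, here is a minimal sketch of what "reporting unmatched pathspecs" could mean once a diff has run. It assumes the caller already has the list of paths that appeared in the diff. `find_unmatched_pathspecs` is a hypothetical helper (not a libgit2 or LibGit2Sharp API), and POSIX `fnmatch()` stands in for libgit2's own pathspec matching, which has additional rules.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of libgit2: for every pathspec that matched
 * none of the paths present in the diff, write its index into
 * `unmatched_out`. Returns the number of unmatched pathspecs. */
size_t find_unmatched_pathspecs(
	const char **pathspecs, size_t spec_count,
	const char **diffed_paths, size_t path_count,
	size_t *unmatched_out)
{
	size_t unmatched = 0;

	assert(spec_count == 0 || unmatched_out != NULL);

	for (size_t i = 0; i < spec_count; i++) {
		bool matched = false;

		for (size_t j = 0; j < path_count && !matched; j++) {
			/* fnmatch() returns 0 on a match; an exact file name is
			 * just a pattern without wildcards. */
			if (fnmatch(pathspecs[i], diffed_paths[j], 0) == 0)
				matched = true;
		}

		if (!matched)
			unmatched_out[unmatched++] = i;
	}

	return unmatched;
}
```

With pathspecs `*.c`, `README` and `*.h` checked against a diff containing `src/diff.c` and `README`, only `*.h` (index 2) is reported as unmatched.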
Sorry, I don't see the use case either. Can you elaborate on what you're trying to accomplish? Why would unmatched pathspecs be meaningful, or a reason for a fatal failure?
@nulltoken When you say that we eat pathspecs that don't match, are you thinking of exact matches (i.e. you're using the pathspec as a list of file names), or are you actually using more complicated match patterns? Also, what do you folks think about the notification callback changes that I made to the checkout code? I only bring that up because I'm very open to extending diff with support for a similar type of notification/observation callback that notifies about files and trees being examined, and that could potentially include the specific pathspec entry that matched (i.e. the positive side of pathspec usage, as opposed to the negative).
@vmg The process performs a diff between the index and the workdir and, for each delta, depending on its status, adds or replaces an entry in the index, or removes it. Currently, when passed exact paths or match patterns that don't match anything in either the workdir or the index, the diffing process just ignores those pathspecs. From what I've seen:
Provided the diffing mechanism were able to provide information about unmatched pathspecs, the idea would be to combine those two options and let the consumer define the expected behavior through:
This would allow the caller to define how "safe", "lax" or "blind" they want the method to be. The callback payload could be used for logging, for notifying the user, or even for building an exception with the list of wrong pathspecs. As far as I can foresee, this would let consumers, whatever kind of usage they make of LibGit2Sharp, rely on a unified and as flexible as possible API when performing workdir/index interactions. In most of the LibGit2Sharp tests the pathspecs are exact matches, but a single batch could also mix exact paths and match patterns. I hope I've been able to describe as clearly as possible what I'm after and why this kind of negative callback mechanism might be interesting. If that's not the case, please just let me know and I'll do my best to explain myself better.
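A rough sketch of how the "safe", "lax" and "blind" behaviours could be layered on top of such a report. Everything here is hypothetical: `pathspec_policy`, `apply_pathspec_policy` and the callback type are illustrative names, not LibGit2Sharp or libgit2 APIs. The callback receives each unmatched pathspec, so its payload can log, notify the user, or collect names for an exception.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strictness levels mirroring "safe", "lax" and "blind". */
typedef enum {
	PATHSPEC_SAFE,  /* unmatched pathspecs are reported and make the call fail */
	PATHSPEC_LAX,   /* unmatched pathspecs are reported, then ignored */
	PATHSPEC_BLIND  /* unmatched pathspecs are silently ignored */
} pathspec_policy;

/* Caller-supplied callback invoked once per unmatched pathspec. */
typedef void (*unmatched_pathspec_cb)(const char *pathspec, void *payload);

/* Returns 0 if the operation may proceed, -1 if it should fail. */
int apply_pathspec_policy(
	pathspec_policy policy,
	const char **unmatched, size_t count,
	unmatched_pathspec_cb cb, void *payload)
{
	assert(count == 0 || unmatched != NULL);

	if (count == 0 || policy == PATHSPEC_BLIND)
		return 0;

	/* Both SAFE and LAX report every unmatched pathspec. */
	if (cb != NULL)
		for (size_t i = 0; i < count; i++)
			cb(unmatched[i], payload);

	return policy == PATHSPEC_SAFE ? -1 : 0;
}

/* Example callback: count how many unmatched pathspecs were reported. */
void count_unmatched(const char *pathspec, void *payload)
{
	(void)pathspec;
	(*(size_t *)payload)++;
}
```

The design choice here is that reporting and failing are separate decisions: the same callback serves logging in "lax" mode and error collection in "safe" mode.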
Works for me; we would have enough information in LibGit2Sharp to determine which pathspecs were not matched.
It looks prettier than what I did 😄 Something strange just crossed my mind: if we push the notification thing a little further, won't we end up building the …
Duh. I overlooked this 😊
@yorah If you're willing to take this on, I'd be happy to see this PR evolve into a notifier-based approach! Regarding the new signature for … The key will probably be getting the signature of the notifier callback function right. I'm not sure what it should be, but I look forward to your take on it. Interestingly, you could potentially use the notifications to get the diff data incrementally instead of all at the end. Of course, that prevents later steps such as rename detection, but if those aren't relevant to you, then maybe at some future point we could even allow you to pass a NULL …
Ready for review! One thing I'm not sure about is that there is currently no notification for this part of the code (where we recurse into the next directory). Do you think there is a use case I'm not seeing for having one there? Also, I didn't implement the "streaming" of the diff data in this PR, but if you think it would make sense, I can add it fairly easily (by returning a …). If you had more in mind for this notify thing, please tell me.
Hmm. The code you have here is fine, but I feel that if we are going to add a notification callback for diffs as the deltas are being built, then we ought to pass the …

I appreciate keeping things simple and not building functionality that you have no use for, so I'm totally open to discussion about this. But since this is part of the public API, which will (shortly) become harder to change, I'd like to think about the function signature of the notification callback a little more. In this case, I'm thinking of something more like:

```c
typedef int (*git_diff_notify_cb)(
	const git_diff_delta *delta,
	const char *matched_pathspec,
	void *payload);
```

Also, we should isolate the representation of the pathspec from the diff code, removing the knowledge from inside the …

I lean towards the second route: make the return value of …

In that case, the code in …

```c
const char *match;
/* ... */
if ((match = git_pathspec_match_path(
		&diff->pathspec, entry->path,
		(diff->opts.flags & GIT_DIFF_DISABLE_PATHSPEC_MATCH) != 0,
		(diff->opts.flags & GIT_DIFF_DELTAS_ARE_ICASE))) == NULL)
	return 0;
/* ... */
if (diff->opts.notify_cb != NULL &&
	diff->opts.notify_cb(delta, match, diff->opts.notify_payload) != 0)
	return GIT_EUSER;
```

Correspondingly, …

What do you think?
Thanks for your feedback!
Agreed.
I see that you also made the notify callback return an int (instead of the void I used). Does that mean you would like the client to be able to abort the diff process (by returning a non-zero int, as you did for the checkout notifier)?
Actually, this was one of the implementations I tried, but I ran into the same problem as when I returned the index instead of the bool: in the current code of …
This means that the function returns true either when a match is found or (and this is the goal of the aforementioned check) when no pathspecs have been passed to it. I think the check was added there to avoid code duplication in the callers. I see three options to solve this problem:
Do you have a preference for one of those options, or for one I'm not seeing?
Ouch, thanks for spotting this! Now I feel silly :)
Oh, I completely overlooked the "empty pathspec matches everything" case! Sorry about that... Hmm, given this, I prefer your existing implementation with the …

Also, yes, I do prefer for the notifier to return an int value and to abort the diff generation on a non-zero return value with a …

Actually, to take that one step further, I'd be tempted to make the callback signature:

```c
typedef int (*git_diff_notify_cb)(
	git_diff_list *diff_so_far,
	const git_diff_delta *delta_to_add,
	const char *matched_pathspec,
	void *payload);
```

and make the callback immediately prior to the …

For now, I think a zero return value would mean "continue" and a non-zero one would mean "stop and return GIT_EUSER". It would not be hard to go one step further and have < 0 mean "stop with GIT_EUSER", 0 mean "continue" and > 0 mean "skip this delta", so the notifier could even act as a delta filtering function. Is that just added complexity that no one would ever use? Probably. Let's add the …
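A minimal sketch of the four-argument variant, using hypothetical stand-ins (`demo_diff_list`, `demo_delta`, `DEMO_EUSER`) rather than the real libgit2 types; `diff_so_far` is `const` here, which is an assumption of this sketch. The callback runs just before a delta is appended, so it can inspect the list built so far. A non-zero return simply stops the build, matching the "for now" behaviour described above.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_DELTAS 16
#define DEMO_EUSER (-100) /* arbitrary stand-in for GIT_EUSER */

/* Hypothetical stand-ins for git_diff_delta and git_diff_list. */
typedef struct { const char *path; } demo_delta;
typedef struct {
	demo_delta items[DEMO_MAX_DELTAS];
	size_t count;
} demo_diff_list;

/* Same shape as the four-argument callback proposed above. */
typedef int (*demo_notify_cb)(
	const demo_diff_list *diff_so_far,
	const demo_delta *delta_to_add,
	const char *matched_pathspec,
	void *payload);

/* Consult the callback before appending each candidate delta.
 * Zero means "continue"; non-zero means "stop and return DEMO_EUSER".
 * No pathspec matching happens here, so NULL is passed as the matched
 * pathspec (as for an empty pathspec list). */
int demo_build_diff(
	demo_diff_list *diff, const demo_delta *candidates, size_t count,
	demo_notify_cb cb, void *payload)
{
	assert(diff != NULL);

	for (size_t i = 0; i < count; i++) {
		if (cb != NULL && cb(diff, &candidates[i], NULL, payload) != 0)
			return DEMO_EUSER;
		if (diff->count == DEMO_MAX_DELTAS)
			return -1; /* this sketch uses a fixed-size list */
		diff->items[diff->count++] = candidates[i];
	}
	return 0;
}

/* Example callback that uses diff_so_far: stop once the list already
 * holds *(const size_t *)payload deltas. */
int cap_deltas(
	const demo_diff_list *diff_so_far, const demo_delta *delta_to_add,
	const char *matched_pathspec, void *payload)
{
	(void)delta_to_add;
	(void)matched_pathspec;
	return diff_so_far->count >= *(const size_t *)payload;
}
```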
Awesome, will do that tomorrow!
Thanks @yorah for all your good work!
Not ready for review yet. I'm still missing getting the index for the caller-provided data from the … On a separate note, I couldn't resist implementing the "callback acting as a delta-filtering function" idea, but it's easy to roll back this part if we decide not to keep it.
I think it's ready for review when you have time! What I did:
However, this means that I'm passing back the "internal pattern from the vector", not the "caller-provided" data. The only way I see to do that would be to add to the …
Hey @yorah, to my cursory glance this is looking good! I'll go over it a little more closely tomorrow and get it merged. ⚡
```c
	git_strarray pathspec; /**< defaults to include all paths */
	git_off_t max_size; /**< defaults to 512MB */
} git_diff_options;

#define GIT_DIFF_OPTIONS_VERSION 1
#define GIT_DIFF_OPTIONS_INIT {GIT_DIFF_OPTIONS_VERSION}
```
Super-minor thing, but could you move `GIT_DIFF_OPTIONS_VERSION` and `GIT_DIFF_OPTIONS_INIT` to right below the definition of the `git_diff_options` structure? I think it's better to keep them right next to each other (particularly so we remember to bump the version when we make structure changes after the 1.0 release).
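The versioned-options pattern under discussion, sketched with a hypothetical `demo_options` struct instead of `git_diff_options`: the first member carries the version, the `_INIT` macro sets only that member (everything else is zero-initialized), and a consumer can reject structs built against a different layout. The `demo_check_options` helper is an assumption for illustration, not necessarily how libgit2 validates versions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical options struct; `version` must come first so that the
 * INIT macro below initializes it. */
typedef struct {
	unsigned int version;
	unsigned int flags;
	size_t max_size;
} demo_options;

/* Kept directly under the struct so a layout change and a version bump
 * are hard to separate. */
#define DEMO_OPTIONS_VERSION 1
#define DEMO_OPTIONS_INIT {DEMO_OPTIONS_VERSION}

/* Returns 0 when opts is NULL (use defaults) or was built for the
 * current layout, -1 on a version mismatch. */
int demo_check_options(const demo_options *opts)
{
	if (opts == NULL)
		return 0;
	return opts->version == DEMO_OPTIONS_VERSION ? 0 : -1;
}
```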
Sure, makes more sense.
So, I made a couple of small comments; nothing too big. I'm liking how this turned out! 🌟
Thanks a lot for the review, @arrbee! I just updated the PR and rebased it on top of `development`. I also:
By the way, would you prefer me to squash the two commits together, or to keep them separate as they are now?
```c
	const git_diff_delta *delta,
	const char *matched_pathspec)
{
	if (!&diff->opts || !diff->opts.notify_cb)
```
Oops, just noticed a typo-ish thing here. There is no need to test `!&diff->opts`; it will always be false. Just test that `notify_cb` is non-NULL.
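The suggested fix, shown on hypothetical stand-in types (`demo_diff`, `demo_opts` and `demo_diff_notify` are illustrative, not the names used in the PR). Because `opts` is embedded in the diff struct rather than pointed to, `&diff->opts` can never be NULL for a valid `diff`, so only the callback pointer needs checking.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { const char *path; } demo_delta;
typedef int (*demo_notify_cb)(const demo_delta *, const char *, void *);

/* Options are embedded by value, as in the reviewed code. */
typedef struct {
	demo_notify_cb notify_cb;
	void *notify_payload;
} demo_opts;
typedef struct { demo_opts opts; } demo_diff;

/* Corrected guard: only the callback pointer is tested. */
int demo_diff_notify(
	const demo_diff *diff, const demo_delta *delta,
	const char *matched_pathspec)
{
	assert(diff != NULL);

	if (!diff->opts.notify_cb)
		return 0;
	return diff->opts.notify_cb(
		delta, matched_pathspec, diff->opts.notify_payload);
}

/* Example callback that rejects every delta. */
int reject_all(const demo_delta *d, const char *m, void *p)
{
	(void)d;
	(void)m;
	(void)p;
	return -1;
}
```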
🤘 It looks great. Do you want to fix that one little typo, and then we'll merge? Thanks for all the follow-up!
Instead of returning the matched pattern directly as the return value, I used an out parameter, because the function also tests whether the passed pathspec vector is empty. If it is, the path is considered to "match", but in that case there is no matched pattern per se.
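A minimal sketch of the out-parameter design, assuming a simplified matcher: `demo_pathspec_match_path` is hypothetical, and POSIX `fnmatch()` replaces libgit2's real pathspec rules (no negation or prefix handling here). The `bool` return answers "is this path selected?", while `*matched_out` says which pattern, if any, selected it.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when `path` is selected. `*matched_out` receives the pattern
 * that matched, or NULL when nothing matched or when the pathspec list is
 * empty: an empty list selects every path, but no pattern did the matching. */
bool demo_pathspec_match_path(
	const char **specs, size_t count, const char *path,
	const char **matched_out)
{
	assert(matched_out != NULL);
	*matched_out = NULL;

	if (count == 0)
		return true; /* empty pathspec matches everything */

	for (size_t i = 0; i < count; i++) {
		if (fnmatch(specs[i], path, 0) == 0) {
			*matched_out = specs[i];
			return true;
		}
	}
	return false;
}
```

This keeps the "empty pathspec matches everything" shortcut inside the matcher, so callers do not have to duplicate it, while still letting a notifier receive NULL when no specific pattern applies.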
The callback will be called for each file, just before the `git_diff_delta` gets inserted into the diff list. When the callback:
- returns < 0, the diff process is aborted;
- returns > 0, the delta is not inserted into the diff list, but the diff process continues;
- returns 0, the delta is inserted into the diff list, and the diff process continues.
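A sketch of this three-way return convention on hypothetical stand-in types (`demo_diff_list`, `demo_delta` and `DEMO_EUSER` are placeholders, not libgit2 names). `filter_deltas` is an illustrative callback that skips files ending in `.gen` and aborts on anything under `secret/`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_DELTAS 16
#define DEMO_EUSER (-100) /* arbitrary stand-in for GIT_EUSER */

typedef struct { const char *path; } demo_delta;
typedef struct {
	demo_delta items[DEMO_MAX_DELTAS];
	size_t count;
} demo_diff_list;

typedef int (*demo_notify_cb)(
	const demo_diff_list *diff_so_far, const demo_delta *delta_to_add,
	const char *matched_pathspec, void *payload);

/* < 0: abort the diff; > 0: skip this delta; 0: insert it. */
int demo_maybe_insert(
	demo_diff_list *diff, const demo_delta *delta, const char *matched,
	demo_notify_cb cb, void *payload)
{
	int result = cb ? cb(diff, delta, matched, payload) : 0;

	if (result < 0)
		return DEMO_EUSER;
	if (result > 0)
		return 0; /* skipped, but the diff continues */
	if (diff->count == DEMO_MAX_DELTAS)
		return -1; /* this sketch uses a fixed-size list */
	diff->items[diff->count++] = *delta;
	return 0;
}

/* Run every candidate through demo_maybe_insert, stopping on error. */
int demo_build(
	demo_diff_list *diff, const demo_delta *c, size_t n,
	demo_notify_cb cb, void *payload)
{
	assert(diff != NULL);

	for (size_t i = 0; i < n; i++) {
		int error = demo_maybe_insert(diff, &c[i], NULL, cb, payload);
		if (error < 0)
			return error;
	}
	return 0;
}

/* Example filter: abort on "secret/...", skip "*.gen", keep the rest. */
int filter_deltas(
	const demo_diff_list *diff_so_far, const demo_delta *delta_to_add,
	const char *matched_pathspec, void *payload)
{
	const char *path = delta_to_add->path;
	size_t len = strlen(path);

	(void)diff_so_far;
	(void)matched_pathspec;
	(void)payload;

	if (strncmp(path, "secret/", 7) == 0)
		return -1;
	if (len >= 4 && strcmp(path + len - 4, ".gen") == 0)
		return 1;
	return 0;
}
```

Fed `a.c`, `b.gen`, `c.c`, `secret/key` and `d.c`, the build stops with `DEMO_EUSER` after keeping `a.c` and `c.c`; `d.c` is never reached.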
Typo fixed! 😀
✨ Awesome, thank you!
Thanks!
This should help implement the optional "Ignore unmatched pathspec" behavior in libgit2/libgit2sharp#274.