Reading patch files #3223

Merged
merged 54 commits into from
Jun 26, 2016

Conversation

ethomson
Member

Moved #2280 over here and based it on master. With the refactored binary diff handling, I can proceed on binary application.

This PR is attempting to take an initial bite out of patch application, a la git-apply. My goal is to parse a git-style patch file created with libgit2 or git.git into a git_patch and be able to apply a git_patch to a buffer. This does not yet open up any public API for application, just internal methods; I was hoping to get some eyeballs on this and perhaps even merge it before I did something like applying patches into the working directory.

  • Binary patches
  • Traditional-format patches (should just need to parse their headers)
  • Improved whitespace tolerance during application
  • Context reduction during application
  • Handle abbreviated OIDs better

@swisspol
Contributor

Not trying to hijack this discussion or anything, but before going further with supporting patches, since it's related, could you first fix the regressions in libgit2's stash apply implementation compared to Git CLT as discussed at the end of #3018? 😉

@ethomson ethomson force-pushed the apply branch 3 times, most recently from c0f0f65 to 0fa6725 Compare June 19, 2015 22:48
@ethomson ethomson force-pushed the apply branch 2 times, most recently from 464cb59 to c1e2ab5 Compare July 10, 2015 14:37
@ethomson ethomson force-pushed the apply branch 3 times, most recently from 1c4837a to aa72c47 Compare September 23, 2015 22:10
@ethomson
Member Author

So I think that we should review this now, and not worry about the three bullet points that I had previously listed, all of which have to do with the ability to apply patches that were created by old (non-git) diff. I think that handling those patches would be very nice indeed, but not critical and we should instead be focused on adopting some APIs that would allow us to apply patches that are created by ourselves or by git.git.

Broadly speaking, these changes lay the groundwork for applying patches that are created by (for example) our git_diff_* or the git-diff command. We parse patches for a single file, reading the changes as hunks, and put them into an internal structure that is common to both the patches that we parse and the patches that we create from diffs. The git_patch_* APIs remain unchanged after this refactoring (yay) and can now operate on both types of patches: those that we computed internally and those that were parsed from the outside world. This was nice as it allows us to take some corpus of patch files that were created by git-diff, parse them, then write them back out, to ensure that we are parsing correctly and writing patches correctly. (In many cases, we were not, so there were a number of bugs fixed around things like exotic filenames and dealing with changes that only changed a mode or a filename but no content.)
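The parse-then-write-back check described above can be illustrated on a single hunk header. This is only a sketch, not libgit2 code: `struct hunk_header` and the `hunk_header_*` functions are hypothetical names, and the format is simplified by assuming both line counts are always written out (real git output omits a count of 1).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a parsed hunk header. */
struct hunk_header {
	int old_start, old_lines;
	int new_start, new_lines;
};

/* Parse "@@ -a,b +c,d @@"; returns 0 on success, -1 on malformed input. */
static int hunk_header_parse(struct hunk_header *out, const char *line)
{
	int consumed = 0;

	if (sscanf(line, "@@ -%d,%d +%d,%d @@%n",
	           &out->old_start, &out->old_lines,
	           &out->new_start, &out->new_lines, &consumed) != 4)
		return -1;

	/* %n is only stored if the trailing " @@" literal matched too */
	return consumed > 0 ? 0 : -1;
}

/* Write the header back out in canonical form. */
static int hunk_header_print(char *buf, size_t len, const struct hunk_header *h)
{
	int n = snprintf(buf, len, "@@ -%d,%d +%d,%d @@",
	                 h->old_start, h->old_lines,
	                 h->new_start, h->new_lines);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Returns 1 if parsing and re-printing reproduces the input exactly. */
static int hunk_header_roundtrips(const char *line)
{
	struct hunk_header h;
	char buf[64];

	if (hunk_header_parse(&h, line) < 0 ||
	    hunk_header_print(buf, sizeof(buf), &h) < 0)
		return 0;

	return strcmp(buf, line) == 0;
}
```

A byte-for-byte comparison is deliberately strict: input that parses but is not reproduced exactly (for example, a header with doubled spaces) fails the check, which is how this kind of test surfaces bugs on either the parsing or the writing side.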

I'm really happy with the "guts" of this, though diff-based patches and the diff mechanism are still fairly intertwined in a way that I could not easily refactor in the time allotted. (And I allotted myself a lot of time, this has been cooking off and on for a year and a half. Mostly "off", but still. Sheesh.)

But the APIs need some work here. I don't think that the APIs are hard, here, just the terminology. Some things to point out:

Our git_patch object applies to a single file, e.g. the changes to make a preimage foo.c into a postimage foo.c. When diffing a collection of files, you would get a collection of patches, one for each modified file. This makes the naming of functions here a little tricky to talk about, as I think the general use case that we want to support is to take a patch file (which could contain multiple git_patches, in our parlance) and apply all the changes to the working directory.

I think that broadly speaking, we will need two objects - a git_patch that contains the changes from one file to another, and a collection of git_patches - let's call it a git_patchlist for the sake of argument. I think that the very compelling APIs will be:

int git_patchlist_from_file(git_patchlist **out, const char *filename);
int git_patchlist_from_buffer(git_patchlist **out, const char *buf);
int git_patchlist_apply_to_workdir(git_repository *repo, git_patchlist *patchlist);

There are interesting other APIs that might surface, like pulling patches out of a patchlist, and perhaps applying a git_patch to a buffer. Another option is to rename git_patch (kaboom) into something else that suggests its file scope, though I don't really know what because git_patchfile is even worse. :)
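To make the "one patch file, many per-file patches" distinction concrete, here is a minimal sketch that counts how many per-file patches a git-style patch file contains. It is not part of this PR: `count_file_patches` is a hypothetical helper, and it assumes every per-file patch begins with a `diff --git ` header line and that no free-form preamble (such as a commit message) contains such a line.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count per-file patches in a git-style patch
 * file by counting lines that start with "diff --git ".  Lines inside
 * a hunk start with ' ', '+', '-' or '\', so they never match.
 */
static size_t count_file_patches(const char *buf, size_t len)
{
	static const char header[] = "diff --git ";
	const size_t header_len = sizeof(header) - 1;
	const char *line = buf, *limit = buf + len, *eol;
	size_t count = 0;

	while (line < limit) {
		if ((size_t)(limit - line) >= header_len &&
		    memcmp(line, header, header_len) == 0)
			count++;

		/* advance to the start of the next line */
		eol = memchr(line, '\n', (size_t)(limit - line));
		line = eol ? eol + 1 : limit;
	}

	return count;
}
```

Each counted header would correspond to one git_patch in the terminology above, and the whole buffer to the collection object being discussed.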

I actually think that the best course of action here is to make the new git_patch_from_patchfile API internal and worry about the public API later. We can merge all the work here, which will let us take advantage of the numerous bugfixes in existing APIs, and we can worry about the new public APIs for parsing patches and applying patches in a separate PR.

Thoughts?

@carlosmn
Member

I think it makes sense to restrict ourselves to git-produced patches; we shouldn't try to be a general replacement for patch(1).

Regarding the naming for git_patch being a particular file, that naming is somewhat unfortunate, but meh. As for what to call a collection of these, when we create it, we call it a git_diff. Is there a reason we can't use that for the patches we read in?

This is a generous number of commits, so it's bound to take some time to review. Thanks for sticking to it.

@ethomson
Member Author

As for what to call a collection of these, when we create it, we call it a git_diff. Is there a reason we can't use that for the patches we read in?

That's true. I had initially not wanted to tackle that, because it would mean making git_diff abstract enough to deal with both types of data. Which is quite scary. But it worked out well enough for git_patch that this would perhaps be possible. Let me ponder this some more later.

@ethomson ethomson force-pushed the apply branch 2 times, most recently from c5dda46 to 214a27b Compare September 24, 2015 14:48
@ethomson
Member Author

A couple of quick additions to squash some warnings and memory leaks. While doing that, I realized that the single git_patch handling was going to be used only as part of a large patchfile handling, and so only created pointers to memory within that larger buffer. Except that if we expose git_patch_from_patchfile, that will be caller memory. So I added a commit to dup the input.

For now, I'm just dup'ing the whole damned patch file, which is a minor amount of additional memory (just some paths and some prefixes) but reduces memory fragmentation or the need to pool or something else silly.

I don't imagine that it will stick around in its current form for very long, anyway.
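The ownership issue described here can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `struct parsed_patch`, its fields, and the toy "parse" of the `--- a/` line are all hypothetical. The point is only that the parsed structure keeps one private copy of the input and that every parsed pointer refers into that copy rather than into caller memory.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed-patch structure; field names are illustrative. */
struct parsed_patch {
	char *content;         /* private copy of the caller's buffer */
	size_t content_len;
	const char *old_path;  /* points into `content`, not caller memory */
	size_t old_path_len;
};

static int parsed_patch_init(struct parsed_patch *p, const char *buf, size_t len)
{
	static const char prefix[] = "--- a/";
	const size_t prefix_len = sizeof(prefix) - 1;
	const char *eol;

	memset(p, 0, sizeof(*p));

	/* one allocation for the whole input, plus a NUL for convenience */
	if ((p->content = malloc(len + 1)) == NULL)
		return -1;

	memcpy(p->content, buf, len);
	p->content[len] = '\0';
	p->content_len = len;

	/* toy "parse": remember where the old path sits inside our copy */
	if (len >= prefix_len && memcmp(p->content, prefix, prefix_len) == 0) {
		p->old_path = p->content + prefix_len;
		eol = memchr(p->old_path, '\n', len - prefix_len);
		p->old_path_len = eol ? (size_t)(eol - p->old_path)
		                      : len - prefix_len;
	}

	return 0;
}

static void parsed_patch_free(struct parsed_patch *p)
{
	free(p->content);
	memset(p, 0, sizeof(*p));
}
```

With a single copy of the whole input, the caller is free to reuse or release its own buffer immediately after parsing, and cleanup is a single `free`.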

@ethomson ethomson force-pushed the apply branch 3 times, most recently from 6bb0e48 to 891ac49 Compare September 25, 2015 18:05
@ethomson ethomson force-pushed the apply branch 4 times, most recently from 7dee9fa to 829bb21 Compare October 28, 2015 17:58

for (start = in; start < in + in_len; start = end) {
	for (end = start; end < in + in_len && *end != '\n'; end++)
		;
Member

Do we maybe wanna use p_strnlen() here instead of open-coding it?

Member

Sorry, not p_strnlen() but memchr().
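A minimal sketch of what the memchr() suggestion could look like for a line-by-line scan over a length-delimited buffer. This is not the code from the PR: `count_lines` is a hypothetical helper that only counts lines, standing in for whatever per-line work the real loop does.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not libgit2 code): count the lines in a buffer
 * that is not necessarily NUL-terminated.  A final line without a
 * trailing '\n' still counts as a line.
 */
static size_t count_lines(const char *in, size_t in_len)
{
	const char *start = in, *limit = in + in_len, *end;
	size_t lines = 0;

	while (start < limit) {
		/* memchr replaces the hand-written scan for '\n' */
		end = memchr(start, '\n', (size_t)(limit - start));

		/* step past the newline, or stop at the end of the buffer */
		start = end ? end + 1 : limit;
		lines++;
	}

	return lines;
}
```

Unlike a strnlen-style helper, memchr() searches for an arbitrary byte within an explicit length, which matches what the open-coded inner loop does.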

@ethomson
Member Author

Oops, ignore that last comment. I meant aa4bfb3 after this latest fixup.

(I accidentally reverted 153fde5 in that rename, and forgot to reapply it manually in my rebase. Ugh.)

{
git_repository *repo;
git_diff *computed, *parsed;
git_tree *a, *b;
git_diff_options opts = GIT_DIFF_OPTIONS_INIT;
git_diff_find_options findopts = GIT_DIFF_FIND_OPTIONS_INIT;
Member

Odd indentation

@carlosmn
Member

I took a quick look and things seem sensible (other than that one piece of indentation 😈) so I think we should merge it and let it have wider usage.

@hackhaslam
Contributor

hackhaslam commented Jun 23, 2016

I'd like to see this merged. I've been testing the patch application part. My use case is partial application of patches for staging part of a file. It works well with a small change and the addition of a per-hunk callback. I'll propose a pull request after this is merged.

Edward Thomson added 5 commits June 25, 2016 23:08

Test that we can create a diff file, then parse the results and
that the two are identical in-memory.

Patches may have no hunks when there's no modifications (for example,
in a rename).  Handle them.

When showing copy information because we are duplicating contents,
for example, when performing a `diff --find-copies-harder -M100 -B100`,
then show copy from/to lines in a patch, and do not show context.
Ensure that we can also parse such patches.
5 participants