Reading patch files #3223

ethomson · 2015-06-15T20:40:29Z

Moved #2280 over here and based it on master. With the refactored binary diff handling, I can proceed on binary application.

This PR is attempting to take an initial bite out of patch application, a la git-apply. My goal is to parse a git-style patch file created with libgit2 or git.git into a git_patch and be able to apply a git_patch to a buffer. This does not yet open up any public API for application, just internal methods; I was hoping to get some eyeballs on this and perhaps even merge it before I did something like applying patches into the working directory.

Binary patches
~~Traditional-format patches (should just need to parse their headers)~~
~~Improved whitespace tolerance during application~~
~~Context reduction during application~~
Handle abbreviated OIDs better
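Abbreviated OIDs mostly show up in the `index` extended header, where git writes shortened object ids. The sketch below assumes a hypothetical `index_line` struct and `parse_index_line()` helper, and it is not libgit2's parser. It records each id together with how many hex digits were actually present. The two lengths are kept separately because nothing here assumes they match.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result of parsing a git-style "index" extended header line. */
typedef struct {
    char old_id[41];   /* hex id as written; may be abbreviated */
    char new_id[41];
    size_t old_len;    /* hex digits actually present on each side */
    size_t new_len;
    unsigned int mode; /* 0 when the line carries no mode */
} index_line;

/* Copy a run of up to 40 hex digits into out; return how many were copied. */
static size_t read_hex(const char *s, char *out)
{
    size_t n = 0;

    while (n < 40 && isxdigit((unsigned char)s[n])) {
        out[n] = s[n];
        n++;
    }
    out[n] = '\0';
    return n;
}

/* Parse "index <old>..<new>[ <mode>]". Returns 0 on success, -1 otherwise. */
static int parse_index_line(const char *line, index_line *out)
{
    const char *p = line;

    if (strncmp(p, "index ", 6) != 0)
        return -1;
    p += 6;

    out->old_len = read_hex(p, out->old_id);
    p += out->old_len;
    if (out->old_len == 0 || strncmp(p, "..", 2) != 0)
        return -1;
    p += 2;

    out->new_len = read_hex(p, out->new_id);
    p += out->new_len;
    if (out->new_len == 0)
        return -1;

    /* The optional mode is written in octal, e.g. 100644. */
    out->mode = 0;
    if (*p == ' ' && sscanf(p + 1, "%o", &out->mode) != 1)
        return -1;
    return 0;
}
```

Keeping the digit count lets a caller tell a full 40-character id from an abbreviated one before trying to resolve it.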

swisspol · 2015-06-17T16:06:03Z

Not trying to hijack this discussion or anything, but before going further with supporting patches, since it's related, could you first fix the regressions in libgit2's stash apply implementation compared to Git CLT as discussed at the end of #3018? 😉

ethomson · 2015-09-24T12:52:13Z

So I think that we should review this now, and not worry about the three bullet points that I had previously listed, all of which have to do with the ability to apply patches that were created by old (non-git) diff. I think that handling those patches would be very nice indeed, but not critical and we should instead be focused on adopting some APIs that would allow us to apply patches that are created by ourselves or by git.git.

Broadly speaking, these changes lay the groundwork for applying patches that are created by (for example) our git_diff_* functions or the git-diff command. We parse patches for a single file, reading the changes as hunks, and put them into an internal structure that is common to both the patches that we parse and the patches that we create from diffs. The git_patch_* APIs remain unchanged after this refactoring (yay) and can now operate on both types of patches: those that we computed internally and those that were parsed from the outside world. This is nice because it lets us take a corpus of patch files created by git-diff, parse them, then write them back out, to ensure that we are both parsing and writing patches correctly. (In many cases, we were not, so a number of bugs were fixed around things like exotic filenames and changes that only altered a mode or a filename but no content.)

I'm really happy with the "guts" of this, though diff-based patches and the diff mechanism are still fairly intertwined in a way that I could not easily refactor in the time allotted. (And I allotted myself a lot of time, this has been cooking off and on for a year and a half. Mostly "off", but still. Sheesh.)

But the APIs need some work here. I don't think that the APIs are hard, here, just the terminology. Some things to point out:

Our git_patch object applies to a single file, e.g. the changes to make a preimage foo.c into a postimage foo.c. When diffing a collection of files, you would get a collection of patches, one for each modified file. This makes the naming of functions here a little tricky to talk about, since I think the general use case we want to support is to take a patch file (which could contain multiple git_patches, in our parlance) and apply all of its changes to the working directory.
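A multi-file patch can be cut into per-file pieces wherever a line begins with `diff --git `, which shows why one file corresponds to one `git_patch`. The sketch below uses hypothetical `file_section` and `split_patch_file()` names. It is not how libgit2's parser is structured, and it ignores traditional (non-git) patches that lack that header.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: one per-file section of a multi-file patch buffer. */
typedef struct {
    const char *start; /* points at the "diff --git " line */
    size_t len;        /* bytes up to the next section or the end of the buffer */
} file_section;

static const char file_header[] = "diff --git ";

/* Split buf into per-file sections, each starting at a line that begins with
 * "diff --git ". Text before the first header (e.g. a commit message) is
 * skipped. Returns the number of sections stored (at most max). */
static size_t split_patch_file(const char *buf, size_t len,
                               file_section *out, size_t max)
{
    const size_t hlen = sizeof(file_header) - 1;
    size_t n = 0, i = 0;

    while (i < len) {
        const char *nl;

        if (len - i >= hlen && memcmp(buf + i, file_header, hlen) == 0) {
            if (n > 0) /* the previous section ends where this one begins */
                out[n - 1].len = (size_t)(buf + i - out[n - 1].start);
            if (n == max)
                return n;
            out[n].start = buf + i;
            n++;
        }
        /* i always sits at the start of a line; jump to the next one */
        nl = memchr(buf + i, '\n', len - i);
        i = nl ? (size_t)(nl - buf) + 1 : len;
    }
    if (n > 0) /* the last section runs to the end of the buffer */
        out[n - 1].len = (size_t)(buf + len - out[n - 1].start);
    return n;
}
```

Content lines inside a hunk always carry a `+`, `-` or space prefix, so they cannot be mistaken for a new file header.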

I think that broadly speaking, we will need two objects - a git_patch that contains the changes from one file to another, and a collection of git_patches - let's call it a git_patchlist for the sake of argument. I think that the very compelling APIs will be:

```c
int git_patchlist_from_file(git_patchlist **out, const char *filename);
int git_patchlist_from_buffer(git_patchlist **out, const char *buf);
int git_patchlist_apply_to_workdir(git_repository *repo, git_patchlist *patchlist);
```

There are other interesting APIs that might surface, like pulling patches out of a patchlist, and perhaps applying a git_patch to a buffer. Another option is to rename git_patch (kaboom) into something else that suggests its file scope, though I don't really know what because git_patchfile is even worse. :)

I actually think that the best course of action here is to move the new git_patch_from_patchfile API into an internal function and worry about the public API later. We can merge all the work here, which will let us take advantage of the numerous bugfixes in existing APIs, and we can worry about the new public APIs for parsing and applying patches in a separate PR.

Thoughts?

carlosmn · 2015-09-24T13:50:19Z

I think it makes sense to restrict ourselves to git-produced patches; we shouldn't try to be a general replacement for patch(1).

Regarding the naming of git_patch for a particular file, that naming is somewhat unfortunate, but meh. As for what to call a collection of these, when we create it, we call it a git_diff. Is there a reason we can't use that for the patches we read in?

This is a generous number of commits, so it's bound to take some time to review. Thanks for sticking to it.

ethomson · 2015-09-24T14:30:44Z

> As for what to call a collection of these, when we create it, we call it a git_diff. Is there a reason we can't use that for the patches we read in?

That's true. I had initially not wanted to tackle that, because it would mean making git_diff abstract enough to deal with both types of data. Which is quite scary. But it worked out well enough for git_patch that this would perhaps be possible. Let me ponder this some more later.

ethomson · 2015-09-24T14:48:27Z

A couple of quick additions to squash some warnings and memory leaks. While doing that, I realized that the single git_patch handling was only going to be used as part of handling a larger patch file, and so it only created pointers to memory within that larger buffer. Except that if we expose git_patch_from_patchfile, that will be caller memory. So I added a commit to dup the input.

For now, I'm just dup'ing the whole damned patch file, which is a minor amount of additional memory (just some paths and some prefixes) but reduces memory fragmentation or the need to pool or something else silly.
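The dup'ing approach works like this: copy the caller's buffer once, then let every parsed field point into that private copy. The sketch below uses hypothetical `owned_patch` names, not libgit2's structures, and extracts only the old path to keep it short.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed patch that owns a private copy of its input, so the
 * pointers it hands out stay valid after the caller frees its buffer. */
typedef struct {
    char *content;        /* our copy of the whole patch text */
    size_t content_len;
    const char *old_path; /* points into content; not NUL-terminated */
    size_t old_path_len;
} owned_patch;

static int owned_patch_from_buffer(owned_patch *out, const char *buf, size_t len)
{
    const char *prefix = "--- a/";
    const char *p, *eol;

    /* One allocation for the whole input instead of one per parsed field. */
    out->content = malloc(len + 1);
    if (!out->content)
        return -1;
    memcpy(out->content, buf, len);
    out->content[len] = '\0';
    out->content_len = len;

    /* Search our copy, not the caller's buffer, so old_path points into it. */
    p = strstr(out->content, prefix);
    if (!p) {
        free(out->content);
        out->content = NULL;
        return -1;
    }
    p += strlen(prefix);
    eol = strchr(p, '\n');
    out->old_path = p;
    out->old_path_len = eol ? (size_t)(eol - p) : strlen(p);
    return 0;
}

static void owned_patch_free(owned_patch *patch)
{
    free(patch->content);
    patch->content = NULL;
}
```

The trade-off matches the comment above. The copy costs one extra buffer, but it avoids many small allocations for paths and prefixes.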

I don't imagine that it will stick around in its current form for very long, anyway.

carlosmn · 2015-11-17T18:43:43Z

src/apply.c

```diff
+
+	for (start = in; start < in + in_len; start = end) {
+		for (end = start; end < in + in_len && *end != '\n'; end++)
+			;
```


Do we maybe wanna use p_strnlen() here instead of open-coding it?

Sorry, not p_strnlen() but memchr().
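The suggestion above means letting `memchr()` find each newline instead of scanning byte by byte. The sketch below uses a hypothetical `line_span` type and `split_lines()` helper; it is not the actual code in `src/apply.c`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one line; len excludes the trailing '\n'. */
typedef struct {
    const char *ptr;
    size_t len;
} line_span;

/* Split in[0..in_len) into lines, letting memchr() find each newline instead
 * of an open-coded scan. Stores at most max spans; returns the total line
 * count. A last line without a trailing newline is still counted. */
static size_t split_lines(const char *in, size_t in_len,
                          line_span *out, size_t max)
{
    const char *start = in, *end = in + in_len;
    size_t count = 0;

    while (start < end) {
        const char *nl = memchr(start, '\n', (size_t)(end - start));
        const char *line_end = nl ? nl : end;

        if (count < max) {
            out[count].ptr = start;
            out[count].len = (size_t)(line_end - start);
        }
        count++;
        start = nl ? nl + 1 : end; /* step past the newline */
    }
    return count;
}
```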


ethomson · 2016-05-26T18:02:59Z

Oops, ignore that last comment. I meant aa4bfb3 after this latest fixup.

(I accidentally reverted 153fde5 in that rename, and forgot to reapply it manually in my rebase. Ugh.)

carlosmn · 2016-06-23T07:46:50Z

tests/diff/parse.c

```diff
 {
 	git_repository *repo;
 	git_diff *computed, *parsed;
 	git_tree *a, *b;
 	git_diff_options opts = GIT_DIFF_OPTIONS_INIT;
+    git_diff_find_options findopts = GIT_DIFF_FIND_OPTIONS_INIT;
```


Odd indentation

carlosmn · 2016-06-23T07:49:42Z

I took a quick look and things seem sensible (other than that one piece of indentation 😈), so I think we should merge it and let it have wider usage.

hackhaslam · 2016-06-23T16:46:09Z

I'd like to see this merged. I've been testing the patch application part. My use case is partial application of patches for staging part of a file. It works well with a small change and the addition of a per-hunk callback. I'll propose a pull request after this is merged.
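Partial application could be driven by a callback that decides, hunk by hunk, whether to apply. The sketch makes simplifying assumptions: hypothetical `simple_hunk`, `hunk_cb` and `apply_selected()` names, hunks given as sorted, non-overlapping 0-based line ranges, and no context matching. It is not the API proposed in that later pull request.

```c
#include <stddef.h>

/* Hypothetical, simplified hunk: replace old_count lines starting at
 * old_start (0-based, in the preimage) with new_count lines from new_lines. */
typedef struct {
    size_t old_start, old_count;
    const char **new_lines;
    size_t new_count;
} simple_hunk;

/* Return nonzero to apply the hunk, 0 to skip it (e.g. "stage this hunk?"). */
typedef int (*hunk_cb)(const simple_hunk *hunk, size_t idx, void *payload);

/* Example callback: payload is an int array; nonzero entries mean "apply". */
static int select_from_mask(const simple_hunk *hunk, size_t idx, void *payload)
{
    (void)hunk;
    return ((const int *)payload)[idx];
}

/* Apply the accepted hunks to pre, writing at most max lines into post.
 * Returns the postimage line count, or (size_t)-1 if post is too small. */
static size_t apply_selected(const char **pre, size_t pre_len,
                             const simple_hunk *hunks, size_t nhunks,
                             hunk_cb cb, void *payload,
                             const char **post, size_t max)
{
    size_t in = 0, out = 0, h, k;

    for (h = 0; h < nhunks; h++) {
        const simple_hunk *hk = &hunks[h];

        if (!cb(hk, h, payload))
            continue; /* a skipped hunk leaves its preimage lines untouched */
        while (in < hk->old_start) { /* copy unchanged lines before the hunk */
            if (out == max)
                return (size_t)-1;
            post[out++] = pre[in++];
        }
        in += hk->old_count; /* drop the lines the hunk removes */
        for (k = 0; k < hk->new_count; k++) {
            if (out == max)
                return (size_t)-1;
            post[out++] = hk->new_lines[k];
        }
    }
    while (in < pre_len) { /* copy the tail of the preimage */
        if (out == max)
            return (size_t)-1;
        post[out++] = pre[in++];
    }
    return out;
}
```

Because the hunks are applied in order against the preimage, skipping one does not shift the positions that later hunks refer to.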


When showing copy information because we are duplicating contents (for example, when performing a `diff --find-copies-harder -M100 -B100`), show copy from/to lines in a patch and do not show context. Ensure that we can also parse such patches.
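Handling exact copies and renames means recognizing git's extended header lines and accepting a patch whose hunk count is zero. The sketch below uses hypothetical `patch_summary` and `summarize_patch()` names. It only classifies a single file's patch and does not validate it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of a single-file git patch. */
typedef struct {
    enum { PATCH_MODIFIED, PATCH_RENAMED, PATCH_COPIED } status;
    size_t hunks; /* may legitimately be 0, e.g. for a 100% similar rename/copy */
} patch_summary;

/* Classify one file's patch text by its extended header lines. */
static void summarize_patch(const char *text, patch_summary *out)
{
    const char *line = text;

    out->status = PATCH_MODIFIED;
    out->hunks = 0;
    while (*line) {
        const char *nl = strchr(line, '\n');

        if (strncmp(line, "rename from ", 12) == 0)
            out->status = PATCH_RENAMED;
        else if (strncmp(line, "copy from ", 10) == 0)
            out->status = PATCH_COPIED;
        else if (strncmp(line, "@@ ", 3) == 0)
            out->hunks++;
        if (!nl)
            break;
        line = nl + 1; /* move to the next line */
    }
}
```

The zero-hunk case is what the parser has to tolerate: an exact rename or copy produces only headers, with no `---`/`+++` lines and no hunks.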

ethomson force-pushed the apply branch 3 times, most recently from c0f0f65 to 0fa6725 on June 19, 2015 22:48

ethomson force-pushed the apply branch 2 times, most recently from 464cb59 to c1e2ab5 on July 10, 2015 14:37

ethomson force-pushed the apply branch 3 times, most recently from 1c4837a to aa72c47 on September 23, 2015 22:10

ethomson force-pushed the apply branch 2 times, most recently from c5dda46 to 214a27b on September 24, 2015 14:48

ethomson force-pushed the apply branch 3 times, most recently from 6bb0e48 to 891ac49 on September 25, 2015 18:05

ethomson force-pushed the apply branch 4 times, most recently from 7dee9fa to 829bb21 on October 28, 2015 17:58

carlosmn reviewed Nov 17, 2015
Edward Thomson and others added 15 commits May 26, 2016 13:01

apply: test postimages that grow/shrink original

0ff723c

Test with some postimages that actually grow/shrink from the original, adding new lines or removing them. (Also do so without context to ensure that we can add/remove from a non-zero part of the line vector.)

git_vector_grow/shrink: correct shrink, and tests

e564fc6

patch: formatting cleanups

a03952f

patch: provide static string advance_expected

00e63b3

patch: git_patch_from_patchfile -> git_patch_from_buffer

440e3ba

vector: more sensible names for grow_at/shrink_at

53571f2

patch: patch_diff -> patch_generated

8d44f8b

parse: introduce parse_ctx_contains_s

aa4bfb3

git_diff_generated: abstract generated diffs

9be638e

git_patch_parse_ctx: refcount the context

17572f6

patch: differentiate not found and invalid patches

94e488a

introduce git_diff_from_buffer to parse diffs

7166bb1

Parse diff files into a `git_diff` structure.

Introduce git_diff_to_buf

7282749

Like `git_patch_to_buf`, provide a simple helper method that can print an entire diff directly to a `git_buf`.

patch: identify non-binary patches as NOT_BINARY

33ae876

patch: zero id and abbrev length for empty files

853e585
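Several of the commits above (0ff723c, e564fc6, 53571f2) deal with growing and shrinking the line vector when a postimage adds or removes lines at a non-zero position. The sketch below shows that operation on a plain array of pointers. The names `ptr_vector`, `ptr_vector_grow_at()` and `ptr_vector_shrink_at()` are hypothetical, and this is not libgit2's git_vector code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array of pointers. */
typedef struct {
    void **contents;
    size_t length;
    size_t alloc;
} ptr_vector;

/* Open n empty (NULL) slots at index idx, shifting later entries right. */
static int ptr_vector_grow_at(ptr_vector *v, size_t idx, size_t n)
{
    size_t i;

    if (idx > v->length)
        return -1;
    if (v->length + n > v->alloc) {
        size_t new_alloc = (v->length + n) * 2;
        void **p = realloc(v->contents, new_alloc * sizeof(void *));
        if (!p)
            return -1;
        v->contents = p;
        v->alloc = new_alloc;
    }
    memmove(&v->contents[idx + n], &v->contents[idx],
            (v->length - idx) * sizeof(void *));
    for (i = 0; i < n; i++)
        v->contents[idx + i] = NULL;
    v->length += n;
    return 0;
}

/* Remove n entries starting at index idx, shifting later entries left. */
static int ptr_vector_shrink_at(ptr_vector *v, size_t idx, size_t n)
{
    if (idx > v->length || n > v->length - idx)
        return -1;
    memmove(&v->contents[idx], &v->contents[idx + n],
            (v->length - idx - n) * sizeof(void *));
    v->length -= n;
    return 0;
}
```

With these two primitives, applying a hunk that turns k preimage lines into m postimage lines means growing or shrinking by the difference at the hunk's position, then overwriting the slots.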

ethomson force-pushed the apply branch from fb4d802 to 5eae4b6 on May 26, 2016 18:01

carlosmn reviewed Jun 23, 2016
Edward Thomson added 5 commits June 25, 2016 23:08

diff::parse tests: test parsing a diff

e774d5a

Test that we can create a diff file, then parse the results and that the two are identical in-memory.

patch::parse: handle patches with no hunks

38a347e

Patches may have no hunks when there's no modifications (for example, in a rename). Handle them.

patch::parse: test diff with simple rename

8a670dc

patch::parse: test diff with exact rename and copy

9eb1938

ethomson force-pushed the apply branch from 5eae4b6 to 1a79cd9 on June 26, 2016 03:09

ethomson merged commit 20302aa into libgit2:master Jun 26, 2016

johnhaley81 mentioned this pull request Feb 6, 2017

Bump libgit to 0bf0526 nodegit/nodegit#1187

Merged

wachulski mentioned this pull request Mar 5, 2018

Reading patch files missing libgit2/libgit2sharp#1540

Open

ethomson mentioned this pull request Jul 14, 2023

merge: Fix double free issue after merging the diffs #6589

Open
