Thanks to visit codestin.com
Credit goes to github.com

Skip to content

patch_generate: only calculate binary diffs if requested #3922

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

pks-t
Copy link
Member

@pks-t pks-t commented Sep 1, 2016

When generating diffs for binary files, we load and decompress
the blobs in order to generate the actual diff, which can be very
costly. While we cannot avoid this for the case when we are
called with the GIT_DIFF_SHOW_BINARY flag, we do not have to
load the blobs in the case where this flag is not set, as the
caller is expected to have no interest in the actual content of
binary files.

Fix the issue by only generating a binary diff when the caller is
actually interested in the diff. As libgit2 uses heuristics to
determine that a blob contains binary data by inspecting its size
without loading from the ODB, this saves us quite some time when
diffing in a repository with binary files.

When generating diffs for binary files, we load and decompress
the blobs in order to generate the actual diff, which can be very
costly. While we cannot avoid this for the case when we are
called with the `GIT_DIFF_SHOW_BINARY` flag, we do not have to
load the blobs in the case where this flag is not set, as the
caller is expected to have no interest in the actual content of
binary files.

Fix the issue by only generating a binary diff when the caller is
actually interested in the diff. As libgit2 uses heuristics to
determine that a blob contains binary data by inspecting its size
without loading from the ODB, this saves us quite some time when
diffing in a repository with binary files.
@pks-t
Copy link
Member Author

pks-t commented Sep 1, 2016

Refs #3920

One thing I'd like to discuss is whether we should pass NULL to the binary_cb or, as I do in the patch, a git_diff_binary struct initialized to all-zero. I only took the second approach as passing NULL breaks some tests, so it was more of a lazy shortcut. I'm still open to changing this, though.

@carlosmn
Copy link
Member

carlosmn commented Sep 1, 2016

We should probably be passing in the minimal information there. The comment for the type doesn't say what fields you can expect to have filled, so it looks like we should define what gets filled and then fill that. Presumably we'd want to pass in the path, object ids and mode so you can display the message that it changed if that's all you're after.

@ethomson
Copy link
Member

ethomson commented Sep 1, 2016

I think I would prefer to pass a NULL here if we didn't load it. The path and object IDs and such are in the delta, and is suitable for a caller that just wanted to emulate Binary files a/foo.txt and b/foo.txt differ.

The git_diff_binary contains the old side and the new side of the binary data which is only useful for displaying the actual binary contents and emulating git diff --binary. If we don't load that data then I think NULL would be very appropriate.

This is a good fix, I'm going to merge it and add NULL handling on top.

Thanks @pks-t !

@ethomson
Copy link
Member

ethomson commented Sep 1, 2016

After poking around with this a bit more, I realized that NULL probably isn't appropriate but that instead a git_diff_delta should indicate whether there's any data in its files... (It can't look at the file sides since it may have been generated by parsing a patch that said Binary files differ or by running git_diff_blob_to_blob on two zero byte files, which are harder to discern.)

While looking at this I realized that we didn't have any way to parse a patch that had Binary files differ situation, so I'm fixing that up.

@ethomson
Copy link
Member

ethomson commented Sep 2, 2016

The aforementioned Binary files differ situation is a little more obnoxious than I had initially thought; there are some tests that assume behavior that I'm not sure makes sense. I'm going to go ahead and merge this to unblock #3920 since this is correct.

@ethomson ethomson merged commit ce54e77 into libgit2:master Sep 2, 2016
@pks-t pks-t deleted the pks/diff-only-load-binaries-when-requested branch October 10, 2016 07:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants