Commit 93da7af

Author: Vicent Martí
Merge pull request libgit2#1642 from arrbee/diff-function-context
Diff code reorg plus function context in diff headers
2 parents 5438e9c + 360f42f commit 93da7af

58 files changed, +3280 -2033 lines changed

docs/diff-internals.md

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
+Diff is broken into four phases:
+
+1. Building a list of things that have changed. These changes are called
+   deltas (git_diff_delta objects) and are grouped into a git_diff_list.
+2. Applying file similarity measurement for rename and copy detection (and
+   to potentially split files that have changed radically). This step is
+   optional.
+3. Computing the textual diff for each delta. Not all deltas have a
+   meaningful textual diff. For those that do, the textual diff can
+   either be generated on the fly and passed to output callbacks or can be
+   turned into a git_diff_patch object.
+4. Formatting the diff and/or patch into standard text formats (such as
+   patches, raw lists, etc).
+
+In the source code, step 1 is implemented in `src/diff.c`, step 2 in
+`src/diff_tform.c`, step 3 in `src/diff_patch.c`, and step 4 in
+`src/diff_print.c`. Additionally, when it comes to accessing file
+content, everything goes through diff drivers that are implemented in
+`src/diff_driver.c`.
+
+External Objects
+----------------
+
+* `git_diff_options` represents user choices about how a diff should be
+  performed and is passed to most diff generating functions.
+* `git_diff_file` represents an item on one side of a possible delta
+* `git_diff_delta` represents a pair of items that have changed in some
+  way - it contains two `git_diff_file` plus a status and other stuff.
+* `git_diff_list` is a list of deltas along with information about how
+  those particular deltas were found.
+* `git_diff_patch` represents the actual diff between a pair of items. In
+  some cases, a delta may not have a corresponding patch, if the objects
+  are binary, for example. The content of a patch will be a set of hunks
+  and lines.
+* A `hunk` is a range of lines described by a `git_diff_range` (e.g. "lines
+  10-20 in the old file became lines 12-23 in the new"). It will have a
+  header that compactly represents that information, and it will have a
+  number of lines of context surrounding added and deleted lines.
+* A `line` is simply a line of data along with a `git_diff_line_t` value
+  that tells how the data should be interpreted (e.g. context or added).
+
+Internal Objects
+----------------
+
+* `git_diff_file_content` is an internal structure that represents the
+  data on one side of an item to be diffed; it is an augmented
+  `git_diff_file` with more flags and the actual file data.
+** it is created from a repository plus a) a git_diff_file, b) a git_blob,
+   or c) raw data and size
+** there are three main operations on git_diff_file_content:
+*** _initialization_ sets up the data structure and does what it can up to,
+    but not including loading and looking at the actual data
+*** _loading_ loads the data, preprocesses it (i.e. applies filters) and
+    potentially analyzes it (to decide if binary)
+*** _free_ releases loaded data and frees any allocated memory
+
+* The internal structure of a `git_diff_patch` stores the actual diff
+  between a pair of `git_diff_file_content` items
+** it may be "unset" if the items are not diffable
+** "empty" if the items are the same
+** otherwise it will consist of a set of hunks each of which covers some
+   number of lines of context, additions and deletions
+** a patch is created from two git_diff_file_content items
+** a patch is fully instantiated in three phases:
+*** initial creation and initialization
+*** loading of data and preliminary data examination
+*** diffing of data and optional storage of diffs
+** (TBD) if a patch is asked to store the diffs and the size of the diff
+   is significantly smaller than the raw data of the two sides, then the
+   patch may be flattened using a pool of string data
+
+* `git_diff_output` is an internal structure that represents an output
+  target for a `git_diff_patch`
+** It consists of file, hunk, and line callbacks, plus a payload
+** There is a standard flattened output that can be used for plain text output
+** Typically we use a `git_xdiff_output` which drives the callbacks via the
+   xdiff code taken from core Git.
+
+* `git_diff_driver` is an internal structure that encapsulates the logic
+  for a given type of file
+** a driver is looked up based on the name and mode of a file.
+** the driver can then be used to:
+*** determine if a file is binary (by attributes, by git_diff_options
+    settings, or by examining the content)
+*** give you a function pointer that is used to evaluate function context
+    for hunk headers
+** At some point, the logic for getting a filtered version of file content
+   or calculating the OID of a file may be moved into the driver.

include/git2/diff.h

Lines changed: 48 additions & 2 deletions
@@ -148,6 +148,9 @@ typedef enum {
 	 * Of course, ignore rules are still checked for the directory itself.
 	 */
 	GIT_DIFF_FAST_UNTRACKED_DIRS = (1 << 19),
+
+	/** Treat all files as binary, disabling text diffs */
+	GIT_DIFF_FORCE_BINARY = (1 << 20),
 } git_diff_option_t;
 
 /**
@@ -857,7 +860,7 @@ GIT_EXTERN(size_t) git_diff_patch_num_hunks(
  * @param total_additions Count of addition lines in output, can be NULL.
  * @param total_deletions Count of deletion lines in output, can be NULL.
  * @param patch The git_diff_patch object
- * @return Number of lines in hunk or -1 if invalid hunk index
+ * @return 0 on success, <0 on error
  */
 GIT_EXTERN(int) git_diff_patch_line_stats(
 	size_t *total_context,
@@ -997,6 +1000,26 @@ GIT_EXTERN(int) git_diff_blobs(
 	git_diff_data_cb line_cb,
 	void *payload);
 
+/**
+ * Directly generate a patch from the difference between two blobs.
+ *
+ * This is just like `git_diff_blobs()` except it generates a patch object
+ * for the difference instead of directly making callbacks. You can use the
+ * standard `git_diff_patch` accessor functions to read the patch data, and
+ * you must call `git_diff_patch_free()` on the patch when done.
+ *
+ * @param out The generated patch; NULL on error
+ * @param old_blob Blob for old side of diff, or NULL for empty blob
+ * @param new_blob Blob for new side of diff, or NULL for empty blob
+ * @param options Options for diff, or NULL for default options
+ * @return 0 on success or error code < 0
+ */
+GIT_EXTERN(int) git_diff_patch_from_blobs(
+	git_diff_patch **out,
+	const git_blob *old_blob,
+	const git_blob *new_blob,
+	const git_diff_options *opts);
+
 /**
  * Directly run a diff between a blob and a buffer.
  *
@@ -1010,7 +1033,7 @@ GIT_EXTERN(int) git_diff_blobs(
  * the reverse, with GIT_DELTA_REMOVED and blob content removed.
  *
  * @param old_blob Blob for old side of diff, or NULL for empty blob
- * @param buffer Raw data for new side of diff
+ * @param buffer Raw data for new side of diff, or NULL for empty
  * @param buffer_len Length of raw data for new side of diff
  * @param options Options for diff, or NULL for default options
  * @param file_cb Callback for "file"; made once if there is a diff; can be NULL
@@ -1029,6 +1052,29 @@ GIT_EXTERN(int) git_diff_blob_to_buffer(
 	git_diff_data_cb data_cb,
 	void *payload);
 
+/**
+ * Directly generate a patch from the difference between a blob and a buffer.
+ *
+ * This is just like `git_diff_blob_to_buffer()` except it generates a patch
+ * object for the difference instead of directly making callbacks. You can
+ * use the standard `git_diff_patch` accessor functions to read the patch
+ * data, and you must call `git_diff_patch_free()` on the patch when done.
+ *
+ * @param out The generated patch; NULL on error
+ * @param old_blob Blob for old side of diff, or NULL for empty blob
+ * @param buffer Raw data for new side of diff, or NULL for empty
+ * @param buffer_len Length of raw data for new side of diff
+ * @param options Options for diff, or NULL for default options
+ * @return 0 on success or error code < 0
+ */
+GIT_EXTERN(int) git_diff_patch_from_blob_and_buffer(
+	git_diff_patch **out,
+	const git_blob *old_blob,
+	const char *buf,
+	size_t buflen,
+	const git_diff_options *opts);
+
+
 GIT_END_DECL
 
 /** @} */
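
The corrected `@return` text for `git_diff_patch_line_stats` documents a common contract: counts are written through output pointers that "can be NULL", and the return value only signals success or failure. A small self-contained sketch of that contract follows. The function `line_stats_sketch` and its input format (an array of origin characters) are assumptions made for illustration and do not reproduce the libgit2 implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of libgit2: tallies line kinds from an
 * array of origin characters (' ' context, '+' addition, '-' deletion).
 * Any of the three output pointers may be NULL when the caller does not
 * need that count. Returns 0 on success and -1 on invalid input.
 */
static int line_stats_sketch(
    size_t *total_context,
    size_t *total_additions,
    size_t *total_deletions,
    const char *origins,
    size_t count)
{
    size_t ctx = 0, adds = 0, dels = 0, i;

    if (!origins && count > 0)
        return -1;

    for (i = 0; i < count; i++) {
        switch (origins[i]) {
        case ' ': ctx++;  break;
        case '+': adds++; break;
        case '-': dels++; break;
        default:
            return -1; /* unknown origin: fail without touching outputs */
        }
    }

    /* Only write the counts the caller asked for. */
    if (total_context)   *total_context = ctx;
    if (total_additions) *total_additions = adds;
    if (total_deletions) *total_deletions = dels;
    return 0;
}
```

Returning a status code rather than a count keeps the three totals symmetric and leaves negative values free for error reporting.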

src/array.h

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+/*
+ * Copyright (C) the libgit2 contributors. All rights reserved.
+ *
+ * This file is part of libgit2, distributed under the GNU GPL v2 with
+ * a Linking Exception. For full terms see the included COPYING file.
+ */
+#ifndef INCLUDE_array_h__
+#define INCLUDE_array_h__
+
+#include "util.h"
+
+/*
+ * Use this to declare a typesafe resizable array of items, a la:
+ *
+ *     git_array_t(int) my_ints = GIT_ARRAY_INIT;
+ *     ...
+ *     int *i = git_array_alloc(my_ints);
+ *     GITERR_CHECK_ALLOC(i);
+ *     ...
+ *     git_array_clear(my_ints);
+ *
+ * You may also want to do things like:
+ *
+ *     typedef git_array_t(my_struct) my_struct_array_t;
+ */
+#define git_array_t(type) struct { type *ptr; uint32_t size, asize; }
+
+#define GIT_ARRAY_INIT { NULL, 0, 0 }
+
+#define git_array_init(a) \
+	do { (a).size = (a).asize = 0; (a).ptr = NULL; } while (0)
+
+#define git_array_clear(a) \
+	do { git__free((a).ptr); git_array_init(a); } while (0)
+
+#define GITERR_CHECK_ARRAY(a) GITERR_CHECK_ALLOC((a).ptr)
+
+
+typedef git_array_t(void) git_array_generic_t;
+
+/* use a generic array for growth so this can return the new item */
+GIT_INLINE(void *) git_array_grow(git_array_generic_t *a, size_t item_size)
+{
+	uint32_t new_size = (a->size < 8) ? 8 : a->asize * 3 / 2;
+	void *new_array = git__realloc(a->ptr, new_size * item_size);
+	if (!new_array) {
+		git_array_clear(*a);
+		return NULL;
+	} else {
+		a->ptr = new_array; a->asize = new_size; a->size++;
+		return (((char *)a->ptr) + (a->size - 1) * item_size);
+	}
+}
+
+#define git_array_alloc(a) \
+	((a).size >= (a).asize) ? \
+	git_array_grow((git_array_generic_t *)&(a), sizeof(*(a).ptr)) : \
+	(a).ptr ? &(a).ptr[(a).size++] : NULL
+
+#define git_array_last(a) ((a).size ? &(a).ptr[(a).size - 1] : NULL)
+
+#define git_array_get(a, i) (((i) < (a).size) ? &(a).ptr[(i)] : NULL)
+
+#define git_array_size(a) (a).size
+
+#endif
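
The header above builds a type-safe growable array out of a macro-declared anonymous struct plus a generic growth routine. The same technique can be sketched without libgit2's internal helpers, using plain `realloc` and `free`. All `sketch_` names are hypothetical. Two deliberate differences from the header: growth is decided from the allocated size alone, and the allocation macro is wrapped in outer parentheses, since the unparenthesized `git_array_alloc` may not parse as intended inside a larger expression (for example directly after `!`).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Declare an anonymous struct holding a typed pointer plus sizes. */
#define sketch_array_t(type) struct { type *ptr; uint32_t size, asize; }

#define SKETCH_ARRAY_INIT { NULL, 0, 0 }

/* Free the storage and reset the array to its empty state. */
#define sketch_array_clear(a) \
    do { free((a).ptr); (a).ptr = NULL; (a).size = (a).asize = 0; } while (0)

/*
 * Grow the backing store by roughly 1.5x (minimum 8 slots). On failure
 * the old storage is freed and the sizes are zeroed, mirroring the
 * "clear on allocation failure" behaviour of the header above.
 */
static void *sketch_grow(
    void *old, uint32_t *size, uint32_t *asize, size_t item_size)
{
    uint32_t new_asize = (*asize < 8) ? 8 : *asize * 3 / 2;
    void *p = realloc(old, (size_t)new_asize * item_size);

    if (!p) {
        free(old);
        *size = *asize = 0;
        return NULL;
    }
    *asize = new_asize;
    return p;
}

/* Evaluates to a pointer to a fresh slot, or NULL if allocation failed. */
#define sketch_array_alloc(a) \
    ((((a).size < (a).asize) || \
      ((a).ptr = sketch_grow((a).ptr, &(a).size, &(a).asize, \
                             sizeof(*(a).ptr))) != NULL) \
        ? &(a).ptr[(a).size++] : NULL)

/* Bounds-checked element access: pointer to item i, or NULL. */
#define sketch_array_get(a, i) (((i) < (a).size) ? &(a).ptr[(i)] : NULL)
```

Because the struct carries the element type, `sketch_array_alloc` yields a correctly typed pointer with no cast at the call site, which is the main attraction of the macro approach over a `void *` vector.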

src/blob.c

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@
 #include "git2/odb_backend.h"
 
 #include "common.h"
+#include "filebuf.h"
 #include "blob.h"
 #include "filter.h"
 #include "buf_text.h"

src/checkout.c

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@
 
 #include "refs.h"
 #include "repository.h"
+#include "index.h"
 #include "filter.h"
 #include "blob.h"
 #include "diff.h"

src/clone.c

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@
 #include "fileops.h"
 #include "refs.h"
 #include "path.h"
+#include "repository.h"
 
 static int create_branch(
 	git_reference **branch,

src/crlf.c

Lines changed: 4 additions & 2 deletions
@@ -5,14 +5,16 @@
  * a Linking Exception. For full terms see the included COPYING file.
  */
 
+#include "git2/attr.h"
+#include "git2/blob.h"
+#include "git2/index.h"
+
 #include "common.h"
 #include "fileops.h"
 #include "hash.h"
 #include "filter.h"
 #include "buf_text.h"
 #include "repository.h"
-#include "git2/attr.h"
-#include "git2/blob.h"
 
 struct crlf_attrs {
 	int crlf_action;

src/diff.c

Lines changed: 72 additions & 0 deletions
@@ -11,6 +11,8 @@
 #include "attr_file.h"
 #include "filter.h"
 #include "pathspec.h"
+#include "index.h"
+#include "odb.h"
 
 #define DIFF_FLAG_IS_SET(DIFF,FLAG) (((DIFF)->opts.flags & (FLAG)) != 0)
 #define DIFF_FLAG_ISNT_SET(DIFF,FLAG) (((DIFF)->opts.flags & (FLAG)) == 0)
@@ -1170,3 +1172,73 @@ int git_diff_tree_to_workdir(
 
 	return error;
 }
+
+size_t git_diff_num_deltas(git_diff_list *diff)
+{
+	assert(diff);
+	return (size_t)diff->deltas.length;
+}
+
+size_t git_diff_num_deltas_of_type(git_diff_list *diff, git_delta_t type)
+{
+	size_t i, count = 0;
+	git_diff_delta *delta;
+
+	assert(diff);
+
+	git_vector_foreach(&diff->deltas, i, delta) {
+		count += (delta->status == type);
+	}
+
+	return count;
+}
+
+int git_diff__paired_foreach(
+	git_diff_list *idx2head,
+	git_diff_list *wd2idx,
+	int (*cb)(git_diff_delta *i2h, git_diff_delta *w2i, void *payload),
+	void *payload)
+{
+	int cmp;
+	git_diff_delta *i2h, *w2i;
+	size_t i, j, i_max, j_max;
+	int (*strcomp)(const char *, const char *);
+
+	i_max = idx2head ? idx2head->deltas.length : 0;
+	j_max = wd2idx ? wd2idx->deltas.length : 0;
+
+	/* Get appropriate strcmp function */
+	strcomp = idx2head ? idx2head->strcomp : wd2idx ? wd2idx->strcomp : NULL;
+
+	/* Assert both iterators use matching ignore-case. If this function ever
+	 * supports merging diffs that are not sorted by the same function, then
+	 * it will need to spool and sort on one of the results before merging
+	 */
+	if (idx2head && wd2idx) {
+		assert(idx2head->strcomp == wd2idx->strcomp);
+	}
+
+	for (i = 0, j = 0; i < i_max || j < j_max; ) {
+		i2h = idx2head ? GIT_VECTOR_GET(&idx2head->deltas,i) : NULL;
+		w2i = wd2idx ? GIT_VECTOR_GET(&wd2idx->deltas,j) : NULL;
+
+		cmp = !w2i ? -1 : !i2h ? 1 :
+			strcomp(i2h->old_file.path, w2i->old_file.path);
+
+		if (cmp < 0) {
+			if (cb(i2h, NULL, payload))
+				return GIT_EUSER;
+			i++;
+		} else if (cmp > 0) {
+			if (cb(NULL, w2i, payload))
+				return GIT_EUSER;
+			j++;
+		} else {
+			if (cb(i2h, w2i, payload))
+				return GIT_EUSER;
+			i++; j++;
+		}
+	}
+
+	return 0;
+}
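
`git_diff__paired_foreach` is a merge-join over two lists that are sorted by the same comparison function: it advances whichever side has the smaller path and reports both entries together when the paths match. A reduced sketch of that walk over plain sorted string arrays follows. The names `paired_foreach_sketch`, `pair_cb`, `pair_counts` and `count_pairs` are hypothetical, `strcmp` stands in for the diff list's own comparator, and the `-1` return stands in for `GIT_EUSER`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Callback receives the left entry, the right entry, or both. */
typedef int (*pair_cb)(const char *left, const char *right, void *payload);

/*
 * Walk two sorted string arrays in lockstep. Entries present on one
 * side only are reported with NULL for the other side. A nonzero
 * callback result aborts the walk and returns -1.
 */
static int paired_foreach_sketch(
    const char **left, size_t left_n,
    const char **right, size_t right_n,
    pair_cb cb, void *payload)
{
    size_t i = 0, j = 0;

    while (i < left_n || j < right_n) {
        const char *l = (i < left_n) ? left[i] : NULL;
        const char *r = (j < right_n) ? right[j] : NULL;
        /* An exhausted side always sorts after the other side. */
        int cmp = !r ? -1 : !l ? 1 : strcmp(l, r);

        if (cmp < 0) {
            if (cb(l, NULL, payload)) return -1;
            i++;
        } else if (cmp > 0) {
            if (cb(NULL, r, payload)) return -1;
            j++;
        } else {
            if (cb(l, r, payload)) return -1;
            i++; j++;
        }
    }
    return 0;
}

/* Example payload and callback: count how each entry was paired. */
typedef struct { int left_only, right_only, both; } pair_counts;

static int count_pairs(const char *left, const char *right, void *payload)
{
    pair_counts *c = payload;
    if (left && right) c->both++;
    else if (left)     c->left_only++;
    else               c->right_only++;
    return 0;
}
```

The walk visits each entry once, so it runs in time linear in the combined length, but it is only correct when both inputs share one sort order, which is what the assertion on `strcomp` in the real function guards.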

src/diff.h

Lines changed: 11 additions & 0 deletions
@@ -29,11 +29,16 @@ enum {
 	GIT_DIFFCAPS_TRUST_NANOSECS = (1 << 5), /* use stat time nanoseconds */
 };
 
+#define DIFF_FLAGS_KNOWN_BINARY (GIT_DIFF_FLAG_BINARY|GIT_DIFF_FLAG_NOT_BINARY)
+#define DIFF_FLAGS_NOT_BINARY (GIT_DIFF_FLAG_NOT_BINARY|GIT_DIFF_FLAG__NO_DATA)
+
 enum {
 	GIT_DIFF_FLAG__FREE_PATH = (1 << 7), /* `path` is allocated memory */
 	GIT_DIFF_FLAG__FREE_DATA = (1 << 8), /* internal file data is allocated */
 	GIT_DIFF_FLAG__UNMAP_DATA = (1 << 9), /* internal file data is mmap'ed */
 	GIT_DIFF_FLAG__NO_DATA = (1 << 10), /* file data should not be loaded */
+	GIT_DIFF_FLAG__FREE_BLOB = (1 << 11), /* release the blob when done */
+	GIT_DIFF_FLAG__LOADED = (1 << 12), /* file data has been loaded */
 
 	GIT_DIFF_FLAG__TO_DELETE = (1 << 16), /* delete entry during rename det. */
 	GIT_DIFF_FLAG__TO_SPLIT = (1 << 17), /* split entry during rename det. */
@@ -83,6 +88,12 @@ extern int git_diff__from_iterators(
 	git_iterator *new_iter,
 	const git_diff_options *opts);
 
+extern int git_diff__paired_foreach(
+	git_diff_list *idx2head,
+	git_diff_list *wd2idx,
+	int (*cb)(git_diff_delta *i2h, git_diff_delta *w2i, void *payload),
+	void *payload);
+
 int git_diff_find_similar__hashsig_for_file(
 	void **out, const git_diff_file *f, const char *path, void *p);
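
The two new `DIFF_FLAGS_*` macros are multi-bit masks, so they are presumably tested with "any bit set" checks rather than equality. A standalone sketch of that idiom follows. The `SKETCH_` constants only mirror the shape of the definitions above: the values for the binary and not-binary bits are assumed, and the reading that a side with no data counts as "not binary" is an interpretation of the macro name, not something this diff confirms.

```c
#include <assert.h>

/* Stand-in flag bits; the first two values are assumptions. */
enum {
    SKETCH_FLAG_BINARY     = (1 << 0),
    SKETCH_FLAG_NOT_BINARY = (1 << 1),
    SKETCH_FLAG_NO_DATA    = (1 << 10)
};

/* Masks shaped like DIFF_FLAGS_KNOWN_BINARY and DIFF_FLAGS_NOT_BINARY. */
#define SKETCH_FLAGS_KNOWN_BINARY \
    (SKETCH_FLAG_BINARY | SKETCH_FLAG_NOT_BINARY)
#define SKETCH_FLAGS_NOT_BINARY \
    (SKETCH_FLAG_NOT_BINARY | SKETCH_FLAG_NO_DATA)

/* True once either the binary or the not-binary bit has been decided. */
static int binary_state_is_known(unsigned int flags)
{
    return (flags & SKETCH_FLAGS_KNOWN_BINARY) != 0;
}

/* True if the side is marked not binary or has no data to inspect. */
static int side_counts_as_not_binary(unsigned int flags)
{
    return (flags & SKETCH_FLAGS_NOT_BINARY) != 0;
}
```

Testing against a mask lets a caller skip content inspection once a decision is cached in the flags, which fits the new `GIT_DIFF_FLAG__LOADED` bit that records whether file data has already been read.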
