Commit 114f5a6

Reorganize diff and add basic diff driver
This is a significant reorganization of the diff code that breaks it into a set of more clearly distinct files and documents the new organization. Hopefully this will make the diff code easier to understand and to extend.

This adds a new `git_diff_driver` object that looks up diff driver information from the attributes and the config so that things like function context in diff headers can be provided. The full driver spec is not implemented in this commit; the focus is on reorganizing the code and putting the driver hooks in place.

This also removes a few overbroad #includes from src/repository.h. As a result, extra #includes were needed in a variety of places, since including src/repository.h no longer pulls in the whole world.
1 parent 7000f3f commit 114f5a6

52 files changed, +1951 -1665 lines changed

docs/diff-internals.md

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
+Diff is broken into four phases:
+
+1. Building a list of things that have changed. These changes are called
+   deltas (git_diff_delta objects) and are grouped into a git_diff_list.
+2. Applying file similarity measurement for rename and copy detection (and
+   to potentially split files that have changed radically). This step is
+   optional.
+3. Computing the textual diff for each delta. Not all deltas have a
+   meaningful textual diff. For those that do, the textual diff can
+   either be generated on the fly and passed to output callbacks or can be
+   turned into a git_diff_patch object.
+4. Formatting the diff and/or patch into standard text formats (such as
+   patches, raw lists, etc).
+
+In the source code, step 1 is implemented in `src/diff.c`, step 2 in
+`src/diff_tform.c`, step 3 in `src/diff_patch.c`, and step 4 in
+`src/diff_print.c`. Additionally, when it comes to accessing file
+content, everything goes through diff drivers that are implemented in
+`src/diff_driver.c`.
+
+External Objects
+----------------
+
+* `git_diff_options` represents user choices about how a diff should be
+  performed and is passed to most diff generating functions.
+* `git_diff_file` represents an item on one side of a possible delta
+* `git_diff_delta` represents a pair of items that have changed in some
+  way - it contains two `git_diff_file` plus a status and other stuff.
+* `git_diff_list` is a list of deltas along with information about how
+  those particular deltas were found.
+* `git_diff_patch` represents the actual diff between a pair of items. In
+  some cases, a delta may not have a corresponding patch, if the objects
+  are binary, for example. The content of a patch will be a set of hunks
+  and lines.
+* A `hunk` is a range of lines described by a `git_diff_range` (i.e. "lines
+  10-20 in the old file became lines 12-23 in the new"). It will have a
+  header that compactly represents that information, and it will have a
+  number of lines of context surrounding added and deleted lines.
+* A `line` is simply a line of data along with a `git_diff_line_t` value
+  that tells how the data should be interpreted (e.g. context or added).
+
+Internal Objects
+----------------
+
+* `git_diff_file_content` is an internal structure that represents the
+  data on one side of an item to be diffed; it is an augmented
+  `git_diff_file` with more flags and the actual file data.
+** it is created from a repository plus a) a git_diff_file, b) a git_blob,
+   or c) raw data and size
+** there are three main operations on git_diff_file_content:
+*** _initialization_ sets up the data structure and does what it can up to,
+    but not including loading and looking at the actual data
+*** _loading_ loads the data, preprocesses it (i.e. applies filters) and
+    potentially analyzes it (to decide if binary)
+*** _free_ releases loaded data and frees any allocated memory
+
+* The internal structure of a `git_diff_patch` stores the actual diff
+  between a pair of `git_diff_file_content` items
+** it may be "unset" if the items are not diffable
+** "empty" if the items are the same
+** otherwise it will consist of a set of hunks each of which covers some
+   number of lines of context, additions and deletions
+** a patch is created from two git_diff_file_content items
+** a patch is fully instantiated in three phases:
+*** initial creation and initialization
+*** loading of data and preliminary data examination
+*** diffing of data and optional storage of diffs
+** (TBD) if a patch is asked to store the diffs and the size of the diff
+   is significantly smaller than the raw data of the two sides, then the
+   patch may be flattened using a pool of string data
+
+* `git_diff_output` is an internal structure that represents an output
+  target for a `git_diff_patch`
+** It consists of file, hunk, and line callbacks, plus a payload
+** There is a standard flattened output that can be used for plain text output
+** Typically we use a `git_xdiff_output` which drives the callbacks via the
+   xdiff code taken from core Git.
+
+* `git_diff_driver` is an internal structure that encapsulates the logic
+  for a given type of file
+** a driver is looked up based on the name and mode of a file.
+** the driver can then be used to:
+*** determine if a file is binary (by attributes, by git_diff_options
+    settings, or by examining the content)
+*** give you a function pointer that is used to evaluate function context
+    for hunk headers
+** At some point, the logic for getting a filtered version of file content
+   or calculating the OID of a file may be moved into the driver.
+
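The three operations on `git_diff_file_content` described in the document (initialization, loading, free) can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: the `file_content` struct, the `CONTENT_*` flags, and the function names are stand-ins for illustration, not the library's internal API. The binary check (a NUL byte within the first 4000 bytes) is an assumed heuristic, and filters are omitted entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flags, loosely modeled on the internal GIT_DIFF_FLAG__* idea. */
enum {
	CONTENT_LOADED    = (1 << 0), /* data has been loaded */
	CONTENT_FREE_DATA = (1 << 1), /* data is heap memory owned by the struct */
	CONTENT_BINARY    = (1 << 2)  /* data was judged to be binary */
};

/* Hypothetical stand-in for git_diff_file_content. */
struct file_content {
	const char *path;
	char *data;
	size_t size;
	unsigned int flags;
};

/* Initialization: record what is known without touching the data. */
void file_content_init(struct file_content *fc, const char *path)
{
	assert(fc && path);
	memset(fc, 0, sizeof(*fc));
	fc->path = path;
}

/* Loading: copy the raw bytes, then decide whether they look binary.
 * Assumed heuristic: a NUL byte in the first 4000 bytes means binary. */
int file_content_load(struct file_content *fc, const char *raw, size_t size)
{
	size_t i, scan;

	assert(fc && (raw || size == 0));

	if (fc->flags & CONTENT_LOADED)
		return 0; /* loading is idempotent */

	if (size > 0) {
		fc->data = malloc(size);
		if (!fc->data)
			return -1;
		memcpy(fc->data, raw, size);
		fc->flags |= CONTENT_FREE_DATA;
	}
	fc->size = size;

	scan = size < 4000 ? size : 4000;
	for (i = 0; i < scan; i++) {
		if (fc->data[i] == '\0') {
			fc->flags |= CONTENT_BINARY;
			break;
		}
	}

	fc->flags |= CONTENT_LOADED;
	return 0;
}

/* Free: release loaded data and reset the load-related state. */
void file_content_free(struct file_content *fc)
{
	if (!fc)
		return;
	if (fc->flags & CONTENT_FREE_DATA)
		free(fc->data);
	fc->data = NULL;
	fc->size = 0;
	fc->flags &= ~(unsigned int)(CONTENT_LOADED | CONTENT_FREE_DATA | CONTENT_BINARY);
}
```

Keeping initialization separate from loading means that cheap decisions (for example, skipping an item that attributes already mark as binary) can be made before any data is read.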

src/blob.c

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@
 #include "git2/odb_backend.h"
 
 #include "common.h"
+#include "filebuf.h"
 #include "blob.h"
 #include "filter.h"
 #include "buf_text.h"

src/checkout.c

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@
 
 #include "refs.h"
 #include "repository.h"
+#include "index.h"
 #include "filter.h"
 #include "blob.h"
 #include "diff.h"

src/clone.c

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@
 #include "fileops.h"
 #include "refs.h"
 #include "path.h"
+#include "repository.h"
 
 static int create_branch(
 	git_reference **branch,

src/crlf.c

Lines changed: 4 additions & 2 deletions
@@ -5,14 +5,16 @@
  * a Linking Exception. For full terms see the included COPYING file.
  */
 
+#include "git2/attr.h"
+#include "git2/blob.h"
+#include "git2/index.h"
+
 #include "common.h"
 #include "fileops.h"
 #include "hash.h"
 #include "filter.h"
 #include "buf_text.h"
 #include "repository.h"
-#include "git2/attr.h"
-#include "git2/blob.h"
 
 struct crlf_attrs {
 	int crlf_action;

src/diff.c

Lines changed: 72 additions & 0 deletions
@@ -11,6 +11,8 @@
 #include "attr_file.h"
 #include "filter.h"
 #include "pathspec.h"
+#include "index.h"
+#include "odb.h"
 
 #define DIFF_FLAG_IS_SET(DIFF,FLAG) (((DIFF)->opts.flags & (FLAG)) != 0)
 #define DIFF_FLAG_ISNT_SET(DIFF,FLAG) (((DIFF)->opts.flags & (FLAG)) == 0)
@@ -1170,3 +1172,73 @@ int git_diff_tree_to_workdir(
 
 	return error;
 }
+
+size_t git_diff_num_deltas(git_diff_list *diff)
+{
+	assert(diff);
+	return (size_t)diff->deltas.length;
+}
+
+size_t git_diff_num_deltas_of_type(git_diff_list *diff, git_delta_t type)
+{
+	size_t i, count = 0;
+	git_diff_delta *delta;
+
+	assert(diff);
+
+	git_vector_foreach(&diff->deltas, i, delta) {
+		count += (delta->status == type);
+	}
+
+	return count;
+}
+
+int git_diff__paired_foreach(
+	git_diff_list *idx2head,
+	git_diff_list *wd2idx,
+	int (*cb)(git_diff_delta *i2h, git_diff_delta *w2i, void *payload),
+	void *payload)
+{
+	int cmp;
+	git_diff_delta *i2h, *w2i;
+	size_t i, j, i_max, j_max;
+	int (*strcomp)(const char *, const char *);
+
+	i_max = idx2head ? idx2head->deltas.length : 0;
+	j_max = wd2idx ? wd2idx->deltas.length : 0;
+
+	/* Get appropriate strcmp function */
+	strcomp = idx2head ? idx2head->strcomp : wd2idx ? wd2idx->strcomp : NULL;
+
+	/* Assert both iterators use matching ignore-case. If this function ever
+	 * supports merging diffs that are not sorted by the same function, then
+	 * it will need to spool and sort on one of the results before merging
+	 */
+	if (idx2head && wd2idx) {
+		assert(idx2head->strcomp == wd2idx->strcomp);
+	}
+
+	for (i = 0, j = 0; i < i_max || j < j_max; ) {
+		i2h = idx2head ? GIT_VECTOR_GET(&idx2head->deltas,i) : NULL;
+		w2i = wd2idx ? GIT_VECTOR_GET(&wd2idx->deltas,j) : NULL;
+
+		cmp = !w2i ? -1 : !i2h ? 1 :
+			strcomp(i2h->old_file.path, w2i->old_file.path);
+
+		if (cmp < 0) {
+			if (cb(i2h, NULL, payload))
+				return GIT_EUSER;
+			i++;
+		} else if (cmp > 0) {
+			if (cb(NULL, w2i, payload))
+				return GIT_EUSER;
+			j++;
+		} else {
+			if (cb(i2h, w2i, payload))
+				return GIT_EUSER;
+			i++; j++;
+		}
+	}
+
+	return 0;
+}
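
`git_diff__paired_foreach` is a merge-style walk over two lists that are sorted by the same comparison function: each path is reported once, with `NULL` on the side where it is absent. The sketch below shows the same technique on plain string arrays so it can be compiled without libgit2. The names `paired_foreach`, `paired_cb`, `pair_counts`, and `count_pairs` are hypothetical, `strcmp` stands in for the list's `strcomp`, and `-1` stands in for `GIT_EUSER`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: either argument is NULL when the path exists on
 * one side only. A nonzero return aborts the walk. */
typedef int (*paired_cb)(const char *left, const char *right, void *payload);

/* Walk two path arrays sorted with the same comparison function, calling
 * cb once per distinct path. Returns -1 if the callback aborts the walk. */
int paired_foreach(
	const char **left, size_t left_len,
	const char **right, size_t right_len,
	paired_cb cb, void *payload)
{
	size_t i = 0, j = 0;

	assert(cb);

	while (i < left_len || j < right_len) {
		const char *l = (i < left_len) ? left[i] : NULL;
		const char *r = (j < right_len) ? right[j] : NULL;

		/* An exhausted side sorts after everything on the other side. */
		int cmp = !r ? -1 : !l ? 1 : strcmp(l, r);

		if (cmp < 0) {
			if (cb(l, NULL, payload))
				return -1;
			i++;
		} else if (cmp > 0) {
			if (cb(NULL, r, payload))
				return -1;
			j++;
		} else {
			if (cb(l, r, payload))
				return -1;
			i++; j++;
		}
	}

	return 0;
}

/* Example payload: counts paths seen on the left only, right only, or both. */
struct pair_counts {
	int left_only, right_only, both;
};

int count_pairs(const char *left, const char *right, void *payload)
{
	struct pair_counts *c = payload;

	if (left && right)
		c->both++;
	else if (left)
		c->left_only++;
	else
		c->right_only++;

	return 0;
}
```

The walk only produces correct pairings when both inputs share one sort order, which is why the original asserts that the two `strcomp` pointers match before merging.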

src/diff.h

Lines changed: 11 additions & 0 deletions
@@ -29,11 +29,16 @@ enum {
 	GIT_DIFFCAPS_TRUST_NANOSECS = (1 << 5), /* use stat time nanoseconds */
 };
 
+#define DIFF_FLAGS_KNOWN_BINARY (GIT_DIFF_FLAG_BINARY|GIT_DIFF_FLAG_NOT_BINARY)
+#define DIFF_FLAGS_NOT_BINARY (GIT_DIFF_FLAG_NOT_BINARY|GIT_DIFF_FLAG__NO_DATA)
+
 enum {
 	GIT_DIFF_FLAG__FREE_PATH = (1 << 7), /* `path` is allocated memory */
 	GIT_DIFF_FLAG__FREE_DATA = (1 << 8), /* internal file data is allocated */
 	GIT_DIFF_FLAG__UNMAP_DATA = (1 << 9), /* internal file data is mmap'ed */
 	GIT_DIFF_FLAG__NO_DATA = (1 << 10), /* file data should not be loaded */
+	GIT_DIFF_FLAG__FREE_BLOB = (1 << 11), /* release the blob when done */
+	GIT_DIFF_FLAG__LOADED = (1 << 12), /* file data has been loaded */
 
 	GIT_DIFF_FLAG__TO_DELETE = (1 << 16), /* delete entry during rename det. */
 	GIT_DIFF_FLAG__TO_SPLIT = (1 << 17), /* split entry during rename det. */
@@ -83,6 +88,12 @@ extern int git_diff__from_iterators(
 	git_iterator *new_iter,
 	const git_diff_options *opts);
 
+extern int git_diff__paired_foreach(
+	git_diff_list *idx2head,
+	git_diff_list *wd2idx,
+	int (*cb)(git_diff_delta *i2h, git_diff_delta *w2i, void *payload),
+	void *payload);
+
 int git_diff_find_similar__hashsig_for_file(
 	void **out, const git_diff_file *f, const char *path, void *p);
 
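The two new macros in `src/diff.h` are bit masks, so they are tested with `&` rather than compared for equality. The sketch below shows how such masks behave. The values given to `GIT_DIFF_FLAG_BINARY` and `GIT_DIFF_FLAG_NOT_BINARY` are assumptions (the public flags are not part of this diff), the helper names are hypothetical, and the interpretation in the comments is one plausible reading rather than a statement of how the library uses the masks.

```c
#include <assert.h>

/* Assumed values: the public flags are defined elsewhere and are not
 * shown in this diff. */
enum {
	GIT_DIFF_FLAG_BINARY     = (1 << 0),
	GIT_DIFF_FLAG_NOT_BINARY = (1 << 1)
};

/* Internal flag, value as it appears in the src/diff.h enum. */
enum {
	GIT_DIFF_FLAG__NO_DATA = (1 << 10)
};

#define DIFF_FLAGS_KNOWN_BINARY (GIT_DIFF_FLAG_BINARY|GIT_DIFF_FLAG_NOT_BINARY)
#define DIFF_FLAGS_NOT_BINARY (GIT_DIFF_FLAG_NOT_BINARY|GIT_DIFF_FLAG__NO_DATA)

/* Hypothetical helper: true once either BINARY or NOT_BINARY is set, so
 * the binary question has already been answered one way or the other. */
int binary_status_is_known(unsigned int flags)
{
	return (flags & DIFF_FLAGS_KNOWN_BINARY) != 0;
}

/* Hypothetical helper: true if the item is marked as not binary or has
 * no data to load, i.e. any bit of the NOT_BINARY mask is set. */
int matches_not_binary_mask(unsigned int flags)
{
	return (flags & DIFF_FLAGS_NOT_BINARY) != 0;
}
```

Grouping related bits into a named mask keeps call sites to a single test instead of repeating the individual flags.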
