Keep data about source of similarity #1643
Merged
This change adds data about the source of similarity matching. Consider the case where you have two very similar files that are both renamed and modified such that they remain very similar. For example:
Initial commit:
Class1.cs:
class Class1 { }
Class2.cs:
class Class2 { }
And these files are both renamed:
ClassA.cs:
class ClassA { }
ClassB.cs:
class ClassB { }
The loop in git_diff_find_similar iterates over the possible rename targets (in this case, ClassA.cs and ClassB.cs), computes the similarity to each possible source, and records the best match in the matches array.
If our deltas vector is:
Then we will decide that ClassA.cs is 96% similar to Class1.cs and record it as match[2]. Since ClassA.cs is also 96% similar to Class2.cs, that candidate is not strictly better, and it is ignored.
Similarly, we will decide that ClassB.cs is 96% similar to Class1.cs and record it as match[3]. Again, Class2.cs will be ignored.
After the loop to calculate similarity:
This gives us a rename from Class1.cs to ClassA.cs, and records Class2.cs as a delete and ClassB.cs as an add. This is suboptimal and differs from what core git reports.
By adding data about the similarity source, we are able to avoid doubling up a single source as the best similarity match for two targets. With the proposed change, at the end of our loop:
And thus we have a rename from Class1.cs to ClassA.cs and a rename from Class2.cs to ClassB.cs.
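The difference between the two behaviours can be sketched in a few lines of standalone C. This is an illustration only, not the code in git_diff_find_similar: the match_t record, the MAX_FILES limit, the flat sim table and both function names are hypothetical, the similarity threshold is not modelled, and the source-aware variant makes a single pass, so a pairing that gets displaced is not retried.

```c
#include <assert.h>
#include <stddef.h>

#define NO_MATCH ((size_t)-1)
#define MAX_FILES 16 /* arbitrary limit for this sketch */

/* Hypothetical record for illustration; not libgit2's actual struct. */
typedef struct {
    size_t idx;          /* index of the paired file on the other side */
    unsigned similarity; /* percent; 0 means "no pairing yet" */
} match_t;

/* In both functions, sim[s * n_tgt + t] is the similarity (0-100) of
 * source s to target t, and rename_src[t] receives the source chosen
 * for target t, or NO_MATCH if the target ends up as a plain add. */

/* Target-only bookkeeping: each target remembers its best source, but
 * nothing stops two targets from picking the same source. A source can
 * be consumed by only one rename, so later targets lose out. */
static void find_renames_target_only(const unsigned *sim, size_t n_src,
                                     size_t n_tgt, size_t *rename_src)
{
    match_t best[MAX_FILES];
    int used[MAX_FILES] = {0};
    size_t s, t;

    assert(n_src <= MAX_FILES && n_tgt <= MAX_FILES);

    for (t = 0; t < n_tgt; t++) {
        best[t].idx = NO_MATCH;
        best[t].similarity = 0;
        for (s = 0; s < n_src; s++) {
            /* strictly better only: an equally similar source is ignored */
            if (sim[s * n_tgt + t] > best[t].similarity) {
                best[t].idx = s;
                best[t].similarity = sim[s * n_tgt + t];
            }
        }
    }

    for (t = 0; t < n_tgt; t++) {
        size_t src = best[t].idx;
        if (src != NO_MATCH && !used[src]) {
            used[src] = 1;
            rename_src[t] = src;
        } else {
            rename_src[t] = NO_MATCH;
        }
    }
}

/* Source-aware bookkeeping: a pairing is kept only if it is strictly
 * better than what BOTH the target and the source already have. */
static void find_renames_with_sources(const unsigned *sim, size_t n_src,
                                      size_t n_tgt, size_t *rename_src)
{
    match_t tgt2src[MAX_FILES], src2tgt[MAX_FILES];
    size_t s, t;

    assert(n_src <= MAX_FILES && n_tgt <= MAX_FILES);

    for (s = 0; s < n_src; s++) { src2tgt[s].idx = NO_MATCH; src2tgt[s].similarity = 0; }
    for (t = 0; t < n_tgt; t++) { tgt2src[t].idx = NO_MATCH; tgt2src[t].similarity = 0; }

    for (t = 0; t < n_tgt; t++) {
        for (s = 0; s < n_src; s++) {
            unsigned v = sim[s * n_tgt + t];

            if (v <= tgt2src[t].similarity || v <= src2tgt[s].similarity)
                continue;

            /* release whatever each side was paired with before */
            if (src2tgt[s].similarity > 0) {
                tgt2src[src2tgt[s].idx].idx = NO_MATCH;
                tgt2src[src2tgt[s].idx].similarity = 0;
            }
            if (tgt2src[t].similarity > 0) {
                src2tgt[tgt2src[t].idx].idx = NO_MATCH;
                src2tgt[tgt2src[t].idx].similarity = 0;
            }

            src2tgt[s].idx = t; src2tgt[s].similarity = v;
            tgt2src[t].idx = s; tgt2src[t].similarity = v;
        }
    }

    for (t = 0; t < n_tgt; t++)
        rename_src[t] = tgt2src[t].similarity > 0 ? tgt2src[t].idx : NO_MATCH;
}
```

With sources Class1.cs and Class2.cs, targets ClassA.cs and ClassB.cs, and every pair 96% similar, the target-only version pairs ClassA.cs with Class1.cs and leaves ClassB.cs unmatched, while the source-aware version pairs ClassA.cs with Class1.cs and ClassB.cs with Class2.cs.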