A high-performance difference algorithm module for Go.
Difference algorithms compare two inputs and find the edits that transform one to the other. This is very useful to understand changes, for example when comparing a test result with the expected result or to understand which changes have been made to a file.
This module provides diffing for arbitrary Go slices and text.
I wrote a bit about the background and the design decisions that went into this module on flo.znkr.io/diff.
To use this module in your Go project, run:
go get znkr.io/diffFull documentation available at pkg.go.dev/znkr.io/diff.
Diffing two slices produces either the full list of edits
x := strings.Fields("calm seas reflect the sky")
y := strings.Fields("restless seas reflect the sky defiantly")
edits := diff.Edits(x, y)
for i, edit := range edits {
if i > 0 {
fmt.Print(" ")
}
switch edit.Op {
case diff.Match:
fmt.Printf("%s", edit.X)
case diff.Delete:
fmt.Printf("[-%s-]", edit.X)
case diff.Insert:
fmt.Printf("{+%s+}", edit.Y)
default:
panic("never reached")
}
}
// Output:
// [-calm-] {+restless+} seas reflect the sky {+defiantly+}or a list of hunks representing consecutive edits
x := strings.Fields("calm seas reflect the sky")
y := strings.Fields("restless seas reflect the sky defiantly")
hunks := diff.Hunks(x, y, diff.Context(1))
for i, h := range hunks {
if i > 0 {
fmt.Print(" … ")
}
for i, edit := range h.Edits {
if i > 0 {
fmt.Print(" ")
}
switch edit.Op {
case diff.Match:
fmt.Printf("%s", edit.X)
case diff.Delete:
fmt.Printf("[-%s-]", edit.X)
case diff.Insert:
fmt.Printf("{+%s+}", edit.Y)
default:
panic("never reached")
}
}
}
// Output:
// [-calm-] {+restless+} seas … sky {+defiantly+}For both functions, a ...Func variant exists that works with arbitrary slices by taking an
equality function.
Because of its importance, comparing text line by line has special support and produces output in the unified diff format:
x := `this paragraph
is not
changed and
barely long
enough to
create a
new hunk
this paragraph
is going to be
removed
`
y := `this is a new paragraph
that is inserted at the top
this paragraph
is not
changed and
barely long
enough to
create a
new hunk
`
fmt.Print(textdiff.Unified(x, y))
// Output:
// @@ -1,3 +1,6 @@
// +this is a new paragraph
// +that is inserted at the top
// +
// this paragraph
// is not
// changed and
// @@ -5,7 +8,3 @@
// enough to
// create a
// new hunk
// -
// -this paragraph
// -is going to be
// -removedStatus: Beta - This project is in beta, pending API reviews and general feedback, both are very welcome.
As a general rule, the exact diff output will never be guaranteed to be stable: I expect that performance and quality improvements will always be possible and they will likely change the output of a diff. Therefore, committing to a stable diff result would be too limiting.
Diffs produced by this module are intended to be readable by humans.
Readable diffs have been the subject of a lot of discussions and have even resulted in some new diffing algorithms like the patience or histogram algorithms in git. However, the best work about diff readability by far is diff-slider-tools by Michael Haggerty. He implemented a heuristic that's applied in a post-processing step to improve the readability. This module implements this heuristic in the textdiff package.
For example:
x := `// ...
["foo", "bar", "baz"].map do |i|
i.upcase
end
`
y := `// ...
["foo", "bar", "baz"].map do |i|
i
end
["foo", "bar", "baz"].map do |i|
i.upcase
end
`
fmt.Println("With textdiff.IndentHeuristic:")
fmt.Print(textdiff.Unified(x, y, textdiff.IndentHeuristic()))
fmt.Println()
fmt.Println("Without textdiff.IndentHeuristic:")
fmt.Print(textdiff.Unified(x, y))
// Output:
// With textdiff.IndentHeuristic:
// @@ -1,4 +1,8 @@
// // ...
// +["foo", "bar", "baz"].map do |i|
// + i
// +end
// +
// ["foo", "bar", "baz"].map do |i|
// i.upcase
// end
//
// Without textdiff.IndentHeuristic:
// @@ -1,4 +1,8 @@
// // ...
// ["foo", "bar", "baz"].map do |i|
// + i
// +end
// +
// +["foo", "bar", "baz"].map do |i|
// i.upcase
// endBy default, the underlying diff algorithm used is Myers' algorithm augmented by a number of
heuristics to speed up the algorithm in exchange for non-minimal diffs. The diff.Minimal option is
provided to skip these heuristics to get a minimal diff independent of the costs and diff.Fast to
use a fast heuristic to get a non-minimal diff as fast as possible.
On an M1 Mac, the default settings almost always result in runtimes < 1 ms, but truly large diffs
(e.g. caused by changing generators for generated files) can result in runtimes of almost 100 ms.
Below is the distribution of runtimes applying textdiff.Unified to every commit in the Go
repository (y-axis is in log scale):
Comparing the performance with other Go modules that implement the same features is always
interesting, because it can surface missed optimization opportunities. This is especially
interesting for larger inputs where superlinear growth can become a problem. Below are benchmarks of
znkr.io/diff against other popular Go diff modules:
- znkr: Default configuration with performance optimizations enabled
- znkr-minimal: With
diff.Minimal()option for minimal diffs - znkr-fast: With
diff.Fast()option for fastest possible diffing - go-internal: Patience diff algorithm from
github.com/rogpeppe/go-internal - diffmatchpatch: Implementation from
github.com/sergi/go-diff - godebug: Implementation from
golang.org/x/tools/godebug - mb0: Implementation from
github.com/mb0/diff - udiff: Implementation from
github.com/aymanbagabas/go-udiff
Note: It's possible that the benchmark is using diffmatchpatch incorrectly, the benchmark
numbers certainly look suspiciously high. However, the way it's used in the benchmark is used in
at least one large open source project.
On the benchmarks used for this comparison znkr.io/diff almost always outperforms the other implementations. However, there's one case where go-internal is significantly faster, but the resulting diff is 10% larger (see numbers below).
| Test Case | znkr (baseline) | znkr-minimal | znkr-fast | go-internal | diffmatchpatch | godebug | mb0 | udiff |
|---|---|---|---|---|---|---|---|---|
| large_01 | 2.707ms | 10.993ms (+306.14%) |
2.642ms (-2.40%) |
4.928ms (+82.04%) |
43.205ms (+1496.15%) |
181.374ms (+6600.66%) |
84.950ms (+3038.39%) |
7.915ms (+192.40%) |
| large_02 | 20.591ms | 49.798ms (+141.84%) |
1.840ms (-91.06%) |
4.139ms (-79.90%) |
623.986ms (+2930.32%) |
3000.340ms (+14470.84%) |
1513.701ms (+7251.13%) |
6.457ms (-68.64%) |
| large_03 | 3.210ms | 15.138ms (+371.61%) |
3.130ms (-2.49%) |
4.688ms (+46.04%) |
31.851ms (+892.26%) |
187.093ms (+5728.54%) |
105.379ms (+3182.89%) |
10.057ms (+213.31%) |
| large_04 | 7.125ms | 249.229ms (+3397.94%) |
5.557ms (-22.01%) |
8.656ms (+21.49%) |
1012.579ms (+14111.61%) |
13230.536ms (+185591.43%) |
2229.906ms (+31196.87%) |
15.818ms (+122.01%) |
| medium | 26.79µs | 27.38µs (+2.23%) |
27.54µs (+2.81%) |
64.70µs (+141.55%) |
258.27µs (+864.18%) |
705.62µs (+2534.24%) |
269.56µs (+906.34%) |
290.81µs (+985.66%) |
| small | 18.30µs | 18.49µs (+1.05%) |
18.43µs (±0%) |
38.06µs (+107.97%) |
78.23µs (+327.41%) |
200.04µs (+992.97%) |
52.86µs (+188.83%) |
106.99µs (+484.55%) |
| Test Case | znkr (baseline) | znkr-minimal | znkr-fast | go-internal | diffmatchpatch | godebug | mb0 | udiff |
|---|---|---|---|---|---|---|---|---|
| large_01 | 5.615k edits | 5.615k edits (±0%) |
5.615k edits (±0%) |
5.617k edits (+0.04%) |
5.615k edits (±0%) |
5.615k edits (±0%) |
5.615k edits (±0%) |
35.805k edits (+537.67%) |
| large_02 | 28.87k edits | 28.83k edits (-0.15%) |
31.80k edits (+10.15%) |
31.81k edits (+10.17%) |
28.83k edits (-0.14%) |
28.83k edits (-0.15%) |
28.83k edits (-0.15%) |
31.80k edits (+10.13%) |
| large_03 | 5.504k edits | 5.504k edits (±0%) |
5.504k edits (±0%) |
5.506k edits (+0.04%) |
5.504k edits (±0%) |
5.504k edits (±0%) |
5.504k edits (±0%) |
55.738k edits (+912.68%) |
| large_04 | 26.99k edits | 26.99k edits (-0.01%) |
27.80k edits (+2.99%) |
27.80k edits (+2.99%) |
60.36k edits (+123.65%) |
26.99k edits (-0.01%) |
26.99k edits (-0.01%) |
103.22k edits (+282.45%) |
| medium | 277 edits | 277 edits (±0%) |
277 edits (±0%) |
283 edits (+2.17%) |
277 edits (±0%) |
277 edits (±0%) |
277 edits (±0%) |
431 edits (+55.60%) |
| small | 108 edits | 108 edits (±0%) |
114 edits (+5.56%) |
120 edits (+11.11%) |
108 edits (±0%) |
108 edits (±0%) |
108 edits (±0%) |
280 edits (+159.26%) |
I tested this diff implementation against every commit in the Go
repository using the standard unix patch tool to ensure that all
diff results are correct.
This test is part of the test suite for this module and can be run with
go run ./internal/cmd/eval -repo <repo>
This module is distributed under the Apache License, Version 2.0, see LICENSE for more information.