Thanks to visit codestin.com
Credit goes to Github.com

Skip to content

Conversation

@GGabriele
Copy link

@GGabriele GGabriele commented Jan 19, 2022

As highlighted in various issues (#24, #30), the library is currently failing at computing correct and reliable diffs for arrays, and more specifically whenever the size of the "left" array is less than the "right" one:

case 1: len(left) < len(right) (bogus diffs)

$ cat before.json
{
   "array": [
      "x-posting-ip"
   ]
 }

$ cat after.json
{
   "array": [
      "accept-encoding",
      "x-forwarded-for"
   ]
 }

$ go run main.go before.json after.json
 {
   "array": [
-    0: "x-posting-ip"
+    0: "accept-encoding"
+    0: "accept-encoding"
   ]
 }

case 1 (continuation): len(left) < len(right) (unreliable diffs)

$ cat before.json
{
   "array": [
      "blabla"
   ]
 }

$ cat after.json
{
   "array": [
      "accept-encoding",
      "x-forwarded-for"
   ]
 }

$ go run main.go before.json after.json
 {
   "array": [
+    0: "accept-encoding"
   ]
 }

case 2: len(left) == len(right) (OK)

$ cat before.json
{
   "array": [
      "blabla",
      "blabla"
   ]
 }

$ go run main.go before.json after.json
 {
   "array": [
-    0: "blabla",
+    0: "accept-encoding",
-    1: "blabla"
+    1: "x-forwarded-for"
   ]
 }

case 3: len(left) > len(right) (diffs are OK, but there is a lingering duplicate element at the bottom of the array)

$ cat before.json
{
   "array": [
      "blabla",
      "blabla",
      "blabla"
   ]
 }

$ go run main.go before.json after.json
 {
   "array": [
-    0: "blabla",
+    0: "accept-encoding",
-    0: "blabla",
-    1: "blabla"
+    1: "x-forwarded-for"
     2: "blabla"                   // shouldn't be here
   ]
 }

Both issues highlighted in case 1 are the most serious ones, while the one in case 3 is mostly visually confusing.

I believe the cause of these issues are two:

  1. when you loop through the maybe matrixes here, you are missing the last element of each row/column
  2. the way the similarity and its score is calculated for strings is not reliable

What I'm proposing here is a simple fix for the (1) and to use strutil to calculate the similarity for (2).

$ cat before.json
{
   "array": [
      "x-posting-ip"
   ]
 }

$ cat after.json
{
   "array": [
      "accept-encoding",
      "x-forwarded-for"
   ]
 }

$ go run main.go before.json after.json
 {
   "array": [
-    0: "x-posting-ip"
+    0: "accept-encoding"
+    1: "x-forwarded-for"
   ]
 }
$ cat before.json
{
   "array": [
      "blabla"
   ]
 }

$ go run main.go before.json after.json
 {
   "array": [
-    0: "blabla"
+    0: "accept-encoding"
+    1: "x-forwarded-for"
   ]
 }

Changes don't break current test cases:

$ go test -race ./...
# gojsondiff/jd
jd/main.go:42:4: Println arg list ends with redundant newline
# gojsondiff/jp
jp/main.go:24:4: Println arg list ends with redundant newline
ok  	gojsondiff	1.919s
ok  	gojsondiff/formatter	1.104s

The code is quite complex, so I may be missing some obvious things, so please let me know if that's the case!

I'm also introducing go.mod in the same PR, which is resulting in a giant bulk of changes. The actual code changes are very minimal and all under the gojsondiff.go file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant