Added Position field to the Tokenizer #128
Conversation
The Position field gives a means to index into the Tokenizer's underlying byte slice. This enables use cases where the caller plans to make edits to the JSON document but wants to leverage the copy function to optimize data movement.
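As a rough sketch of that use case (the loop below is hypothetical; only the Position field itself comes from this PR, and the sketch assumes Position counts the bytes consumed so far):

```go
package main

import (
	"fmt"

	"github.com/segmentio/encoding/json"
)

func main() {
	doc := []byte(`{"answer":42,"ok":true}`)

	t := json.NewTokenizer(doc)
	for t.Next() {
		// Position indexes into doc, so the caller can slice the original
		// document around the current token, or shift bytes with copy when
		// editing the document in place.
		fmt.Printf("token %q, next byte at doc[%d]\n", t.Value, t.Position)
	}
}
```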
```go
		t.Position += i
	}

	if len(t.json) == 0 {
		t.Reset(nil)
		return false
	}

	lenBefore := len(t.json)
```
Maybe a more robust implementation would be to use a defer?

```go
lenBefore := len(t.json)
defer func() { t.Position += lenBefore - len(t.json) }()
```

Unless there's a measurable cost in the benchmarks, I'd go with this.
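For illustration, here's the pattern dropped into a stand-in method (the body is invented; only the position-tracking lines come from the diff above):

```go
func (t *Tokenizer) Next() bool {
	lenBefore := len(t.json)
	// The deferred closure runs on every return path, so Position stays
	// in sync even if early returns are added to the method later.
	defer func() { t.Position += lenBefore - len(t.json) }()

	if len(t.json) == 0 {
		t.Reset(nil)
		return false // early return: the defer still fires
	}

	// ... tokenize, consuming bytes from t.json as it goes ...
	return true
}
```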
Makes sense. From this spot on, I didn't see any early returns. Are you thinking more along the lines of future-proofing the code?
Here are the benchmark results. The impact of the defer statement is small but measurable on my local machine. Not sure why all the percent changes are showing up as zero. :( Thoughts?
```
name                                                   old time/op   new time/op   delta
Tokenizer/github.com/segmentio/encoding/json/null-8    25.9ns ± 0%   27.1ns ± 0%   ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/true-8    26.0ns ± 0%   27.6ns ± 0%   ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/false-8   26.9ns ± 0%   28.2ns ± 0%   ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/number-8  33.6ns ± 0%   34.9ns ± 0%   ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/string-8  31.7ns ± 0%   33.4ns ± 0%   ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/object-8  1.11µs ± 0%   1.24µs ± 0%   ~   (p=1.000 n=1+1)

name                                                   old speed     new speed     delta
Tokenizer/github.com/segmentio/encoding/json/null-8    154MB/s ± 0%  148MB/s ± 0%  ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/true-8    154MB/s ± 0%  145MB/s ± 0%  ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/false-8   186MB/s ± 0%  177MB/s ± 0%  ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/number-8  327MB/s ± 0%  315MB/s ± 0%  ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/string-8  378MB/s ± 0%  360MB/s ± 0%  ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/object-8  613MB/s ± 0%  550MB/s ± 0%  ~   (p=1.000 n=1+1)
```
> Makes sense. From this spot on, I didn't see any early returns. Are you thinking more along the lines of future-proofing the code?

I was thinking of moving this to the top of the function, but let's not sweat it if it's having a negative impact on the performance of the code 👍

The deltas not showing might be due to not running the benchmarks with a large enough -count?
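(With a single run per side, benchstat has n=1+1 and can't establish statistical significance, so it prints ~ and p=1.000 instead of a percent change; running each side with something like `go test -bench=Tokenizer -count=10` gives it enough samples to report a delta.)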
FYI @achille-roussel @stevevls I added #126 recently to keep track of the position of the tokenizer, but without any performance impact. The caller does have to retain the input byte slice so that it can convert the remaining bytes back into an index.
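Sketched out, that conversion is just arithmetic on slice lengths (`remaining` below is a placeholder for whatever accessor #126 actually exposes, not the real API):

```go
// positionOf recovers a byte index into the original input: input is the
// slice the caller handed to the tokenizer (and must retain), remaining is
// the not-yet-consumed suffix exposed by #126 (placeholder name).
func positionOf(input, remaining []byte) int {
	return len(input) - len(remaining)
}
```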
The Position field gives a means to index into the Tokenizer's underlying byte slice. This enables use cases where the caller plans to make edits to the JSON document but wants to leverage the copy function to optimize data movement, or to copy the remaining bytes if it exits the tokenizing loop early.
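A minimal sketch of the early-exit case (the Depth-based stop condition is illustrative, and again this assumes Position counts the bytes consumed so far):

```go
t := json.NewTokenizer(doc)
for t.Next() {
	if t.Depth > 1 { // hypothetical reason to stop tokenizing early
		break
	}
}
// Copy the unprocessed tail in one shot, using Position as the cut point.
tail := make([]byte, len(doc)-t.Position)
copy(tail, doc[t.Position:])
```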