Added Position field to the Tokenizer #128
Conversation
The Position field gives a means to index into the Tokenizer's underlying byte slice. This enables use cases where the caller plans to make edits to the JSON document but wants to leverage the copy function to optimize data movement.
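As a rough sketch of that use case (the loop below is hypothetical; only the Position field itself comes from this PR, and the sketch assumes Position counts the bytes consumed so far):

```go
package main

import (
	"fmt"

	"github.com/segmentio/encoding/json"
)

func main() {
	doc := []byte(`{"answer":42,"ok":true}`)

	t := json.NewTokenizer(doc)
	for t.Next() {
		// Position indexes into doc, so the caller can slice the original
		// document around the current token, or shift bytes with copy when
		// editing the document in place.
		fmt.Printf("token %q, next byte at doc[%d]\n", t.Value, t.Position)
	}
}
```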
```go
		t.Position += i
	}

	if len(t.json) == 0 {
		t.Reset(nil)
		return false
	}

	lenBefore := len(t.json)
```
Maybe a more robust implementation would be to use a defer?

```go
lenBefore := len(t.json)
defer func() { t.Position += lenBefore - len(t.json) }()
```

Unless there's a measurable cost in the benchmarks, I'd go with this.
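For illustration, here's the pattern dropped into a stand-in method (the body is invented; only the position-tracking lines come from the diff above):

```go
func (t *Tokenizer) Next() bool {
	lenBefore := len(t.json)
	// The deferred closure runs on every return path, so Position stays
	// in sync even if early returns are added to the method later.
	defer func() { t.Position += lenBefore - len(t.json) }()

	if len(t.json) == 0 {
		t.Reset(nil)
		return false // early return: the defer still fires
	}

	// ... tokenize, consuming bytes from t.json as it goes ...
	return true
}
```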
Makes sense. From this spot on, I didn't see any early returns. Are you thinking more along the lines of future-proofing the code?
Here are the benchmark results. The impact of the defer statement is small but measurable on my local machine. Not sure why all the percent changes are showing up as zero. :( Thoughts?
```
name                                                   old time/op   new time/op   delta
Tokenizer/github.com/segmentio/encoding/json/null-8    25.9ns ± 0%   27.1ns ± 0%   ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/true-8    26.0ns ± 0%   27.6ns ± 0%   ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/false-8   26.9ns ± 0%   28.2ns ± 0%   ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/number-8  33.6ns ± 0%   34.9ns ± 0%   ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/string-8  31.7ns ± 0%   33.4ns ± 0%   ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/object-8  1.11µs ± 0%   1.24µs ± 0%   ~   (p=1.000 n=1+1)

name                                                   old speed     new speed     delta
Tokenizer/github.com/segmentio/encoding/json/null-8    154MB/s ± 0%  148MB/s ± 0%  ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/true-8    154MB/s ± 0%  145MB/s ± 0%  ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/false-8   186MB/s ± 0%  177MB/s ± 0%  ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/number-8  327MB/s ± 0%  315MB/s ± 0%  ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/string-8  378MB/s ± 0%  360MB/s ± 0%  ~   (p=1.000 n=1+1)
Tokenizer/github.com/segmentio/encoding/json/object-8  613MB/s ± 0%  550MB/s ± 0%  ~   (p=1.000 n=1+1)
```
> Makes sense. From this spot on, I didn't see any early returns. Are you thinking more along the lines of future-proofing the code?

I was thinking of moving this to the top of the function, but let's not sweat it if it's having a negative impact on the performance of the code 👍

The deltas not showing might be due to not running the benchmarks with a large enough -count?
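(With a single run per side, benchstat has n=1+1 and can't establish statistical significance, so it prints ~ and p=1.000 instead of a percent change; running each side with something like `go test -bench=Tokenizer -count=10` gives it enough samples to report a delta.)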
FYI @achille-roussel @stevevls I added #126 recently to keep track of the position of the tokenizer, but without any performance impact. The caller does have to retain the input byte slice so that it can convert the remaining bytes back into an index.
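Sketched out, that conversion is just arithmetic on slice lengths (`remaining` below is a placeholder for whatever accessor #126 actually exposes, not the real API):

```go
// positionOf recovers a byte index into the original input: input is the
// slice the caller handed to the tokenizer (and must retain), remaining is
// the not-yet-consumed suffix exposed by #126 (placeholder name).
func positionOf(input, remaining []byte) int {
	return len(input) - len(remaining)
}
```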
The Position field gives a means to index into the Tokenizer's underlying byte slice. This enables use cases where the caller plans to make edits to the JSON document but wants to leverage the copy function to optimize data movement, or to copy the remaining bytes if it exits the tokenizing loop early.
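A minimal sketch of the early-exit case (the Depth-based stop condition is illustrative, and again this assumes Position counts the bytes consumed so far):

```go
t := json.NewTokenizer(doc)
for t.Next() {
	if t.Depth > 1 { // hypothetical reason to stop tokenizing early
		break
	}
}
// Copy the unprocessed tail in one shot, using Position as the cut point.
tail := make([]byte, len(doc)-t.Position)
copy(tail, doc[t.Position:])
```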