made ascii string encoding faster #101777
Branch updated from 6e661f0 to 95c6e0d.
It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat (don't just cc him here, he won't see it! He's on Discord!). If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix? Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.
final Uint8List asciiBytes = Uint8List(value.length);
Uint8List? utf8Bytes;
int utf8Offset = 0;
// Only do utf8 encoding if we encounter non-ascii characters.
I thought Dart's UTF-8 encoding already had a fast path for this?
Yep, but here are the reasons I believe we are getting faster results:
- We are removing an extra copy of the data: https://github.com/dart-lang/sdk/blob/56035a7df0bb26a6babce53fae21d46263f3bf26/sdk/lib/convert/utf.dart#L115
- It inlines the logic
- We are removing bounds checks: https://github.com/dart-lang/sdk/blob/56035a7df0bb26a6babce53fae21d46263f3bf26/sdk/lib/convert/utf.dart#L98
Here's another bounds check we were able to remove: https://github.com/dart-lang/sdk/blob/56035a7df0bb26a6babce53fae21d46263f3bf26/sdk/lib/convert/utf.dart#L201
The main loop of Dart's UTF-8 encoder is: https://github.com/dart-lang/sdk/blob/main/sdk/lib/convert/utf.dart#L197
For strings containing only ASCII, Dart should just take each code unit and copy it into the output byte buffer, which is similar to what this loop is doing. I'm not sure why there would be a noticeable performance difference between Dart's encoder and this PR.
Ahh, I see. Dart's encoder guarantees that the Uint8List it gives you is not just a view of some larger buffer somewhere else, but we don't need to worry about that since we're immediately writing this into another one.
Makes sense!
@jason-simmons that sublist at the end is copying the data though, that might be the biggest difference?
That makes sense - the typed data sublist is doing a memcpy. But this encoder can avoid that by writing the ASCII part and the non-ASCII part to the WriteBuffer as two separate chunks.
Dart also provides […]. However, that probably won't work for […]
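A minimal sketch of the two-chunk idea discussed in this thread, assuming `BytesBuilder` from `dart:typed_data` as a stand-in for Flutter's `WriteBuffer` and leaving out the size prefix that the real codec writes first. The function name `encodeStringSketch` is hypothetical; this is an illustration, not the PR's exact code.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Sketch only: `BytesBuilder` stands in for `WriteBuffer`, and the size
/// prefix written by the real codec is omitted.
Uint8List encodeStringSketch(String value) {
  final BytesBuilder buffer = BytesBuilder(copy: false);
  final Uint8List asciiBytes = Uint8List(value.length);
  Uint8List? utf8Bytes;
  int utf8Offset = 0;
  // Copy code units straight across for as long as they are ASCII.
  for (int i = 0; i < value.length; i += 1) {
    final int char = value.codeUnitAt(i);
    if (char <= 0x7f) {
      asciiBytes[i] = char;
    } else {
      // First non-ASCII code unit: hand the rest of the string to the
      // SDK encoder and stop scanning.
      utf8Bytes = utf8.encoder.convert(value.substring(i));
      utf8Offset = i;
      break;
    }
  }
  if (utf8Bytes != null) {
    // Chunk 1 is a view of the ASCII prefix (no copy); chunk 2 is the
    // UTF-8 encoded tail.
    buffer.add(Uint8List.sublistView(asciiBytes, 0, utf8Offset));
    buffer.add(utf8Bytes);
  } else {
    buffer.add(asciiBytes);
  }
  return buffer.takeBytes();
}
```

For any input, the result should match `utf8.encode(value)`; the difference is only in how many intermediate copies are made on the way to the output buffer.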
In theory you could arrange things so that we leave a spot for the length, accumulate the UTF-8 bytes while measuring the length, then go back and write it in.
Uint8List? utf8Bytes;
int utf8Offset = 0;
// Only do utf8 encoding if we encounter non-ascii characters.
for(int i = 0; i < value.length; ++i) {
If we're going to do this ourselves, we should have a wide variety of unit tests covering both ASCII and UTF-8 sufficiently to ensure that the data is not corrupted.
We already have those tests written here: https://github.com/flutter/flutter/blob/e6f302289014371326e480b293779827da0c81d5/packages/flutter/test/services/message_codecs_test.dart#L213:L213
Do you feel like those are sufficient?
The new code has complete test coverage. Every line is exercised by a test. I can't imagine another test or input that would exercise it differently.
If you feel that is sufficient, then that is fine. I also don't think you need a test exemption since you updated the benchmark, right?
We are using variable-width sizes, so you can't know how much space to reserve, although you could probably just choose the max size (5 bytes). You'd have to double-check that the decoders support that.
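A rough sketch of the reserve-then-backfill idea, assuming the 5-byte form of the size is a one-byte marker (255) followed by a 32-bit length in host byte order. That layout, the helper name `encodeWithReservedSize`, and the hand-written UTF-8 loop are all assumptions made for illustration and are not code from this PR. As noted above, the decoders would need to be checked before relying on the long form for short strings.

```dart
import 'dart:typed_data';

/// Hypothetical helper: reserves a fixed 5-byte size header, encodes
/// [value] as UTF-8 after it, then goes back and fills in the length.
Uint8List encodeWithReservedSize(String value) {
  const int headerLength = 5; // assumed: marker byte + uint32 length
  // Worst case is 3 UTF-8 bytes per UTF-16 code unit, so over-allocate.
  final Uint8List out = Uint8List(headerLength + value.length * 3);
  int offset = headerLength;
  for (int i = 0; i < value.length; i += 1) {
    int c = value.codeUnitAt(i);
    // Combine a valid surrogate pair into a single code point.
    if (c >= 0xD800 && c <= 0xDBFF && i + 1 < value.length) {
      final int next = value.codeUnitAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        c = 0x10000 + ((c - 0xD800) << 10) + (next - 0xDC00);
        i += 1;
      }
    }
    // Replace any unpaired surrogate with U+FFFD.
    if (c >= 0xD800 && c <= 0xDFFF) {
      c = 0xFFFD;
    }
    if (c <= 0x7F) {
      out[offset++] = c;
    } else if (c <= 0x7FF) {
      out[offset++] = 0xC0 | (c >> 6);
      out[offset++] = 0x80 | (c & 0x3F);
    } else if (c <= 0xFFFF) {
      out[offset++] = 0xE0 | (c >> 12);
      out[offset++] = 0x80 | ((c >> 6) & 0x3F);
      out[offset++] = 0x80 | (c & 0x3F);
    } else {
      out[offset++] = 0xF0 | (c >> 18);
      out[offset++] = 0x80 | ((c >> 12) & 0x3F);
      out[offset++] = 0x80 | ((c >> 6) & 0x3F);
      out[offset++] = 0x80 | (c & 0x3F);
    }
  }
  // Go back and write the measured length into the reserved header.
  out[0] = 255;
  ByteData.sublistView(out).setUint32(1, offset - headerLength, Endian.host);
  // The result is a view over the over-allocated buffer, not a copy.
  return Uint8List.sublistView(out, 0, offset);
}
```

The trade-off in this sketch is a larger up-front allocation and a header that is always 5 bytes, in exchange for a single pass over the string with no second buffer.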
Co-authored-by: Jonah Williams
Branch updated from a799b72 to 6d5551d.
LGTM
In local testing this made the StandardMessageCodec_string benchmark go from 0.51338 µs to 0.34857 µs (about a 32% decrease).
Don't land before #101767
Test coverage already exists, this is just a performance change.
Pre-launch Checklist
If you need help, consider asking for advice on the #hackers-new channel on Discord.