made ascii string encoding faster #101777

Merged
merged 4 commits into from
Apr 13, 2022

Member

@gaaclarke gaaclarke commented Apr 12, 2022

In local testing, this made the StandardMessageCodec_string benchmark go from 0.51338 µs to 0.34857 µs (a decrease of about 32%).

Don't land before #101767

Test coverage already exists, this is just a performance change.

Pre-launch Checklist

  • I read the Contributor Guide and followed the process outlined there for submitting PRs.
  • I read the Tree Hygiene wiki page, which explains my responsibilities.
  • I read and followed the Flutter Style Guide, including Features we expect every widget to implement.
  • I signed the CLA.
  • I listed at least one issue that this PR fixes in the description above.
  • I updated/added relevant documentation (doc comments with ///).
  • I added new tests to check the change I am making, or this PR is test-exempt.
  • All existing and new tests are passing.

If you need help, consider asking for advice on the #hackers-new channel on Discord.

@flutter-dashboard flutter-dashboard bot added the "framework" label (flutter/packages/flutter repository; see also "f:" labels) Apr 12, 2022
@gaaclarke gaaclarke force-pushed the faster-string-encoding branch 2 times, most recently from 6e661f0 to 95c6e0d, April 12, 2022 20:18
@gaaclarke gaaclarke marked this pull request as ready for review April 12, 2022 20:32
@flutter-dashboard

It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat (don't just cc him here, he won't see it! He's on Discord!).

If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix?

Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.

final Uint8List asciiBytes = Uint8List(value.length);
Uint8List? utf8Bytes;
int utf8Offset = 0;
// Only do utf8 encoding if we encounter non-ascii characters.

Contributor

I thought Dart's UTF-8 encoding already had a fast path for this?

Member Author

Yep, but here are the reasons I believe we are getting faster results:

  1. We are removing an extra copy of the data: https://github.com/dart-lang/sdk/blob/56035a7df0bb26a6babce53fae21d46263f3bf26/sdk/lib/convert/utf.dart#L115
  2. It inlines the logic
  3. We are removing bounds checks https://github.com/dart-lang/sdk/blob/56035a7df0bb26a6babce53fae21d46263f3bf26/sdk/lib/convert/utf.dart#L98

Member Author

Member

Dart's UTF8 encoder main loop is: https://github.com/dart-lang/sdk/blob/main/sdk/lib/convert/utf.dart#L197

For strings containing only ASCII, Dart should just take each code unit and copy it into the output byte buffer, which is similar to what this loop is doing. I'm not sure why there would be a noticeable performance difference between Dart's encoder and this PR.
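A minimal sketch of the kind of ASCII fast path being discussed, for illustration only. The function name `encodeAsciiOrUtf8` is hypothetical and this is not the code merged in the PR; it copies code units directly while they are ASCII and falls back to `utf8.encode` for the whole string as soon as it sees anything else.

```dart
import 'dart:convert' show utf8;
import 'dart:typed_data';

/// Hypothetical helper (not the PR's code): returns the UTF-8 bytes of
/// [value], copying code units directly when the string is pure ASCII.
Uint8List encodeAsciiOrUtf8(String value) {
  final Uint8List bytes = Uint8List(value.length);
  for (int i = 0; i < value.length; i++) {
    final int codeUnit = value.codeUnitAt(i);
    if (codeUnit > 0x7f) {
      // Non-ASCII code unit found: use the general encoder for the
      // whole string and discard the partially filled buffer.
      return Uint8List.fromList(utf8.encode(value));
    }
    bytes[i] = codeUnit;
  }
  // Pure ASCII: every code unit is already its own UTF-8 byte.
  return bytes;
}
```

For an ASCII-only input this never calls the encoder, so any difference from `utf8.encode` would have to come from call overhead, bounds checks, or copies rather than from the per-character work.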

Contributor

Ahh, I see. Dart's encoder guarantees that the Uint8List it gives you is not just a view of some larger buffer somewhere else, but we don't need to worry about that since we're immediately writing this into another one.

Makes sense!
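The view-versus-copy distinction above can be illustrated with a small sketch. The function name `viewVersusCopy` is made up for this example; `Uint8List.sublistView` and `sublist` are the standard `dart:typed_data` calls.

```dart
import 'dart:typed_data';

/// Illustration only: returns the first element seen through a view and
/// through a copy after the backing buffer has been modified.
List<int> viewVersusCopy() {
  final Uint8List backing = Uint8List.fromList(<int>[1, 2, 3, 4]);
  // A view shares storage with `backing`; no bytes are copied.
  final Uint8List view = Uint8List.sublistView(backing, 0, 2);
  // `sublist` allocates a new list and copies the bytes into it.
  final Uint8List copy = backing.sublist(0, 2);
  backing[0] = 99;
  return <int>[view[0], copy[0]];
}
```

The view observes the later write to the backing buffer while the copy does not, which is why an encoder that hands its result to arbitrary callers may prefer to copy, and why code that immediately writes the bytes elsewhere can get away with a view.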

Contributor

@jason-simmons, the sublist at the end copies the data, though. Might that be the biggest difference?

Member

That makes sense - the typed data sublist is doing a memcpy. But this encoder can avoid that by writing the ASCII part and the non-ASCII part to the WriteBuffer as two separate chunks.
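A hedged sketch of the two-chunk idea described above. `BytesBuilder` stands in for Flutter's `WriteBuffer` so the example runs in plain Dart, and `writeStringChunks` is a hypothetical name, not a function from the PR.

```dart
import 'dart:convert' show utf8;
import 'dart:typed_data';

/// Illustration only: appends the UTF-8 encoding of [value] to [out] as up
/// to two chunks and returns the number of bytes written.
int writeStringChunks(BytesBuilder out, String value) {
  final Uint8List asciiBytes = Uint8List(value.length);
  int asciiCount = 0;
  while (asciiCount < value.length) {
    final int codeUnit = value.codeUnitAt(asciiCount);
    if (codeUnit > 0x7f) {
      break;
    }
    asciiBytes[asciiCount++] = codeUnit;
  }
  // Chunk 1: the ASCII prefix, handed over as a view (no sublist copy).
  out.add(Uint8List.sublistView(asciiBytes, 0, asciiCount));
  if (asciiCount == value.length) {
    return asciiCount;
  }
  // Chunk 2: everything from the first non-ASCII code unit onward.
  // Splitting here is safe because the prefix contains no surrogates.
  final List<int> tail = utf8.encode(value.substring(asciiCount));
  out.add(tail);
  return asciiCount + tail.length;
}
```

The two parts are never joined into one intermediate list; the only copies are the ones the output buffer makes when it absorbs each chunk.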

@gaaclarke gaaclarke requested a review from jonahwilliams April 12, 2022 20:52
@jason-simmons
Member

Dart also provides Utf8Encoder.startChunkedConversion, which sends each chunk of UTF-8 byte data to a sink interface without doing a memcpy.
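For reference, a small sketch of how `Utf8Encoder.startChunkedConversion` can be driven. `CollectingSink` and `encodeChunked` are hypothetical names for this example. The sink copies each chunk as it arrives because this sketch does not assume the encoder hands over buffers that are safe to retain.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Hypothetical sink that appends every UTF-8 chunk it receives.
class CollectingSink implements Sink<List<int>> {
  final BytesBuilder _builder = BytesBuilder();
  int chunkCount = 0;

  @override
  void add(List<int> chunk) {
    chunkCount++;
    _builder.add(chunk); // Copies the chunk into the builder.
  }

  @override
  void close() {}

  Uint8List takeBytes() => _builder.takeBytes();
}

/// Feeds [pieces] through the chunked encoder and returns all bytes.
Uint8List encodeChunked(Iterable<String> pieces) {
  final CollectingSink sink = CollectingSink();
  final StringConversionSink input =
      const Utf8Encoder().startChunkedConversion(sink);
  for (final String piece in pieces) {
    input.add(piece);
  }
  input.close(); // Flushes any pending state into the sink.
  return sink.takeBytes();
}
```

Note that the total byte count is only known once `close` has been called.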

However, that probably won't work for StandardMessageCodec because StandardMessageCodec's output format writes the length of the UTF-8 string before the string data. So there is no way to avoid accumulating all the data into an intermediate buffer before writing it to StandardMessageCodec's output stream.
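The ordering constraint can be sketched as follows. This is a simplified stand-in, not Flutter's implementation: `writeSize` here mimics the variable-width size prefix as I understand it from the StandardMessageCodec documentation (one byte below 254, otherwise a marker byte followed by a 16-bit or 32-bit value in host byte order), it omits the type byte, and `BytesBuilder` replaces `WriteBuffer`.

```dart
import 'dart:convert' show utf8;
import 'dart:typed_data';

/// Simplified stand-in for the codec's variable-width size prefix.
void writeSize(BytesBuilder out, int size) {
  if (size < 254) {
    out.addByte(size);
  } else if (size <= 0xffff) {
    out.addByte(254);
    final ByteData data = ByteData(2)..setUint16(0, size, Endian.host);
    out.add(data.buffer.asUint8List());
  } else {
    out.addByte(255);
    final ByteData data = ByteData(4)..setUint32(0, size, Endian.host);
    out.add(data.buffer.asUint8List());
  }
}

/// Hypothetical helper: writes the size prefix, then the UTF-8 bytes.
Uint8List writeLengthPrefixedString(String value) {
  // The whole string must be encoded before its byte length is known.
  final List<int> encoded = utf8.encode(value);
  final BytesBuilder out = BytesBuilder();
  writeSize(out, encoded.length);
  out.add(encoded);
  return out.takeBytes();
}
```

Because the prefix depends on the encoded byte length rather than on `value.length`, the bytes have to exist somewhere before the first byte of output can be written.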

@jonahwilliams
Contributor

In theory you could arrange things such that we accumulate the utf8 bits after leaving a spot for a length, while measuring the length, then go back and write it in.

Uint8List? utf8Bytes;
int utf8Offset = 0;
// Only do utf8 encoding if we encounter non-ascii characters.
for(int i = 0; i < value.length; ++i) {

Contributor

If we're going to do this ourselves, we should have a wide variety of unit tests covering both ASCII and UTF-8 input, to ensure that the data is not corrupted.
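As an illustration of what such coverage could look like (not the tests referenced in this PR), the sketch below compares any candidate encoder against `utf8.encode` over a handful of inputs. The function name `findMismatches` and the sample list are assumptions made for this example.

```dart
import 'dart:convert' show utf8;

/// Illustration only: returns the sample inputs for which [encode]
/// disagrees with `utf8.encode`.
List<String> findMismatches(List<int> Function(String) encode) {
  final List<String> samples = <String>[
    '', // empty string
    'plain ascii', // fast path only
    'caf\u00e9', // ASCII prefix, then a 2-byte sequence
    '\u20ac100', // 3-byte sequence first, then ASCII
    'ok \u{1F600} ok', // surrogate pair, 4-byte sequence
    'abc${String.fromCharCode(0xD800)}', // lone surrogate
    'a' * 300, // longer than a one-byte size prefix allows
  ];
  final List<String> mismatches = <String>[];
  for (final String sample in samples) {
    final String expected = utf8.encode(sample).join(',');
    if (encode(sample).join(',') != expected) {
      mismatches.add(sample);
    }
  }
  return mismatches;
}
```

Passing a hand-written encoder to `findMismatches` and expecting an empty result gives a quick differential check against the SDK's encoder.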

Member Author

Contributor

Do you feel like those are sufficient?

Member Author

The new code has complete test coverage. Every line is exercised by a test. I can't imagine another test or input that would exercise it differently.

Contributor

If you feel that is sufficient, then that is fine. I also don't think you need a test exemption since you updated the benchmark, right?

@gaaclarke
Member Author

In theory you could arrange things such that we accumulate the utf8 bits after leaving a spot for a length, while measuring the length, then go back and write it in.

The size is written with a variable width, so you can't know in advance how much space to reserve, though you could probably always reserve the maximum size (5 bytes). You'd have to double-check that the decoders would support that.
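A rough sketch of the reserve-and-backfill idea, under stated assumptions: the 5-byte form is taken to be a marker byte 255 followed by a 32-bit size in host byte order, `writeStringWithBackfilledSize` is a hypothetical name, and `BytesBuilder` stands in for `WriteBuffer`. Whether every platform decoder accepts the 5-byte form for small sizes is exactly the open question raised above; this sketch does not settle it.

```dart
import 'dart:convert' show utf8;
import 'dart:typed_data';

/// Assumed maximum prefix width: marker byte 255 plus a uint32.
const int kMaxSizePrefixBytes = 5;

/// Illustration only: reserves the widest size prefix, writes the string
/// bytes, then goes back and fills in the measured length.
Uint8List writeStringWithBackfilledSize(String value) {
  final BytesBuilder out = BytesBuilder();
  out.add(Uint8List(kMaxSizePrefixBytes)); // Placeholder for the size.
  out.add(utf8.encode(value)); // Could equally arrive as several chunks.
  final Uint8List bytes = out.takeBytes();
  final int payloadLength = bytes.length - kMaxSizePrefixBytes;
  // Backfill: marker byte, then the measured length.
  bytes[0] = 255;
  ByteData.sublistView(bytes, 1, kMaxSizePrefixBytes)
      .setUint32(0, payloadLength, Endian.host);
  return bytes;
}
```

The trade-off is up to four extra bytes per short string in exchange for not needing the length before the payload is written.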

@gaaclarke gaaclarke force-pushed the faster-string-encoding branch from a799b72 to 6d5551d, April 13, 2022 00:11
@gaaclarke gaaclarke requested a review from jonahwilliams April 13, 2022 00:24
Contributor

@jonahwilliams jonahwilliams left a comment

LGTM

4 participants