src: improve StringBytes::Encode perf on ASCII #61119

ChALkeR · 2025-12-18T21:39:48Z

Tracking: #61041

This significantly improves the performance of both the UTF-8 TextDecoder and Buffer#toString() on ASCII input.

It also removes the main reason why import { TextDecoder } from '@exodus/bytes/encoding.js' beats both the Node.js TextDecoder and Node.js Buffer on Node.js for UTF-8 🙃

See #61041 (comment)
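The patch itself is not reproduced on this page. The general idea behind an ASCII fast path can be sketched as follows (this is not the PR's actual code): check whether the input is pure ASCII and, if it is, copy the bytes straight into a one-byte (Latin-1) result, skipping UTF-8 decoding entirely. Everything here is hypothetical. `DecodedString`, `IsAscii`, `DecodeUtf8Slow` and `DecodeUtf8` are invented names. The scalar `IsAscii` loop stands in for the SIMD check (simdutf) discussed later in the thread. The slow path is a simplified decoder whose replacement behavior is not WHATWG-exact.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical result type, loosely modeled on V8's split between
// one-byte (Latin-1) and two-byte (UTF-16) strings.
struct DecodedString {
  bool one_byte = false;
  std::string latin1;    // filled when one_byte is true
  std::u16string utf16;  // filled otherwise
};

// Scalar stand-in for a SIMD ASCII check: every byte must be < 0x80.
bool IsAscii(const char* data, size_t len) {
  for (size_t i = 0; i < len; ++i) {
    if (static_cast<unsigned char>(data[i]) >= 0x80) return false;
  }
  return true;
}

// Simplified slow path: UTF-8 -> UTF-16. Each malformed byte becomes U+FFFD
// (this is NOT the exact WHATWG replacement algorithm).
std::u16string DecodeUtf8Slow(const char* data, size_t len) {
  std::u16string out;
  size_t i = 0;
  while (i < len) {
    unsigned char b = static_cast<unsigned char>(data[i]);
    uint32_t cp = 0;
    size_t extra = 0;     // number of continuation bytes expected
    uint32_t min_cp = 0;  // smallest code point allowed (rejects overlongs)
    if (b < 0x80) {
      out.push_back(static_cast<char16_t>(b));
      ++i;
      continue;
    } else if ((b & 0xE0) == 0xC0) {
      cp = b & 0x1F; extra = 1; min_cp = 0x80;
    } else if ((b & 0xF0) == 0xE0) {
      cp = b & 0x0F; extra = 2; min_cp = 0x800;
    } else if ((b & 0xF8) == 0xF0) {
      cp = b & 0x07; extra = 3; min_cp = 0x10000;
    } else {
      out.push_back(u'\uFFFD');
      ++i;
      continue;
    }
    bool ok = i + extra < len;  // all continuation bytes are in range
    for (size_t k = 1; ok && k <= extra; ++k) {
      unsigned char c = static_cast<unsigned char>(data[i + k]);
      if ((c & 0xC0) != 0x80) {
        ok = false;
      } else {
        cp = (cp << 6) | (c & 0x3F);
      }
    }
    if (!ok || cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
      out.push_back(u'\uFFFD');
      ++i;
      continue;
    }
    if (cp >= 0x10000) {  // encode as a surrogate pair
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    } else {
      out.push_back(static_cast<char16_t>(cp));
    }
    i += extra + 1;
  }
  return out;
}

// Fast path first: pure ASCII is also valid Latin-1, so the bytes can be
// copied as-is with no per-character decoding.
DecodedString DecodeUtf8(const char* data, size_t len) {
  DecodedString result;
  if (IsAscii(data, len)) {
    result.one_byte = true;
    result.latin1.assign(data, len);
  } else {
    result.utf16 = DecodeUtf8Slow(data, len);
  }
  return result;
}
```

The design point the sketch tries to capture is that the ASCII check is a single cheap linear scan, so for ASCII-heavy input almost all of the cost becomes a plain memory copy, while non-ASCII input still pays for the full decoder.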

Using https://github.com/lemire/jstextdecoderbench by @lemire:

Buffer#toString() - utf8

main:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	17.60 GiB/s	0.005 ms
Arabic lipsum	79.771 KiB	0.26 GiB/s	0.294 ms
Chinese lipsum	68.203 KiB	0.32 GiB/s	0.212 ms

PR:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	33.45 GiB/s	0.003 ms
Arabic lipsum	79.771 KiB	0.27 GiB/s	0.288 ms
Chinese lipsum	68.203 KiB	0.32 GiB/s	0.204 ms

TextDecoder, loose

main:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	17.90 GiB/s	0.005 ms
Arabic lipsum	79.771 KiB	0.27 GiB/s	0.286 ms
Chinese lipsum	68.203 KiB	0.33 GiB/s	0.200 ms

PR:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	35.27 GiB/s	0.003 ms
Arabic lipsum	79.771 KiB	0.27 GiB/s	0.284 ms
Chinese lipsum	68.203 KiB	0.32 GiB/s	0.203 ms

TextDecoder, fatal

main:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	15.17 GiB/s	0.006 ms
Arabic lipsum	79.771 KiB	0.26 GiB/s	0.292 ms
Chinese lipsum	68.203 KiB	0.32 GiB/s	0.206 ms

PR:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	35.31 GiB/s	0.003 ms
Arabic lipsum	79.771 KiB	0.26 GiB/s	0.289 ms
Chinese lipsum	68.203 KiB	0.31 GiB/s	0.207 ms
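Dividing PR throughput by main throughput in the ASCII rows above gives the approximate speedup for each API. These ratios are computed from the rounded figures in the tables. The Arabic and Chinese rows are essentially unchanged.

```latex
\begin{aligned}
\text{Buffer\#toString() (utf8)}:&\quad 33.45 / 17.60 \approx 1.90\times \\
\text{TextDecoder, loose}:&\quad 35.27 / 17.90 \approx 1.97\times \\
\text{TextDecoder, fatal}:&\quad 35.31 / 15.17 \approx 2.33\times
\end{aligned}
```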

cc @nodejs/performance

codecov · 2025-12-19T00:40:43Z

Codecov Report

❌ Patch coverage is 93.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 88.53%. Comparing base (4f24aff) to head (866781f).
⚠️ Report is 9 commits behind head on main.

Files with missing lines	Patch %	Lines
src/encoding_binding.cc	92.30%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #61119      +/-   ##
==========================================
- Coverage   88.53%   88.53%   -0.01%     
==========================================
  Files         703      703              
  Lines      208546   208555       +9     
  Branches    40217    40217              
==========================================
+ Hits       184634   184638       +4     
+ Misses      15926    15915      -11     
- Partials     7986     8002      +16

Files with missing lines	Coverage Δ
src/string_bytes.cc	`69.74% <100.00%> (-0.77%)`	⬇️
src/encoding_binding.cc	`55.33% <92.30%> (+0.72%)`	⬆️

... and 41 files with indirect coverage changes
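The Codecov figures above are internally consistent. The worked check below assumes Codecov's usual definition of coverage as hits / (hits + misses + partials). Under that definition, "93.33333% with 1 line missing" corresponds to 14 of 15 changed lines being fully covered.

```latex
\begin{aligned}
\text{Lines (head)}:&\quad 184638 + 15915 + 8002 = 208555 \\
\Delta\text{Lines}:&\quad (+4) + (-11) + (+16) = +9 \\
\text{Project coverage (head)}:&\quad 184638 / 208555 \approx 88.53\% \\
\text{Patch coverage}:&\quad 14 / 15 \approx 93.33\%
\end{aligned}
```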

lemire

This looks good to me. It is simple, likely quite safe.

ChALkeR · 2025-12-19T05:11:17Z

Update: removed an unnecessary reinterpret_cast; otherwise unchanged.

ChALkeR · 2025-12-19T05:41:42Z

@lemire I notice that !simdutf::validate_ascii_with_errors().error is consistently faster than simdutf::validate_ascii()
(36 GiB/s vs 34-35 GiB/s on ASCII-utf8-to-text conversion)

Why is that, and is there a reason to prefer the latter?
The surrounding code seems to use the former; should I just switch to validate_ascii_with_errors?

Update: ah, see #46271 (comment). Updated the PR.
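The `!validate_ascii_with_errors().error` versus `validate_ascii()` question is easier to follow with the two result shapes side by side. The sketch below is a hypothetical scalar stand-in. The names `ValidateAsciiWithErrors`, `ValidateAscii`, `AsciiResult`, `kSuccess` and `kNonAscii` are invented here. It only mirrors the shape of simdutf's API, in which `validate_ascii_with_errors` returns a result carrying an error code and a count. It does not reproduce simdutf's SIMD code, and it says nothing about why one real call is faster than the other.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical error codes; kSuccess is 0 so `!result.error` means "valid",
// mirroring the `!validate_ascii_with_errors(...).error` idiom in the thread.
enum AsciiError { kSuccess = 0, kNonAscii = 1 };

struct AsciiResult {
  AsciiError error;
  size_t count;  // on error: index of the first non-ASCII byte; else len
};

AsciiResult ValidateAsciiWithErrors(const char* data, size_t len) {
  constexpr uint64_t kHighBits = 0x8080808080808080ULL;
  size_t i = 0;
  // Word-at-a-time scan: a byte is non-ASCII iff its top bit is set.
  for (; i + 8 <= len; i += 8) {
    uint64_t word;
    std::memcpy(&word, data + i, sizeof(word));  // safe unaligned load
    if (word & kHighBits) break;  // the byte loop below pins down the index
  }
  // Byte-at-a-time scan of the tail (or of the word that failed above).
  for (; i < len; ++i) {
    if (static_cast<unsigned char>(data[i]) >= 0x80) return {kNonAscii, i};
  }
  return {kSuccess, len};
}

// Boolean form: equivalent to "no error" on the richer result.
bool ValidateAscii(const char* data, size_t len) {
  return !ValidateAsciiWithErrors(data, len).error;
}
```

In this sketch, `!ValidateAsciiWithErrors(...).error` and `ValidateAscii(...)` agree on every input by construction. Any speed difference between the real simdutf functions comes from their implementations, which this sketch does not model.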

mcollina

lgtm

nodejs-github-bot added the buffer (Issues and PRs related to the buffer subsystem), c++ (Issues and PRs that require attention from people who are familiar with C++) and needs-ci (PRs that need a full CI run) labels Dec 18, 2025

ChALkeR mentioned this pull request Dec 18, 2025

TextDecoder is wrong and very slow #61041 (Open)

ChALkeR force-pushed the chalker/decoder/ascii/0 branch 4 times, most recently from 0bfc0ec to 4f82c5a, on December 18, 2025 21:54

ChALkeR mentioned this pull request Dec 19, 2025

src: improve windows1252 decoding speed #61120 (Draft)

lemire approved these changes Dec 19, 2025


src: improve StringBytes::Encode perf on ASCII

866781f

ChALkeR force-pushed the chalker/decoder/ascii/0 branch from 4f82c5a to 866781f on December 19, 2025 05:11

mcollina approved these changes Dec 19, 2025


mcollina added the request-ci label (Add this label to start a Jenkins CI on a PR) Dec 19, 2025

src: use validate_ascii_with_errors

42b4ff4