@ChALkeR (Member) commented Dec 18, 2025

Tracking: #61041

This significantly improves the performance of both the utf8 TextDecoder and Buffer#toString() on ASCII input.

It also removes the main reason why import { TextDecoder } from '@exodus/bytes/encoding.js' beats both the Node.js TextDecoder and the Node.js Buffer on Node.js for utf-8 🙃

See #61041 (comment)
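
For readers unfamiliar with the idea, here is a minimal JavaScript sketch of an ASCII fast path. It is not the code from this PR: the actual change is in C++, and the discussion further down indicates it relies on simdutf for the ASCII check. `isAscii` and `decodeUtf8` are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: true when every byte is below 0x80 (pure ASCII).
function isAscii(bytes) {
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] & 0x80) return false;
  }
  return true;
}

// Hypothetical decoder illustrating the fast-path idea.
function decodeUtf8(bytes) {
  if (isAscii(bytes)) {
    // ASCII bytes mean the same thing in UTF-8 and Latin-1,
    // so a one-byte-per-character decode is enough here.
    return Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength)
      .toString('latin1');
  }
  // Anything else goes through the full UTF-8 decoder.
  return new TextDecoder('utf-8').decode(bytes);
}

const sample = new TextEncoder().encode('plain ASCII text');
console.log(decodeUtf8(sample));
```

Because the check only helps when the whole input is ASCII, a fast path of this kind would be expected to leave non-ASCII inputs such as the Arabic and Chinese rows below essentially unchanged.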

Using https://github.com/lemire/jstextdecoderbench by @lemire:

Buffer#toString() - utf8

main:

| Test | Size | Throughput | Mean Time |
| --- | --- | --- | --- |
| Latin lipsum (ASCII) | 84.902 KiB | 17.60 GiB/s | 0.005 ms |
| Arabic lipsum | 79.771 KiB | 0.26 GiB/s | 0.294 ms |
| Chinese lipsum | 68.203 KiB | 0.32 GiB/s | 0.212 ms |

PR:

| Test | Size | Throughput | Mean Time |
| --- | --- | --- | --- |
| Latin lipsum (ASCII) | 84.902 KiB | 33.45 GiB/s | 0.003 ms |
| Arabic lipsum | 79.771 KiB | 0.27 GiB/s | 0.288 ms |
| Chinese lipsum | 68.203 KiB | 0.32 GiB/s | 0.204 ms |

TextDecoder, loose

main:

| Test | Size | Throughput | Mean Time |
| --- | --- | --- | --- |
| Latin lipsum (ASCII) | 84.902 KiB | 17.90 GiB/s | 0.005 ms |
| Arabic lipsum | 79.771 KiB | 0.27 GiB/s | 0.286 ms |
| Chinese lipsum | 68.203 KiB | 0.33 GiB/s | 0.200 ms |

PR:

| Test | Size | Throughput | Mean Time |
| --- | --- | --- | --- |
| Latin lipsum (ASCII) | 84.902 KiB | 35.27 GiB/s | 0.003 ms |
| Arabic lipsum | 79.771 KiB | 0.27 GiB/s | 0.284 ms |
| Chinese lipsum | 68.203 KiB | 0.32 GiB/s | 0.203 ms |
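
The "loose" results above and the "fatal" results that follow refer to the `fatal` option of the standard TextDecoder constructor. A small sketch of the behavioral difference on invalid input (the byte values are chosen only for illustration):

```javascript
// 'a', an invalid UTF-8 byte, then 'b'.
const invalidBytes = new Uint8Array([0x61, 0xff, 0x62]);

// Loose (default) mode replaces invalid sequences with U+FFFD.
const looseDecoder = new TextDecoder('utf-8');
const looseResult = looseDecoder.decode(invalidBytes); // 'a\uFFFDb'

// Fatal mode throws a TypeError instead of substituting.
const fatalDecoder = new TextDecoder('utf-8', { fatal: true });
let fatalError = null;
try {
  fatalDecoder.decode(invalidBytes);
} catch (err) {
  fatalError = err;
}

console.log(JSON.stringify(looseResult), fatalError instanceof TypeError);
```

Buffer#toString('utf8') behaves like the loose mode here: it also substitutes U+FFFD rather than throwing.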

TextDecoder, fatal

main:

| Test | Size | Throughput | Mean Time |
| --- | --- | --- | --- |
| Latin lipsum (ASCII) | 84.902 KiB | 15.17 GiB/s | 0.006 ms |
| Arabic lipsum | 79.771 KiB | 0.26 GiB/s | 0.292 ms |
| Chinese lipsum | 68.203 KiB | 0.32 GiB/s | 0.206 ms |

PR:

| Test | Size | Throughput | Mean Time |
| --- | --- | --- | --- |
| Latin lipsum (ASCII) | 84.902 KiB | 35.31 GiB/s | 0.003 ms |
| Arabic lipsum | 79.771 KiB | 0.26 GiB/s | 0.289 ms |
| Chinese lipsum | 68.203 KiB | 0.31 GiB/s | 0.207 ms |
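
To get a rough feel for numbers like these without the benchmark repository, a simplified timing loop along the following lines could be used. This is only a sketch and not the method used by jstextdecoderbench: `measureThroughput` is a hypothetical helper, and the input is synthetic repeated ASCII text rather than the lipsum files, so absolute figures will differ.

```javascript
// Hypothetical helper, not taken from jstextdecoderbench.
function measureThroughput(decode, bytes, iterations = 1000) {
  for (let i = 0; i < 50; i++) decode(bytes); // warm-up runs
  let sink = 0; // use each result so the decode is not dead code
  const start = performance.now();
  for (let i = 0; i < iterations; i++) sink += decode(bytes).length;
  const meanMs = (performance.now() - start) / iterations;
  // bytes -> GiB, milliseconds -> seconds
  const gibPerSec = bytes.byteLength / 2 ** 30 / (meanMs / 1000);
  return { meanMs, gibPerSec, sink };
}

// Synthetic ASCII input of 85 KiB (an assumption; the tables use lipsum files).
const asciiInput = Buffer.alloc(85 * 1024, 'Lorem ipsum dolor sit amet. ');
const benchLoose = new TextDecoder('utf-8');
const benchFatal = new TextDecoder('utf-8', { fatal: true });

const cases = [
  ['Buffer#toString() - utf8', (b) => b.toString('utf8')],
  ['TextDecoder, loose', (b) => benchLoose.decode(b)],
  ['TextDecoder, fatal', (b) => benchFatal.decode(b)],
];
for (const [name, decode] of cases) {
  const { meanMs, gibPerSec } = measureThroughput(decode, asciiInput);
  console.log(`${name}: ${gibPerSec.toFixed(2)} GiB/s, ${meanMs.toFixed(3)} ms`);
}
```

Throughput is simply input size divided by mean time, which is why the mean times in the tables, rounded to three decimals, only approximately reproduce the reported GiB/s values.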

cc @nodejs/performance

@nodejs-github-bot added the buffer, c++, and needs-ci labels Dec 18, 2025
@ChALkeR force-pushed the chalker/decoder/ascii/0 branch 4 times, most recently from 0bfc0ec to 4f82c5a, December 18, 2025 21:54
codecov bot commented Dec 19, 2025

Codecov Report

❌ Patch coverage is 93.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 88.53%. Comparing base (4f24aff) to head (866781f).
⚠️ Report is 9 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/encoding_binding.cc | 92.30% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #61119      +/-   ##
==========================================
- Coverage   88.53%   88.53%   -0.01%     
==========================================
  Files         703      703              
  Lines      208546   208555       +9     
  Branches    40217    40217              
==========================================
+ Hits       184634   184638       +4     
+ Misses      15926    15915      -11     
- Partials     7986     8002      +16     

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/string_bytes.cc | 69.74% <100.00%> (-0.77%) ⬇️ |
| src/encoding_binding.cc | 55.33% <92.30%> (+0.72%) ⬆️ |

... and 41 files with indirect coverage changes


@lemire (Member) left a comment

This looks good to me. It is simple, likely quite safe.

@ChALkeR force-pushed the chalker/decoder/ascii/0 branch from 4f82c5a to 866781f, December 19, 2025 05:11

@ChALkeR (Member, Author) commented Dec 19, 2025

Update: removed an unnecessary reinterpret_cast; otherwise unchanged.

@ChALkeR (Member, Author) commented Dec 19, 2025

@lemire I notice that !simdutf::validate_ascii_with_errors().error is consistently faster than simdutf::validate_ascii()
(36 GiB/s vs 34-35 GiB/s on ASCII utf8-to-text conversion).

Why is that, and is there a reason to prefer the latter?
The surrounding code seems to use the former; should I just switch to validate_ascii_with_errors?

Update: ah, see #46271 (comment). I have updated the PR.

@mcollina (Member) left a comment

lgtm

@mcollina added the request-ci label Dec 19, 2025

Labels

buffer: Issues and PRs related to the buffer subsystem.
c++: Issues and PRs that require attention from people who are familiar with C++.
needs-ci: PRs that need a full CI run.
request-ci: Add this label to start a Jenkins CI on a PR.


4 participants