fix(core): handle multibyte UTF-8 characters in socket message consumption #34151
…ption

Use StringDecoder to properly decode UTF-8 data that may be split across socket chunks. This prevents corruption when multibyte characters (such as CJK characters) are split at arbitrary byte boundaries.
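The failure mode this commit describes can be reproduced without a socket. The following is a minimal sketch, not code from the PR: the sample character and the split point are chosen only for illustration, and the two `Buffer` slices stand in for consecutive socket chunks.

```javascript
const { StringDecoder } = require('node:string_decoder');

// '한' (U+D55C) is encoded as three UTF-8 bytes: ed 95 9c.
const bytes = Buffer.from('한', 'utf8');

// Simulate a socket delivering the character in two chunks,
// split in the middle of the byte sequence (illustrative split point).
const chunk1 = bytes.subarray(0, 2);
const chunk2 = bytes.subarray(2);

// Decoding each chunk on its own cannot recover the character:
// the incomplete sequences become U+FFFD replacement characters.
const naive = chunk1.toString('utf8') + chunk2.toString('utf8');

// StringDecoder keeps the incomplete bytes internally and emits the
// character only once the remaining byte has arrived.
const decoder = new StringDecoder('utf8');
const decoded = decoder.write(chunk1) + decoder.write(chunk2) + decoder.end();

console.log(naive === '한');   // false
console.log(decoded === '한'); // true
```

The decoder is stateful, so one instance has to be kept per connection and reused for every chunk of that connection.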
Force-pushed from 2527020 to e57a6e9.
Hello @AgentEnder. Thanks for reviewing my PR! I just rebased onto the main branch to resolve a conflict (should be merge..), so I need another approval to run the CI workflow. Thanks.
Nx Cloud has identified a flaky task in your failed CI:
Since the failure was identified as flaky, the solution is to rerun CI. Because this branch comes from a fork, it is not possible for us to push directly, but you can rerun by pushing an empty commit:
git commit --allow-empty -m "chore: trigger rerun"
git push
🎓 Learn more about Self-Healing CI on nx.dev
…ption (#34151)

## Current Behavior

When socket data chunks split a multibyte UTF-8 character (e.g., CJK characters like Korean, Chinese, Japanese) at an arbitrary byte boundary, `Buffer.toString()` decodes incomplete byte sequences as replacement characters (�), causing message corruption.

This can occur when:
- File paths contain non-ASCII characters
- Project names include multibyte characters
- Any JSON message contains international text

## Expected Behavior

Multibyte UTF-8 characters should be properly decoded even when split across multiple socket data chunks. The fix uses Node.js `StringDecoder` which buffers incomplete multibyte sequences until the remaining bytes arrive.

## Related Issue(s)

Fixes socket message corruption for paths/names containing multibyte characters.

(cherry picked from commit e35dcd2)
This pull request has already been merged/closed. If you experience issues related to these changes, please open a new issue referencing this pull request.
Current Behavior
When socket data chunks split a multibyte UTF-8 character (e.g., CJK characters like Korean, Chinese, Japanese) at an arbitrary byte boundary, `Buffer.toString()` decodes incomplete byte sequences as replacement characters (�), causing message corruption.
This can occur when:
- File paths contain non-ASCII characters
- Project names include multibyte characters
- Any JSON message contains international text
Expected Behavior
Multibyte UTF-8 characters should be properly decoded even when split across multiple socket data chunks. The fix uses Node.js `StringDecoder`, which buffers incomplete multibyte sequences until the remaining bytes arrive.
Related Issue(s)
Fixes socket message corruption for paths/names containing multibyte characters.
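For context, the expected behavior can be sketched as a small message consumer. This is a hedged illustration under stated assumptions, not the code changed in this PR: `createMessageConsumer`, the newline `DELIMITER`, and the sample payload are all hypothetical, since the actual framing used by the socket protocol is not shown here.

```javascript
const { StringDecoder } = require('node:string_decoder');

// Hypothetical message delimiter; the real protocol's framing is an assumption here.
const DELIMITER = '\n';

// Returns a function that accepts raw Buffer chunks and calls `onMessage`
// once for every complete, delimiter-terminated message.
function createMessageConsumer(onMessage) {
  const decoder = new StringDecoder('utf8'); // one decoder per connection
  let pending = '';
  return (chunk) => {
    // write() returns only fully decoded characters; trailing bytes of an
    // incomplete multibyte sequence stay inside the decoder.
    pending += decoder.write(chunk);
    let index;
    while ((index = pending.indexOf(DELIMITER)) !== -1) {
      onMessage(pending.slice(0, index));
      pending = pending.slice(index + DELIMITER.length);
    }
  };
}

// Usage: a JSON message whose path contains Korean text, delivered in two
// chunks that split the first multibyte character (byte offset 10).
const received = [];
const consume = createMessageConsumer((message) => received.push(message));
const payload = Buffer.from('{"path":"프로젝트/src"}\n', 'utf8');
consume(payload.subarray(0, 10));
consume(payload.subarray(10));

console.log(JSON.parse(received[0]).path === '프로젝트/src'); // true
```

With a real `net.Socket` that has no encoding set, the `data` event delivers `Buffer` chunks, so a consumer like this could be attached with `socket.on('data', consume)`.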