Add LSX accelerated implementation for source file analysis #145963
532e46b to a0b4864
I’m not your reviewer but please add a PR description describing what it’s doing, or what LSX is, to avoid having to infer it from the code, and also why (with some before/after numbers if it’s indeed an optimization like it seems to be).
@rustbot author
Reminder, once the PR becomes ready for a review, use |
a0b4864 to 73287f3
☔ The latest upstream changes (presumably #146023) made this pull request unmergeable. Please resolve the merge conflicts.
73287f3 to 2b6510e
This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed; this note is just to help reviewers.
    for (chunk_index, chunk) in chunks.iter().enumerate() {
        let chunk = unsafe { lsx_vld::<0>(chunk.as_ptr() as *const i8) };

        // For character in the chunk, see if its byte value is < 0, which
nit: this is pre-existing from the SSE2 impl, so probably fix it there as well.
Suggested change:
- // For character in the chunk, see if its byte value is < 0, which
+ // For each character in the chunk, see if its byte value is < 0, which
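The code comment under discussion relies on the fact that a UTF-8 byte with its high bit set is negative when viewed as a signed 8-bit value. A minimal scalar sketch of that check follows; the helper names `is_non_ascii` and `non_ascii_mask` are hypothetical and do not come from the PR, and the loop only stands in for what a vector compare plus mask extraction would produce.

```rust
/// Returns true when `byte` belongs to a multi-byte UTF-8 sequence.
/// Reinterpreted as i8, every byte >= 0x80 is negative.
pub fn is_non_ascii(byte: u8) -> bool {
    (byte as i8) < 0
}

/// Scalar stand-in for a vector "compare, then extract mask" step:
/// bit i of the result is set when byte i of the chunk is non-ASCII.
pub fn non_ascii_mask(chunk: &[u8; 16]) -> u16 {
    let mut mask: u16 = 0;
    for (i, &byte) in chunk.iter().enumerate() {
        if is_non_ascii(byte) {
            mask |= 1 << i;
        }
    }
    mask
}
```

A mask of zero means the whole chunk is ASCII, which is the condition that lets the fast path proceed.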
    }
    _ => {
-       // The target (or compiler version) does not support SSE2 ...
+       // The target (or compiler version) does not support SSE2/LSX ...
something like
Suggested change:
- // The target (or compiler version) does not support SSE2/LSX ...
+ // The target (or compiler version) does not support vector instructions our specialized implementations need (x86 SSE2, loongarch64 LSX)...
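The comment being reworded sits on the fallback arm of a per-target dispatch. As a rough sketch of that shape, assuming plain `#[cfg]` attributes rather than whatever selection macro the real code uses, the following picks an SSE2 path on x86_64 and a generic path everywhere else. The function names `count_newlines` and `count_newlines_generic` are hypothetical, and the example only counts newlines instead of doing the full source analysis.

```rust
/// Generic path: works on every target.
pub fn count_newlines_generic(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| b == b'\n').count()
}

/// Specialized path, compiled only when SSE2 is statically available.
#[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
pub fn count_newlines(bytes: &[u8]) -> usize {
    use std::arch::x86_64::{
        __m128i, _mm_cmpeq_epi8, _mm_loadu_si128, _mm_movemask_epi8, _mm_set1_epi8,
    };

    let mut count = 0;
    let mut chunks = bytes.chunks_exact(16);
    for chunk in chunks.by_ref() {
        // SAFETY: `chunk` is exactly 16 readable bytes, `_mm_loadu_si128`
        // has no alignment requirement, and SSE2 is enabled for this build.
        let mask = unsafe {
            let v = _mm_loadu_si128(chunk.as_ptr() as *const __m128i);
            let eq = _mm_cmpeq_epi8(v, _mm_set1_epi8(b'\n' as i8));
            _mm_movemask_epi8(eq)
        };
        count += mask.count_ones() as usize;
    }
    // The tail shorter than one chunk goes through the generic path.
    count + count_newlines_generic(chunks.remainder())
}

/// The target (or compiler configuration) does not support the vector
/// instructions the specialized implementation needs.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse2")))]
pub fn count_newlines(bytes: &[u8]) -> usize {
    count_newlines_generic(bytes)
}
```

Both variants expose the same signature, so callers do not need to know which one was compiled in.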
    let mut intra_chunk_offset = 0;

    for (chunk_index, chunk) in chunks.iter().enumerate() {
        let chunk = unsafe { lsx_vld::<0>(chunk.as_ptr() as *const i8) };
Probably add a comment about vld not having alignment requirements here?
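To illustrate the kind of safety comment being asked for, here is a portable sketch of a 16-byte load from an arbitrary offset in a byte slice. It uses `std::ptr::read_unaligned` as a stand-in for the vector load; the function name `load_chunk_unaligned` is hypothetical, and nothing here is taken from the PR itself.

```rust
use std::ptr;

/// Loads 16 bytes starting at `offset` as a native-endian u128,
/// or returns None when fewer than 16 bytes are available.
pub fn load_chunk_unaligned(bytes: &[u8], offset: usize) -> Option<u128> {
    let end = offset.checked_add(16)?;
    let chunk = bytes.get(offset..end)?;
    // SAFETY: `chunk` is exactly 16 initialized, readable bytes, and
    // `read_unaligned` places no alignment requirement on the pointer,
    // so any offset into the slice is acceptable.
    Some(unsafe { ptr::read_unaligned(chunk.as_ptr() as *const u128) })
}
```

The point of the comment is to record why an `unsafe` load from a `u8` pointer is sound even though the slice carries no 16-byte alignment guarantee.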
Does this yield observable high-level gains on the rustc-perf suite? I don't feel we need a microbenchmark to see how fast the compiler is at recognizing new lines, though it's good to have that data. Thanks for providing it.
I don't know anything about LSX, so I may not be a suitable reviewer for this, but the implementation matches the SSE2 version very well (to the point of me wondering whether we'll be able to use …).
So yeah, this seems good in my uninformed opinion, and we can drop the first commit and fix the couple of nits.
@rustbot author
This patch introduces an LSX-optimized version of `analyze_source_file` for the `loongarch64` target. Similar to the existing SSE2 implementation for x86, this version:
- Processes 16-byte chunks at a time using LSX vector intrinsics.
- Quickly identifies newlines in ASCII-only chunks.
- Falls back to the generic implementation when multi-byte UTF-8 characters are detected or in the tail portion.
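The three steps in the description can be sketched in portable scalar Rust. This is only an illustration under simplifying assumptions: it records line starts and nothing else, it uses ordinary loops where the patch uses LSX intrinsics, and the names `line_starts_chunked`, `scan_generic` and `CHUNK_SIZE` are hypothetical.

```rust
const CHUNK_SIZE: usize = 16;

/// Generic path: record the offset just past every '\n' in `bytes[start..end]`.
fn scan_generic(bytes: &[u8], start: usize, end: usize, line_starts: &mut Vec<usize>) {
    for i in start..end {
        if bytes[i] == b'\n' {
            line_starts.push(i + 1);
        }
    }
}

/// Returns the byte offset at which each line begins (the first is always 0).
pub fn line_starts_chunked(src: &str) -> Vec<usize> {
    let bytes = src.as_bytes();
    let mut line_starts = vec![0];
    let mut offset = 0;

    while offset + CHUNK_SIZE <= bytes.len() {
        let chunk = &bytes[offset..offset + CHUNK_SIZE];
        if chunk.iter().any(|&b| b >= 0x80) {
            // Multi-byte UTF-8 detected: let the generic path handle this chunk.
            scan_generic(bytes, offset, offset + CHUNK_SIZE, &mut line_starts);
        } else {
            // ASCII-only chunk: build a bitmask of newline positions, then
            // walk its set bits from lowest to highest.
            let mut newline_mask: u16 = 0;
            for (i, &b) in chunk.iter().enumerate() {
                if b == b'\n' {
                    newline_mask |= 1 << i;
                }
            }
            while newline_mask != 0 {
                let index = newline_mask.trailing_zeros() as usize;
                line_starts.push(offset + index + 1);
                newline_mask &= newline_mask - 1; // clear the lowest set bit
            }
        }
        offset += CHUNK_SIZE;
    }

    // Tail shorter than one chunk: generic path again.
    scan_generic(bytes, offset, bytes.len(), &mut line_starts);
    line_starts
}
```

In a vectorized version the per-byte loops in the ASCII branch collapse into a compare and a mask extraction per chunk, which is where the speedup would come from.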
2b6510e to 5b43244
Thanks for your review! @rustbot ready
I'm not sure, but it's worth checking. I don't know whether rustc-perf is fully usable on LoongArch, or if the overhead changes here would show up in its benchmarks.
Rollup of 24 pull requests Successful merges: - #140459 (Add `read_buf` equivalents for positioned reads) - #143725 (core: add Peekable::next_if_map) - #145209 (Stabilize `path_add_extension`) - #145342 (fix drop scope for `super let` bindings within `if let`) - #145750 (raw_vec.rs: Remove superfluous fn alloc_guard) - #145827 (On unused binding or binding not present in all patterns, suggest potential typo of unit struct/variant or const) - #145932 (Allow `inline(always)` with a target feature behind a unstable feature `target_feature_inline_always`.) - #145962 (Ensure we emit an allocator shim when only some crate types need one) - #145963 (Add LSX accelerated implementation for source file analysis) - #146054 (add `#[must_use]` to `array::repeat`) - #146090 (Derive `PartialEq` for `InvisibleOrigin`) - #146112 (don't uppercase error messages) - #146120 (Correct typo in `rustc_errors` comment) - #146124 (Test `rustc-dev` in `distcheck`) - #146127 (Rename `ToolRustc` to `ToolRustcPrivate`) - #146131 (rustdoc-search: add test case for indexing every item type) - #146134 (llvm: nvptx: Layout update to match LLVM) - #146136 (docs(std): add missing closing code block fences in doc comments) - #146137 (Disallow frontmatter in `--cfg` and `--check-cfg` arguments) - #146140 (compiletest: cygwin follows windows in using PATH for dynamic libraries) - #146150 (fix(rustdoc): match rustc `--emit` precedence ) - #146155 (Make bootstrap self test parallel) - #146161 ([rustdoc] Uncomment code to add scraped rustdoc examples in loaded paths) - #146172 (triagebot: configure some pings when certain attributes are used) r? `@ghost` `@rustbot` modify labels: rollup
Rollup merge of #145963 - heiher:src-analysis-lsx, r=lqd
Add LSX accelerated implementation for source file analysis
This patch introduces an LSX-optimized version of `analyze_source_file` for the `loongarch64` target. Similar to existing SSE2 implementation for x86, this version:
- Processes 16-byte chunks at a time using LSX vector intrinsics.
- Quickly identifies newlines in ASCII-only chunks.
- Falls back to the generic implementation when multi-byte UTF-8 characters are detected or in the tail portion.
This change seems to cause compile errors for me with HEAD @ bea625f:
error[E0308]: mismatched types
   --> compiler/rustc_span/src/analyze_source_file.rs:189:60
    |
189 |                 let multibyte_mask = lsx_vpickve2gr_w::<0>(multibyte_mask);
    |                                      --------------------- ^^^^^^^^^^^^^^ expected `v4i32`, found `v16i8`
    |                                      |
    |                                      arguments to this function are incorrect
    |
note: function defined here
   --> /rustc/788da80fcfcef3f34c90def5baa32813e39a1a41/library/core/src/../../stdarch/crates/core_arch/src/loongarch64/lsx/generated.rs:3825:8
error[E0308]: mismatched types
   --> compiler/rustc_span/src/analyze_source_file.rs:198:67
    |
198 |                     let mut newlines_mask = lsx_vpickve2gr_w::<0>(newlines_mask);
    |                                             --------------------- ^^^^^^^^^^^^^ expected `v4i32`, found `v16i8`
    |                                             |
    |                                             arguments to this function are incorrect
    |
note: function defined here
   --> /rustc/788da80fcfcef3f34c90def5baa32813e39a1a41/library/core/src/../../stdarch/crates/core_arch/src/loongarch64/lsx/generated.rs:3825:8
I’m puzzled why this issue didn’t get caught in CI. More precisely, this patch depends on the compiler after #145042. Both #145042 and the current patch have been merged into the 1.91.0 release line, which seems problematic. Should we delay this patch to the next release instead?
I think I've figured out why this issue wasn't caught in CI: LoongArch64, being a cross-compilation target, isn't built with the stage0 compiler. I'm confident this is a real issue in native LoongArch64 builds, so I'll revert it until stage0 is bumped. Sorry for the noise.
Revert "Add LSX accelerated implementation for source file analysis"
This reverts commit 5b43244 to fix native build failures on LoongArch.
Link: rust-lang#145963 (comment)
Link: rust-lang#145963 (comment)
Rollup merge of #146290 - heiher:r-src-analysis-lsx, r=lqd
Revert "Add LSX accelerated implementation for source file analysis"
This reverts commit 5b43244 to fix native build failures on LoongArch.
Link: #145963 (comment)
Link: #145963 (comment)
Reland "Add LSX accelerated implementation for source file analysis"
This patch introduces an LSX-optimized version of `analyze_source_file` for the `loongarch64` target. Similar to existing SSE2 implementation for x86, this version:
- Processes 16-byte chunks at a time using LSX vector intrinsics.
- Quickly identifies newlines in ASCII-only chunks.
- Falls back to the generic implementation when multi-byte UTF-8 characters are detected or in the tail portion.
Reland rust-lang#145963
r? `@lqd`
Rollup merge of #147113 - heiher:src-analysis-lsx, r=lqd
Reland "Add LSX accelerated implementation for source file analysis"
This patch introduces an LSX-optimized version of `analyze_source_file` for the `loongarch64` target. Similar to existing SSE2 implementation for x86, this version:
- Processes 16-byte chunks at a time using LSX vector intrinsics.
- Quickly identifies newlines in ASCII-only chunks.
- Falls back to the generic implementation when multi-byte UTF-8 characters are detected or in the tail portion.
Reland #145963
r? `@lqd`