Merged stores: Fix alignment-related issues and enable SIMD where possible #92939
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Issue Details
Merge two consecutive SIMD stores (e.g. 2x Vector256 into 1x Vector512). I am still trying to build a mental model for the "multiple scalar stores -> SIMD store" case (we currently don't do it).
* - only if the target (e.g. a struct) is known not to contain GC handles
So far, it seems that x86/AMD64 officially doesn't offer any kind of atomicity guarantee (even per component).
Related issues: #76503, #51638
Note this is from
Co-authored-by: SingleAccretion
@jakobbotsch @dotnet/jit-contrib PTAL. Diffs: regressions, as expected, because this change makes the whole #92852 algorithm more conservative. The initial diffs were -400kb, so most wins are expected to remain; evidently, most base addresses are TYP_REF, like Jakob predicted. There are wins on ARM64 due to better SIMD guarantees.
Improved Diffs on arm64
It seems there are more regressions on linux/windows x64. Do we know why?
These are reverted improvements from #92852: they turned out not to be legal (fortunately, most improvements remained).
x86 SPMI jobs failed with timeout/"no space left"; I'll check other runs.
Adjust the rules for when we can use unaligned stores for merged ones. Also, enable 2xLONG/REF -> SIMD and 2xSIMD -> wider SIMD.
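To make the "2xLONG -> SIMD" case concrete, here is a minimal C++ sketch of the data-level idea only, not the JIT's implementation. `Pair`, `storeSeparately`, `storeMerged` and `mergedMatchesSeparate` are hypothetical names introduced for illustration. It shows that two adjacent 8-byte stores and one 16-byte store leave the same bytes in memory when observed from a single thread; whether the merged form is legal when other threads can observe the write is the atomicity/alignment question this PR is about.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical target layout: two adjacent 64-bit fields, 16 bytes in total.
struct Pair
{
    std::uint64_t lo;
    std::uint64_t hi;
};
static_assert(sizeof(Pair) == 16, "sketch assumes no padding between fields");

// Unmerged form: two separate 8-byte stores to consecutive addresses.
void storeSeparately(Pair* dst, std::uint64_t a, std::uint64_t b)
{
    dst->lo = a;
    dst->hi = b;
}

// Merged form: assemble the 16-byte payload, then write it with one
// 16-byte copy. A compiler may or may not lower this to a single SIMD
// store; the sketch only models the resulting memory contents.
void storeMerged(Pair* dst, std::uint64_t a, std::uint64_t b)
{
    unsigned char payload[16];
    std::memcpy(payload, &a, 8);
    std::memcpy(payload + 8, &b, 8);
    std::memcpy(dst, payload, 16);
}

// Returns true when both forms leave identical bytes in memory
// (single-threaded view only; says nothing about atomicity).
bool mergedMatchesSeparate(std::uint64_t a, std::uint64_t b)
{
    Pair x{};
    Pair y{};
    storeSeparately(&x, a, b);
    storeMerged(&y, a, b);
    return std::memcmp(&x, &y, sizeof(Pair)) == 0;
}
```
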
Wider scalar primitives for naturally aligned data of primitives (>1B):
boundary?
SIMD for naturally aligned data of primitives (>1B):
* both Intel and AMD
** it's very unlikely the JIT can currently assume 16-byte alignment anyhow
PS: Merged stores are conservatively disabled on LA64 and RISC-V
Per "Arm Architecture Reference Manual":
@tannergooding said that x64 with AVX promises atomicity for 16B accesses to 16B-aligned data; so far, this seems to be the only thing x64 can guarantee to us.
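The alignment conditions discussed above can be sketched as small address predicates. This is an illustrative C++ sketch under stated assumptions, not code from the JIT: `isAligned`, `staysWithinBlock` and `canAssumeAtomic16` are hypothetical helpers, and the last one simply encodes the "AVX plus 16B alignment" condition quoted above as a boolean policy.

```cpp
#include <cstddef>
#include <cstdint>

// True when `addr` is a multiple of `alignment`.
// Assumes `alignment` is a non-zero power of two.
constexpr bool isAligned(std::uintptr_t addr, std::size_t alignment)
{
    return (addr & (alignment - 1)) == 0;
}

// True when the byte range [addr, addr + size) lies inside a single
// `boundary`-sized block, i.e. the access does not straddle a boundary.
// Assumes `size` and `boundary` are non-zero.
constexpr bool staysWithinBlock(std::uintptr_t addr, std::size_t size, std::size_t boundary)
{
    return (addr / boundary) == ((addr + size - 1) / boundary);
}

// Hypothetical policy for the x64 case quoted above: treat a 16-byte
// store as atomic only when AVX is available and the address is
// 16-byte aligned. `hasAvx` is an assumed input, not a real JIT query.
constexpr bool canAssumeAtomic16(std::uintptr_t addr, bool hasAvx)
{
    return hasAvx && isAligned(addr, 16);
}
```

For example, a 16-byte store at address `0x1008` is 8-byte aligned but not 16-byte aligned, so under this sketch's policy it would not be treated as atomic.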