Arm64: Use carry instructions for 128bit arithmetic #113070

Open
a74nh opened this issue Mar 3, 2025 · 4 comments
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

@a74nh
Contributor

a74nh commented Mar 3, 2025

Consider the addition operator for Int128:

        /// <inheritdoc cref="IAdditionOperators{TSelf, TOther, TResult}.op_Addition(TSelf, TOther)" />
        public static Int128 operator +(Int128 left, Int128 right)
        {
            // For unsigned addition, we can detect overflow by checking `(x + y) < x`
            // This gives us the carry to add to upper to compute the correct result

            ulong lower = left._lower + right._lower;
            ulong carry = (lower < left._lower) ? 1UL : 0UL;

            ulong upper = left._upper + right._upper + carry;
            return new Int128(upper, lower);
        }

This compiles to:

ldp     x0, x1, [fp, #0x18]	// [V42 tmp41], [V43 tmp42]
ldp     x20, x21, [fp, #0x28]	// [V34 tmp33], [V35 tmp34]
add     x19, x20, x0
cmp     x19, x20
cset    x0, lo
mov     w0, w0
add     x1, x21, x1
add     x20, x1, x0
movz    x0, #0xA610
movk    x0, #0x75F3 LSL #16
movk    x0, #0xECE6 LSL #32
bl      CORINFO_HELP_NEWSFAST
stp     x19, x20, [x0, #0x08]

Instead, an add with carry should be used, which removes the compare:

ldp     x0, x1, [fp, #0x18]	// [V42 tmp41], [V43 tmp42]
ldp     x20, x21, [fp, #0x28]	// [V34 tmp33], [V35 tmp34]
adds    x19, x20, x0
adc     x20, x21, x1
movz    x0, #0xA610
movk    x0, #0x75F3 LSL #16
movk    x0, #0xECE6 LSL #32
bl      CORINFO_HELP_NEWSFAST
stp     x19, x20, [x0, #0x08]

This should be extended to cover SBC (subtract with carry) and NGC (negate with carry) as well.
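
For reference, the subtraction and negation counterparts follow the same shape as the addition above. The following is a minimal sketch, assuming a hypothetical `U128` struct with `Upper`/`Lower` fields standing in for the private `_upper`/`_lower` fields of `Int128`. These are the borrow-detection patterns that the subtract-with-carry and negate-with-carry instructions (`sbc` and `ngc` in the Arm64 ISA) could in principle replace:

```csharp
// Hypothetical stand-in for Int128's two 64-bit halves (illustration only).
public readonly struct U128
{
    public readonly ulong Upper;
    public readonly ulong Lower;

    public U128(ulong upper, ulong lower)
    {
        Upper = upper;
        Lower = lower;
    }

    // Subtraction: a borrow occurred if the wrapped low result is
    // larger than the low half of the minuend, i.e. (x - y) > x.
    public static U128 Subtract(U128 left, U128 right)
    {
        ulong lower = unchecked(left.Lower - right.Lower);
        ulong borrow = (lower > left.Lower) ? 1UL : 0UL;
        ulong upper = unchecked(left.Upper - right.Upper - borrow);
        return new U128(upper, lower);
    }

    // Negation is 0 - value: the low half borrows whenever it is non-zero.
    public static U128 Negate(U128 value)
    {
        ulong lower = unchecked(0UL - value.Lower);
        ulong borrow = (value.Lower != 0) ? 1UL : 0UL;
        ulong upper = unchecked(0UL - value.Upper - borrow);
        return new U128(upper, lower);
    }
}
```

As with the addition, the explicit compare and select feeding `borrow` is what a flag-setting subtract followed by a carry-consuming subtract would make unnecessary.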

This could be done either by matching a standard IR pattern or, if that proves too large or fragile, via intrinsics. There is an initial discussion here: #68028 (comment)

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Mar 3, 2025
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Mar 3, 2025
@huoyaoyuan
Member

Also #48247 and #80674.

If Arm uses a status/flags register for add with carry too, it should face the same challenges as xarch.

@a74nh
Contributor Author

a74nh commented Mar 3, 2025

> Also #48247 and #80674.
>
> If Arm uses a status/flags register for add with carry too, it should face the same challenges as xarch.

I should have done a better search before raising this. Yes, those are essentially the same issue, but xarch-specific.

@vcsjones vcsjones added arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Mar 3, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Mar 6, 2025
@JulieLeeMSFT JulieLeeMSFT added this to the 10.0.0 milestone Mar 6, 2025
@Daniel-Svensson
Contributor

I like the concept behind #48247 (comment) in that it is platform neutral and easy to discover.

However, I do not know whether using a tuple or a bool to represent the carry affects how easy or difficult it is to get good machine code.

As long as the end goal is to make it easy to get performant code on all architectures, it sounds fine.

There is also #76502, which references prior work on recognizing the pattern.
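
To make the tuple-versus-bool question concrete, here is a minimal sketch of two possible shapes for a platform-neutral add-with-carry helper. `CarryMath` and its methods are hypothetical names chosen for illustration; they are neither the API proposed in #48247 nor existing .NET APIs. The sketch only shows how call sites would chain the carry and makes no claim about which shape the JIT would lower better.

```csharp
// Hypothetical helper type (illustration only, not a real .NET API).
public static class CarryMath
{
    // Shape 1: return the sum and the carry-out together as a tuple.
    public static (ulong Sum, bool Carry) AddWithCarry(ulong a, ulong b, bool carryIn)
    {
        ulong partial = unchecked(a + b);
        bool carry = partial < a;               // carry out of a + b
        ulong sum = unchecked(partial + (carryIn ? 1UL : 0UL));
        carry |= sum < partial;                 // carry out of adding carryIn
        return (sum, carry);
    }

    // Shape 2: return the sum and report the carry-out via an out parameter.
    public static ulong AddWithCarry(ulong a, ulong b, bool carryIn, out bool carryOut)
    {
        var result = AddWithCarry(a, b, carryIn);
        carryOut = result.Carry;
        return result.Sum;
    }

    // 128-bit addition built from two chained 64-bit steps using shape 1.
    public static (ulong Upper, ulong Lower) Add128(
        ulong aUpper, ulong aLower, ulong bUpper, ulong bLower)
    {
        var (lower, carry) = AddWithCarry(aLower, bLower, false);
        var (upper, _) = AddWithCarry(aUpper, bUpper, carry);
        return (upper, lower);
    }
}
```

In either shape the carry travels between the two steps as an ordinary value, so a code generator would still need to recognize that it can live in the flags register between the low and high additions.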
