@monojenkins

!! This PR is a copy of dotnet/runtime#47362, please do not edit or review it in this repo !!
Do not automatically approve this PR:

* Consider how the changes affect configurations in this repo,
* Check effects on files that are not mirrored,
* Identify test cases that may be needed in this repo.

!! Merge the PR only after the original PR is merged !!



Closes dotnet/runtime#43106

In addition to implementing the intrinsics, I have updated the `System.Math:BigMul(long,long,byref):long` implementation in System.Private.CoreLib. The resulting codegen for the two methods follows (the first listing uses `umulh`, the second `smulh`):
```asm
; Assembly listing for method System.Math:BigMul(long,long,byref):long
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; ReadyToRun compilation
; optimized code
; fp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  4,  4   )    long  ->   x0
;  V01 arg1         [V01,T01] (  4,  4   )    long  ->   x1
;  V02 arg2         [V02,T02] (  3,  3   )   byref  ->   x2
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
;
; Lcl frame size = 0

G_M18264_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
						;; bbWeight=1    PerfScore 1.50
G_M18264_IG02:              ;; offset=0008H
        9B017C03          mul     x3, x0, x1
        F9000043          str     x3, [x2]
        9BC17C00          umulh   x0, x0, x1
						;; bbWeight=1    PerfScore 8.00
G_M18264_IG03:              ;; offset=0014H
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
						;; bbWeight=1    PerfScore 2.00

; Total bytes of code 28, prolog size 8, PerfScore 14.30, instruction count 7, allocated bytes for code 28 (MethodHash=96edb8a7) for method System.Math:BigMul(long,long,byref):long
; ============================================================

; Assembly listing for method System.Math:BigMul(long,long,byref):long
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; ReadyToRun compilation
; optimized code
; fp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  4,  4   )    long  ->   x0
;  V01 arg1         [V01,T01] (  4,  4   )    long  ->   x1
;  V02 arg2         [V02,T02] (  3,  3   )   byref  ->   x2
;* V03 loc0         [V03    ] (  0,  0   )    long  ->  zero-ref
;* V04 loc1         [V04    ] (  0,  0   )    long  ->  zero-ref    ld-addr-op
;# V05 OutArgs      [V05    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
;
; Lcl frame size = 0

G_M18264_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
						;; bbWeight=1    PerfScore 1.50
G_M18264_IG02:              ;; offset=0008H
        9B017C03          mul     x3, x0, x1
        F9000043          str     x3, [x2]
        9B417C00          smulh   x0, x0, x1
						;; bbWeight=1    PerfScore 8.00
G_M18264_IG03:              ;; offset=0014H
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
						;; bbWeight=1    PerfScore 2.00

; Total bytes of code 28, prolog size 8, PerfScore 14.30, instruction count 7, allocated bytes for code 28 (MethodHash=96edb8a7) for method System.Math:BigMul(long,long,byref):long
; ============================================================
```
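
For readers less familiar with these instructions, here is a rough Python model of what the `mul` + `umulh` / `smulh` pairs in the listings compute: the low and high 64-bit halves of a 128-bit product. This is only an illustrative sketch, not the C# code from the PR; the helper names (`to_signed64`, `big_mul_unsigned`, `big_mul_signed`) are made up for this example.

```python
MASK64 = (1 << 64) - 1  # all ones in the low 64 bits


def to_signed64(x):
    """Reinterpret the low 64 bits of x as a two's-complement signed value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def big_mul_unsigned(a, b):
    """Model of mul + umulh: return (high, low) of the unsigned 128-bit product."""
    product = (a & MASK64) * (b & MASK64)
    low = product & MASK64   # value `mul` produces (stored through the byref)
    high = product >> 64     # value `umulh` produces (the return value)
    return high, low


def big_mul_signed(a, b):
    """Model of mul + smulh: return (high, low) of the signed 128-bit product."""
    product = to_signed64(a) * to_signed64(b)
    low = to_signed64(product)  # low 64 bits, viewed as a signed long
    high = product >> 64        # arithmetic shift keeps the sign, like `smulh`
    return high, low


print(big_mul_unsigned(MASK64, MASK64))  # (18446744073709551614, 1)
print(big_mul_signed(-1, 2))             # (-1, -2)
```

The low half is the same bit pattern in both cases, which is why both listings share the same `mul`; only the high half depends on signedness.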
@monojenkins force-pushed the sync-pr-47362-from-runtime branch from 278106e to 211da55 on January 27, 2021 19:50
@imhameed

@monojenkins build failed

@imhameed

@monojenkins build failed

@imhameed merged commit 5f49dca into mono:master on Jan 28, 2021

Development

Successfully merging this pull request may close these issues.

[Arm64] MultiplyHigh