Tags: vosen/ZLUDA
Implement extended precision integer addition (#607)

* Refactor emit_intrinsic to allow struct return type

* Implement extended precision integer addition

Uses `llvm.*add.with.overflow.*`. That intrinsic does not take a carry argument, so handling carry in requires multiple additions and combining the carry out, but the AMDGPU target is able to translate that pattern into a single instruction.

These four PTX instructions:
```
add.cc.u32 r0, a0, b0;
addc.cc.u32 r1, a1, b1;
addc.cc.u32 r2, a2, b2;
addc.u32 r3, a3, b3;
```
are translated into four RDNA3 instructions:
```
v_add_co_u32 v0, vcc_lo, v0, v4
v_add_co_ci_u32_e32 v1, vcc_lo, v1, v5, vcc_lo
v_add_co_ci_u32_e32 v2, vcc_lo, v2, v6, vcc_lo
v_add_co_ci_u32_e32 v3, vcc_lo, v7, v3, vcc_lo
```

* cargo fmt

* Rename to match convention

* cargo fmt
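The carry pattern described in the commit message can be sketched in plain Rust. This is a minimal illustration, not ZLUDA's code: the function names `add_with_carry` and `add_4_limbs` are hypothetical, and Rust's `u32::overflowing_add` stands in for an overflow-reporting add that, like the intrinsic mentioned above, takes no carry argument. Each carry-in step therefore needs two additions whose overflow flags are combined.

```rust
/// Hypothetical helper: add two 32-bit limbs plus a carry-in, returning
/// (sum, carry_out). `overflowing_add` has no carry input, so the carry
/// is folded in with a second addition.
fn add_with_carry(a: u32, b: u32, carry_in: bool) -> (u32, bool) {
    let (partial, c1) = a.overflowing_add(b);
    let (sum, c2) = partial.overflowing_add(carry_in as u32);
    // At most one of the two additions can overflow, so OR combines them.
    (sum, c1 | c2)
}

/// Hypothetical helper mirroring the shape of the four-instruction PTX
/// sequence: limbs are little-endian (index 0 is least significant), and
/// the carry out of the last limb is discarded, as with the final `addc.u32`.
fn add_4_limbs(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let mut out = [0u32; 4];
    let mut carry = false; // the first add (`add.cc`) has no carry-in
    for i in 0..4 {
        let (sum, carry_out) = add_with_carry(a[i], b[i], carry);
        out[i] = sum;
        carry = carry_out;
    }
    out
}
```

For example, `add_4_limbs([u32::MAX, u32::MAX, 0, 1], [1, 0, 0, 2])` propagates a carry through the two low limbs and yields `[0, 0, 1, 3]`.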