Wrong Answer after LoopVectorize on Arm #156190

@ashermancinelli


Found in this Fortran test case. The program produces incorrect answers after the loop-vectorize (LV) pass, and we only see this on aarch64. This is the LLVM IR before and after loop-vectorize, with the dependency on the Fortran runtime removed:

This is the expected output of the program alongside the output after LV:

Good:
v[90][90] = 40.000000 + 0.000000i
y[90][90] = 21.000000 + 0.000000i

Bad:
v[90][90] = 40.000000 + 40.000000i ;; << real component sum is written to both
y[90][90] = 21.000000 + 0.000000i

The sum of the real components is written to both the real and imaginary components of this array element. If we replace the broadcast (splat) instructions feeding the fadd with a wide load, so that the imaginary lane is computed from the imaginary component, the test yields the expected output again:

139,141c139,140
<   %broadcast.splatinsert13 = insertelement <2 x double> poison, double %47, i64 0, !dbg !8
<   %broadcast.splat14 = shufflevector <2 x double> %broadcast.splatinsert13, <2 x double> poison, <2 x i32> zeroinitializer, !dbg !8
<   %48 = fadd contract <2 x double> %broadcast.splat12, %broadcast.splat14, !dbg !8
---
>   %wide.load.x = load <2 x double>, ptr %46, align 8, !dbg !8
>   %48 = fadd contract <2 x double> %wide.load, %wide.load.x, !dbg !8
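
To make the difference between the two versions of the diff concrete, here is a minimal Python model of the two lanes. It is only a sketch: it assumes the complex values are stored as interleaved (real, imaginary) doubles, the operand values 19 and 21 are hypothetical and chosen only so that the sums match the output reported above, and the helper names `wide_load`, `broadcast_load`, and `fadd` are made up for illustration.

```python
def wide_load(mem, i):
    """Model `load <2 x double>`: lane 0 = real part, lane 1 = imaginary part."""
    return [mem[2 * i], mem[2 * i + 1]]


def broadcast_load(mem, i):
    """Model scalar load + insertelement + shufflevector with a zero mask:
    the value at lane 0 (the real part) is copied into both lanes."""
    x = mem[2 * i]
    return [x, x]


def fadd(a, b):
    """Model a lane-wise `fadd <2 x double>`."""
    return [p + q for p, q in zip(a, b)]


# Hypothetical operands, interleaved as [re, im]; not taken from the test case.
lhs = [19.0, 0.0]
rhs = [21.0, 0.0]

good = fadd(wide_load(lhs, 0), wide_load(rhs, 0))
bad = fadd(broadcast_load(lhs, 0), broadcast_load(rhs, 0))

print("good: %f + %fi" % tuple(good))
print("bad:  %f + %fi" % tuple(bad))
```

In this model the broadcast version puts the real sum in both lanes, which is the same symptom as the `40.000000 + 40.000000i` result, while the wide-load version keeps the imaginary lane at zero.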

This godbolt link has the corrected IR:

  ; this looks wrong
  %47 = load double, ptr %46, align 8, !dbg !8
  %broadcast.splatinsert13 = insertelement <2 x double> poison, double %47, i64 0, !dbg !8
  %broadcast.splat14 = shufflevector <2 x double> %broadcast.splatinsert13, <2 x double> poison, <2 x i32> zeroinitializer, !dbg !8
  %48 = fadd contract <2 x double> %broadcast.splat12, %broadcast.splat14, !dbg !8

  ; Hand-edits which give expected results
  ;%wide.load.x = load <2 x double>, ptr %46, align 8, !dbg !8
  ;%48 = fadd contract <2 x double> %wide.load, %wide.load.x, !dbg !8

Note that if we run loop-vectorize on the exact same pre-LV IR and compile it for a generic x86 target, we do not see the same behavior:

> opt /home/amancinelli/tmp.iJQ3cjnSSc-before-pass-316.ll -passes=loop-vectorize -S -o /tmp/t.ll && clang /tmp/t.ll && ./a.out
opt: WARNING: failed to create target machine for 'aarch64-unknown-linux-gnu': unable to get target for 'aarch64-unknown-linux-gnu', see --version and --triple.
warning: overriding the module target triple with x86_64-unknown-linux-gnu
      [-Woverride-module]
1 warning generated.
v[90][90] = 40.000000 + 0.000000i
y[90][90] = 21.000000 + 0.000000i
