[X86][APX] Fix issues of suppressing APX for relocation #139285
Conversation
fzou1 commented on May 9, 2025 (edited)
- An ADD64rm_ND instruction can be emitted with a GOTPCREL relocation. It is now handled in the "Suppress APX for relocation" pass and transformed into ADD64rm, with its register operand in a non-rex2 register class (a hedged sketch follows this list). The relocation type R_X86_64_CODE_6_GOTPCRELX will be added later for APX enabled with relocation.
- The register class of the operands of an instruction with a relocation is changed to a non-rex2 one by the "Suppress APX for relocation" pass, but it may later be recomputed to a larger register class (e.g., GR64_NOREX2RegClass to GR64RegClass). Fixed by not enlarging a register class that is already a non-rex2 class when APX support for relocation is disabled.
- After the "Suppress APX for relocation" pass, an instruction with a relocation may be folded with an add NDD instruction into an add NDD instruction carrying the relocation. The latter would be emitted with an APX relocation type, which breaks backward compatibility. Fixed by not folding an instruction with a GOTPCREL relocation into an NDD instruction.
- If the register defined by operand 0 of an instruction with a relocation is used in a PHI instruction, it may be replaced with operand 0 of the PHI instruction (possibly an EGPR) after PHI elimination and the Machine Copy Propagation pass. Fixed by suppressing EGPRs in operand 0 of PHI instructions so that no APX relocation types are emitted.
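To make fix (1) concrete, here is a minimal sketch of the ND-to-legacy rewrite, assuming the shape suggested by the reloc.mir expectations below (a COPY into a gr64_norex2 vreg followed by a legacy ADD64rm); the helper name and the exact operand handling are illustrative, not the committed code:

```cpp
#include "X86InstrInfo.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
using namespace llvm;

// Sketch: rewrite "%dst = ADD64rm_ND %src, <mem with GOTPCREL disp>" into
// "%tmp = COPY %src; %dst = ADD64rm %tmp, <mem>", keeping %tmp and %dst in
// a non-rex2 class so no APX relocation type can be emitted.
static void suppressNDDGotPCRel(MachineInstr &MI, MachineRegisterInfo &MRI,
                                const TargetInstrInfo &TII) {
  MachineBasicBlock &MBB = *MI.getParent();
  const DebugLoc &DL = MI.getDebugLoc();
  // The legacy ADD64rm ties its destination to the first source, so move
  // the untied ND source into a fresh non-rex2 virtual register first.
  Register Tmp = MRI.createVirtualRegister(&X86::GR64_NOREX2RegClass);
  BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), Tmp)
      .addReg(MI.getOperand(1).getReg());
  Register Dst = MI.getOperand(0).getReg();
  MRI.setRegClass(Dst, &X86::GR64_NOREX2RegClass);
  MachineInstrBuilder MIB =
      BuildMI(MBB, MI, DL, TII.get(X86::ADD64rm), Dst).addReg(Tmp);
  for (unsigned I = 2, E = MI.getNumExplicitOperands(); I != E; ++I)
    MIB.add(MI.getOperand(I)); // base, scale, index, disp, segment
  MIB.cloneMemRefs(MI);        // keep the "load from got" memory operand
  MI.eraseFromParent();
}
```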
@llvm/pr-subscribers-backend-x86

Author: Feng Zou (fzou1)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/139285.diff

7 Files Affected:
diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp
index 5220ae2e67bb6..963a2bb84e185 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.cpp
+++ b/llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -8122,6 +8122,14 @@ MachineInstr *X86InstrInfo::foldMemoryOperandImpl(
shouldPreventUndefRegUpdateMemFold(MF, MI)))
return nullptr;
+ // Do not fold an NDD instruction and a memory instruction with a
+ // relocation, to avoid emitting an APX relocation when the flag is
+ // disabled for backward compatibility.
+ uint64_t TSFlags = MI.getDesc().TSFlags;
+ if (!X86EnableAPXForRelocation && isMemInstrWithGOTPCREL(LoadMI) &&
+ X86II::hasNewDataDest(TSFlags))
+ return nullptr;
+
// Determine the alignment of the load.
Align Alignment;
unsigned LoadOpc = LoadMI.getOpcode();
diff --git a/llvm/lib/Target/X86/X86InstrInfo.h b/llvm/lib/Target/X86/X86InstrInfo.h
index 2a9f567689ecb..e53f2566dd892 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.h
+++ b/llvm/lib/Target/X86/X86InstrInfo.h
@@ -187,6 +187,40 @@ inline static bool isAddMemInstrWithRelocation(const MachineInstr &MI) {
return false;
}
+inline static bool isMemInstrWithGOTPCREL(const MachineInstr &MI) {
+ unsigned Op = MI.getOpcode();
+ switch (Op) {
+ case X86::TEST32mr:
+ case X86::TEST64mr:
+ case X86::CMP32rm:
+ case X86::CMP64rm:
+ case X86::MOV32rm:
+ case X86::MOV64rm:
+ case X86::ADC32rm:
+ case X86::ADD32rm:
+ case X86::AND32rm:
+ case X86::OR32rm:
+ case X86::SBB32rm:
+ case X86::SUB32rm:
+ case X86::XOR32rm:
+ case X86::ADC64rm:
+ case X86::ADD64rm:
+ case X86::AND64rm:
+ case X86::OR64rm:
+ case X86::SBB64rm:
+ case X86::SUB64rm:
+ case X86::XOR64rm: {
+ int MemOpNo = X86II::getMemoryOperandNo(MI.getDesc().TSFlags) +
+ X86II::getOperandBias(MI.getDesc());
+ const MachineOperand &MO = MI.getOperand(X86::AddrDisp + MemOpNo);
+ if (MO.getTargetFlags() == X86II::MO_GOTPCREL)
+ return true;
+ break;
+ }
+ }
+ return false;
+}
+
class X86InstrInfo final : public X86GenInstrInfo {
X86Subtarget &Subtarget;
const X86RegisterInfo RI;
diff --git a/llvm/lib/Target/X86/X86RegisterInfo.cpp b/llvm/lib/Target/X86/X86RegisterInfo.cpp
index ef58c7619b243..c192e8892995b 100644
--- a/llvm/lib/Target/X86/X86RegisterInfo.cpp
+++ b/llvm/lib/Target/X86/X86RegisterInfo.cpp
@@ -50,6 +50,8 @@ static cl::opt<bool>
cl::desc("Disable two address hints for register "
"allocation"));
+extern cl::opt<bool> X86EnableAPXForRelocation;
+
X86RegisterInfo::X86RegisterInfo(const Triple &TT)
: X86GenRegisterInfo((TT.isArch64Bit() ? X86::RIP : X86::EIP),
X86_MC::getDwarfRegFlavour(TT, false),
@@ -121,6 +123,11 @@ X86RegisterInfo::getLargestLegalSuperClass(const TargetRegisterClass *RC,
if (RC == &X86::GR8_NOREXRegClass)
return RC;
+ // Keep using non-rex2 register class when APX feature (EGPR/NDD/NF) is not
+ // enabled for relocation.
+ if (!X86EnableAPXForRelocation && isNonRex2RegClass(RC))
+ return RC;
+
const X86Subtarget &Subtarget = MF.getSubtarget<X86Subtarget>();
const TargetRegisterClass *Super = RC;
@@ -1258,3 +1265,18 @@ const TargetRegisterClass *X86RegisterInfo::constrainRegClassToNonRex2(
return &X86::GR64_NOREX2_NOSPRegClass;
}
}
+
+bool X86RegisterInfo::isNonRex2RegClass(const TargetRegisterClass *RC) const {
+ switch (RC->getID()) {
+ default:
+ return false;
+ case X86::GR8_NOREX2RegClassID:
+ case X86::GR16_NOREX2RegClassID:
+ case X86::GR32_NOREX2RegClassID:
+ case X86::GR64_NOREX2RegClassID:
+ case X86::GR32_NOREX2_NOSPRegClassID:
+ case X86::GR64_NOREX2_NOSPRegClassID:
+ case X86::GR64_with_sub_16bit_in_GR16_NOREX2RegClassID:
+ return true;
+ }
+}
\ No newline at end of file
diff --git a/llvm/lib/Target/X86/X86RegisterInfo.h b/llvm/lib/Target/X86/X86RegisterInfo.h
index 13a5fbf16e981..19b409ae619d2 100644
--- a/llvm/lib/Target/X86/X86RegisterInfo.h
+++ b/llvm/lib/Target/X86/X86RegisterInfo.h
@@ -178,6 +178,8 @@ class X86RegisterInfo final : public X86GenRegisterInfo {
const TargetRegisterClass *
constrainRegClassToNonRex2(const TargetRegisterClass *RC) const;
+
+ bool isNonRex2RegClass(const TargetRegisterClass *RC) const;
};
} // End llvm namespace
diff --git a/llvm/lib/Target/X86/X86SuppressAPXForReloc.cpp b/llvm/lib/Target/X86/X86SuppressAPXForReloc.cpp
index d40995cb1786d..a263fd39bc324 100644
--- a/llvm/lib/Target/X86/X86SuppressAPXForReloc.cpp
+++ b/llvm/lib/Target/X86/X86SuppressAPXForReloc.cpp
@@ -167,7 +167,8 @@ static bool handleNDDOrNFInstructions(MachineFunction &MF,
int MemOpNo = X86II::getMemoryOperandNo(MI.getDesc().TSFlags) +
X86II::getOperandBias(MI.getDesc());
const MachineOperand &MO = MI.getOperand(X86::AddrDisp + MemOpNo);
- if (MO.getTargetFlags() == X86II::MO_GOTTPOFF) {
+ if (MO.getTargetFlags() == X86II::MO_GOTTPOFF ||
+ MO.getTargetFlags() == X86II::MO_GOTPCREL) {
LLVM_DEBUG(dbgs() << "Transform instruction with relocation type:\n "
<< MI);
Register Reg = MRI->createVirtualRegister(&X86::GR64_NOREX2RegClass);
diff --git a/llvm/test/CodeGen/X86/apx/reloc-regclass.ll b/llvm/test/CodeGen/X86/apx/reloc-regclass.ll
new file mode 100644
index 0000000000000..685a64cee2b47
--- /dev/null
+++ b/llvm/test/CodeGen/X86/apx/reloc-regclass.ll
@@ -0,0 +1,187 @@
+; RUN: llc -mcpu=diamondrapids %s -mtriple=x86_64 -filetype=obj -o %t.o
+; RUN: llvm-objdump --no-print-imm-hex -dr %t.o | FileCheck %s --check-prefixes=NOAPXREL,CHECK
+
+; RUN: llc -mcpu=diamondrapids %s -mtriple=x86_64 -filetype=obj -o %t.o -x86-enable-apx-for-relocation=true
+; RUN: llvm-objdump --no-print-imm-hex -dr %t.o | FileCheck %s --check-prefixes=APXREL,CHECK
+
+
+; The first 2 tests check that the register class is not updated/recomputed
+; by the register allocator after it was changed to a non-rex2 register class
+; by the "Suppress APX for relocation" pass.
+
+
+; CHECK-LABEL: test_regclass_not_updated_by_regalloc_1
+; APXREL: movq (%rip), %r16
+; APXREL-NEXT: R_X86_64_CODE_4_GOTPCRELX gvar-0x4
+; NOAPXREL: movq (%rip), %rdi
+; NOAPXREL-NEXT: R_X86_64_REX_GOTPCRELX gvar-0x4
+; NOAPXREL-NOT: R_X86_64_CODE_4_GOTPCRELX
+
+@gvar = external global [20000 x i8]
+
+define void @test_regclass_not_updated_by_regalloc_1(ptr %ptr1, ptr %0, i32 %int1, i64 %int_sext, i64 %mul.447, i64 %int_sext3, i32 %fetch.2508, i32 %fetch.2513, i32 %mul.442, i64 %int_sext6, i64 %int_sext7, i64 %int_sext8, i1 %cond1, i1 %cond2) {
+alloca_38:
+ %int_sext4 = sext i32 %int1 to i64
+ tail call void @llvm.memset.p0.i64(ptr @gvar, i8 0, i64 20000, i1 false)
+ %div.161 = sdiv i64 %int_sext3, %int_sext
+ %cmp.2 = icmp sgt i64 %div.161, 0
+ %1 = sub i64 %int_sext7, %mul.447
+ br label %loop.41
+
+loop.41: ; preds = %ifmerge.2, %alloca_38
+ br i1 %cmp.2, label %L.53, label %ifmerge.2
+
+L.53: ; preds = %loop.41
+ %2 = getelementptr i8, ptr %ptr1, i64 %int_sext8
+ br label %loop.83
+
+loop.83: ; preds = %loop.83, %L.53
+ %i2.i64.1 = phi i64 [ 0, %L.53 ], [ %nextloop.83, %loop.83 ]
+ %3 = mul i64 %i2.i64.1, %int_sext4
+ %.r275 = add i64 %3, %1
+ %4 = getelementptr float, ptr getelementptr ([20000 x i8], ptr @gvar, i64 0, i64 8000), i64 %.r275
+ %gepload = load float, ptr %2, align 1
+ store float %gepload, ptr %4, align 4
+ %nextloop.83 = add i64 %i2.i64.1, 1
+ br i1 %cond1, label %ifmerge.2, label %loop.83
+
+ifmerge.2: ; preds = %loop.83, %loop.41
+ br i1 %cond2, label %afterloop.41, label %loop.41
+
+afterloop.41: ; preds = %ifmerge.2
+ %mul.469 = mul i32 %mul.442, %fetch.2508
+ %div.172 = mul i32 %fetch.2513, %mul.469
+ %mul.471 = mul i32 %int1, %div.172
+ %int_sext39 = sext i32 %mul.471 to i64
+ %5 = mul i64 %int_sext6, %int_sext39
+ %6 = getelementptr i8, ptr %ptr1, i64 %5
+ %7 = load float, ptr %6, align 1
+ store float %7, ptr null, align 4
+ ret void
+}
+
+declare void @llvm.memset.p0.i64(ptr writeonly captures(none), i8, i64, i1 immarg)
+
+; Will update after R_X86_64_CODE_6_GOTPCRELX is supported.
+; CHECK-LABEL: test_regclass_not_updated_by_regalloc_2
+; APXREL: {nf} addq (%rip), %r16, %rcx
+; APXREL-NEXT: R_X86_64_GOTPCREL gvar2-0x4
+; NOAPXREL: addq (%rip), %rbx
+; NOAPXREL-NEXT: R_X86_64_REX_GOTPCRELX gvar2-0x4
+; NOAPXREL-NOT: R_X86_64_CODE_4_GOTPCRELX
+
+@gvar2 = external constant [8 x [8 x i32]]
+
+define void @test_regclass_not_updated_by_regalloc_2(ptr %pSrc1, i32 %srcStep1, ptr %pSrc2, i32 %srcStep2, i32 %width, i32 %0, i1 %cmp71.not783, i1 %cmp11.i, ptr %pSrc2.addr.0535.i) {
+entry:
+ %1 = ashr i32 %srcStep2, 1
+ %conv.i = sext i32 %width to i64
+ %conv6.i = and i32 %srcStep1, 1
+ %cmp.i = icmp sgt i32 %srcStep1, 0
+ %idx.ext.i = zext i32 %conv6.i to i64
+ %2 = getelementptr <4 x i64>, ptr @gvar2, i64 %idx.ext.i
+ %idx.ext183.i = sext i32 %1 to i64
+ br i1 %cmp71.not783, label %for.end, label %for.body73.lr.ph
+
+for.body73.lr.ph: ; preds = %entry
+ %3 = load <4 x i64>, ptr %2, align 32
+ %..i = select i1 %cmp11.i, <4 x i64> zeroinitializer, <4 x i64> splat (i64 1)
+ %4 = bitcast <4 x i64> %..i to <8 x i32>
+ %5 = bitcast <4 x i64> %3 to <8 x i32>
+ %. = select i1 %cmp.i, <8 x i32> splat (i32 1), <8 x i32> %4
+ %.833 = select i1 %cmp.i, <8 x i32> %5, <8 x i32> zeroinitializer
+ br i1 %cmp11.i, label %for.end.i, label %for.end
+
+for.end.i: ; preds = %if.end153.i, %for.body73.lr.ph
+ %pSrc2.addr.0535.i5 = phi ptr [ %add.ptr184.i, %if.end153.i ], [ %pSrc2, %for.body73.lr.ph ]
+ %eSum0.0531.i = phi <4 x i64> [ %add.i452.i, %if.end153.i ], [ zeroinitializer, %for.body73.lr.ph ]
+ br i1 %cmp71.not783, label %if.end153.i, label %if.then90.i
+
+if.then90.i: ; preds = %for.end.i
+ %6 = tail call <8 x i32> @llvm.x86.avx2.maskload.d.256(ptr null, <8 x i32> %.)
+ %add.i464.i = or <4 x i64> %eSum0.0531.i, zeroinitializer
+ %7 = bitcast <8 x i32> %.833 to <4 x i64>
+ %add.ptr152.i = getelementptr i16, ptr %pSrc2.addr.0535.i5, i64 %conv.i
+ br label %if.end153.i
+
+if.end153.i: ; preds = %if.then90.i, %for.end.i
+ %eSum0.2.i = phi <4 x i64> [ %7, %if.then90.i ], [ %eSum0.0531.i, %for.end.i ]
+ %pLocSrc2.1.i = phi ptr [ %add.ptr152.i, %if.then90.i ], [ %pSrc1, %for.end.i ]
+ %8 = load i16, ptr %pLocSrc2.1.i, align 2
+ %conv165.i = zext i16 %8 to i32
+ %vecinit3.i.i = insertelement <4 x i32> zeroinitializer, i32 %conv165.i, i64 0
+ %9 = bitcast <4 x i32> %vecinit3.i.i to <2 x i64>
+ %shuffle.i503.i = shufflevector <2 x i64> %9, <2 x i64> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+ %add.i452.i = or <4 x i64> %eSum0.2.i, %shuffle.i503.i
+ %add.ptr184.i = getelementptr i16, ptr %pSrc2.addr.0535.i, i64 %idx.ext183.i
+ br label %for.end.i
+
+for.end: ; preds = %for.body73.lr.ph, %entry
+ br label %for.cond29.preheader.i227
+
+for.cond29.preheader.i227: ; preds = %for.end
+ br label %for.body32.i328
+
+for.body32.i328: ; preds = %for.body32.i328, %for.cond29.preheader.i227
+ %w.0524.i329 = phi i32 [ %sub.i381, %for.body32.i328 ], [ 0, %for.cond29.preheader.i227 ]
+ %sub.i381 = or i32 %w.0524.i329, 0
+ %cmp30.i384 = icmp sgt i32 %w.0524.i329, 0
+ br label %for.body32.i328
+}
+
+declare <8 x i32> @llvm.x86.avx2.maskload.d.256(ptr, <8 x i32>)
+
+
+; This test checks that a MOV64rm instruction with a relocation and an
+; ADD64rr_ND instruction are not folded into an ADD64rm_ND with the
+; relocation. The latter would emit an APX relocation, which is not
+; recognized by the builtin linker on released OSes.
+
+; CHECK-LABEL: test_no_mem_fold
+; NOAPXREL: movq (%rip), %rbx
+; NOAPXREL-NEXT: R_X86_64_REX_GOTPCRELX gvar3-0x4
+; NOAPXREL-NOT: R_X86_64_CODE_4_GOTPCRELX
+
+@gvar3 = external global [40000 x i8]
+
+define void @test_no_mem_fold(i32 %fetch.1644, i32 %sub.1142, i32 %mul.455, ptr %dval1, ptr %j1, ptr %j2, <4 x i1> %0, i1 %condloop.41.not, i32 %fetch.1646, i32 %fetch.1647, i32 %sub.1108, i64 %int_sext16, i64 %sub.1114, i1 %condloop.45.not.not, <4 x i1> %1) {
+alloca_28:
+ br label %ifmerge.52
+
+do.body903: ; preds = %ifmerge.2
+ %mul.453 = mul i32 %sub.1108, %fetch.1647
+ %sub.1144.neg = or i32 %mul.455, %fetch.1646
+ %mul.454.neg = mul i32 %sub.1144.neg, %fetch.1644
+ %sub.1147 = sub i32 0, %sub.1142
+ %int_sext36 = sext i32 %mul.453 to i64
+ %int_sext38 = sext i32 %mul.454.neg to i64
+ %add.974 = or i64 %int_sext36, %int_sext38
+ %div.98 = sdiv i64 %add.974, %int_sext16
+ br label %do.body907
+
+do.body907: ; preds = %do.body907, %do.body903
+ %do.count41.0 = phi i64 [ %sub.1173, %do.body907 ], [ %div.98, %do.body903 ]
+ %gvar3.load = load double, ptr @gvar3, align 8
+ store double %gvar3.load, ptr null, align 8
+ call void (...) null(ptr null, ptr null, ptr null, ptr null, ptr %dval1, ptr null, ptr %j1, ptr %j2, ptr null, ptr null, ptr null, ptr null, ptr null, i64 0)
+ store i32 %sub.1147, ptr null, align 4
+ %sub.1173 = or i64 %do.count41.0, 1
+ %rel.314 = icmp sgt i64 %do.count41.0, 0
+ br label %do.body907
+
+ifmerge.52: ; preds = %ifmerge.2, %alloca_28
+ %i1.i64.012 = phi i64 [ 0, %alloca_28 ], [ %sub.1114, %ifmerge.2 ]
+ %2 = getelementptr double, ptr @gvar3, i64 %i1.i64.012
+ br label %loop.45
+
+loop.45: ; preds = %loop.45, %ifmerge.52
+ %3 = getelementptr double, ptr %2, <4 x i64> zeroinitializer
+ %4 = call <4 x double> @llvm.masked.gather.v4f64.v4p0(<4 x ptr> %3, i32 0, <4 x i1> %0, <4 x double> zeroinitializer)
+ call void @llvm.masked.scatter.v4f64.v4p0(<4 x double> %4, <4 x ptr> zeroinitializer, i32 0, <4 x i1> %0)
+ br i1 %condloop.45.not.not, label %loop.45, label %ifmerge.2
+
+ifmerge.2: ; preds = %loop.45
+ br i1 %condloop.41.not, label %do.body903, label %ifmerge.52
+}
+
+declare <4 x double> @llvm.masked.gather.v4f64.v4p0(<4 x ptr>, i32 immarg, <4 x i1>, <4 x double>)
+declare void @llvm.masked.scatter.v4f64.v4p0(<4 x double>, <4 x ptr>, i32 immarg, <4 x i1>)
diff --git a/llvm/test/CodeGen/X86/apx/reloc.mir b/llvm/test/CodeGen/X86/apx/reloc.mir
index 9009f5b1a669c..877549b4322d1 100644
--- a/llvm/test/CodeGen/X86/apx/reloc.mir
+++ b/llvm/test/CodeGen/X86/apx/reloc.mir
@@ -57,7 +57,12 @@
ret i32 undef
}
- define i32 @add64rm_nd() {
+ define i32 @add64rm_nd_gotpcrel() {
+ entry:
+ ret i32 undef
+ }
+
+ define i32 @add64rm_nd_gottpoff() {
entry:
ret i32 undef
}
@@ -253,7 +258,28 @@ body: |
# NOAPXREL: %1:gr64_norex2 = XOR64rm %0, $rip, 1, $noreg, target-flags(x86-gottpoff) @i, $noreg, implicit-def $eflags :: (load (s64))
...
---
-name: add64rm_nd
+name: add64rm_nd_gotpcrel
+alignment: 16
+tracksRegLiveness: true
+registers:
+ - { id: 0, class: gr64 }
+ - { id: 1, class: gr64 }
+ - { id: 2, class: gr32 }
+body: |
+ bb.0.entry:
+ %0:gr64 = MOV64rm $rip, 1, $noreg, @x, $noreg :: (load (s64))
+ %1:gr64 = ADD64rm_ND %0, $rip, 1, $noreg, target-flags(x86-gotpcrel) @i, $noreg, implicit-def dead $eflags :: (load (s64) from got)
+ %2:gr32 = MOV32rm killed %1, 1, $noreg, 0, $fs :: (load (s32))
+ $eax = COPY %2
+ RET 0, $eax
+
+# CHECK: name: add64rm_nd_gotpcrel
+# APXREL: %1:gr64 = ADD64rm_ND %0, $rip, 1, $noreg, target-flags(x86-gotpcrel) @i, $noreg, implicit-def dead $eflags :: (load (s64) from got)
+# NOAPXREL: %3:gr64_norex2 = COPY %0
+# NOAPXREL: %1:gr64_norex2 = ADD64rm %3, $rip, 1, $noreg, target-flags(x86-gotpcrel) @i, $noreg, implicit-def dead $eflags
+...
+---
+name: add64rm_nd_gottpoff
alignment: 16
tracksRegLiveness: true
registers:
; instruction are not folded to ADD64rm_ND with relocation. The later will emit
; APX relocation which is not recognized by the builtin linker on released OS.

; CHECK-LABEL: test_mem_fold
No APXREL check?
With APX relocation enabled, no CODE_4_GOTPCRELX was emitted in this function, so I didn't add an APXREL check here.
It looks to me like both 2) and 4) are caused by changing the register class only in the MI and not in its uses. Can we simply iterate over all the uses and change them to the same register class?
Issue 2) is not the same as 4), and it's not related to the uses: the register class of operand 0 of the instruction with the relocation is updated to gr64_norex2_nosp, but it's inflated by RA to GR64_NOSP. I agree it's better to change to the norex2 register class in all the uses, though. Will update.
Updated.
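As a rough illustration of that update (the helper name and traversal are assumptions; constrainRegClassToNonRex2 is the query added to X86RegisterInfo in this PR), narrowing every copy/PHI that forwards the relocated value might look like:

```cpp
#include "X86RegisterInfo.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
using namespace llvm;

// Sketch: after the def of a relocated load is narrowed to a non-rex2
// class, walk the COPYs/PHIs that forward the value and narrow their defs
// too, so the allocator cannot reintroduce an EGPR along that chain.
static void constrainUseChainToNonRex2(Register Root, MachineRegisterInfo &MRI,
                                       const X86RegisterInfo &TRI) {
  SmallVector<Register, 8> Worklist = {Root};
  while (!Worklist.empty()) {
    Register Cur = Worklist.pop_back_val();
    for (MachineInstr &UseMI : MRI.use_instructions(Cur)) {
      if (!UseMI.isPHI() && !UseMI.isCopy())
        continue;
      Register Def = UseMI.getOperand(0).getReg();
      if (!Def.isVirtual())
        continue;
      const TargetRegisterClass *NewRC =
          TRI.constrainRegClassToNonRex2(MRI.getRegClass(Def));
      if (NewRC && NewRC != MRI.getRegClass(Def)) {
        MRI.setRegClass(Def, NewRC);
        Worklist.push_back(Def); // transitive forwards may need it too
      }
    }
  }
}
```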
const unsigned UseOpNum = 0;
if (Use.getOperand(UseOpNum).isReg())
Why operand 0 instead of the operand that uses the same register?
The operand that uses the same register is automatically updated when it is set to a non-rex2 register class. The MIR after the "Suppress APX for relocation" pass:

bb.0.alloca_15:
  %29:gr64_norex2 = MOV64rm $rip, 1, $noreg, target-flags(x86-gotpcrel) @gvar, $noreg :: (load (s64) from got)
  …
bb.1.loop.253:
  …
  %6:gr64 = PHI %29:gr64_norex2, %bb.0, %9:gr64, %bb.3
  …

But the register class of operand 0 (%6) of the PHI instruction using %29 is gr64, and its register may end up replacing operand 0 of the MOV64rm instruction with the gotpcrel relocation. See the output of the later passes:

After Eliminate PHI nodes for register allocation:

bb.0.alloca_15:
  %29:gr64_norex2 = MOV64rm $rip, 1, $noreg, target-flags(x86-gotpcrel) @gvar, $noreg :: (load (s64) from got)
  %42:gr64 = COPY %29:gr64_norex2
bb.1.loop.253:
  %6:gr64 = COPY killed %42:gr64

During Machine Copy Propagation Pass:

MCP: Replacing $r15
     with $r16
     in renamable $r15 = MOV64rm $rip, 1, $noreg, target-flags(x86-gotpcrel) @gvar, $noreg :: (load (s64) from got)
     from renamable $r16 = COPY killed renamable $r15
MCP: After replacement: renamable $r16 = MOV64rm $rip, 1, $noreg, target-flags(x86-gotpcrel) @gvar, $noreg :: (load (s64) from got)

After Machine Copy Propagation Pass:

bb.0.alloca_15:
  renamable $r16 = MOV64rm $rip, 1, $noreg, target-flags(x86-gotpcrel) @gvar, $noreg :: (load (s64) from got)

So operand 0 of the PHI instruction should be updated to a non-rex2 register class, to avoid emitting APX relocation types.
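A narrower variant of the earlier sketch, limited to PHIs as described here (again an illustration under assumed names, not the committed code):

```cpp
#include "X86RegisterInfo.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
using namespace llvm;

// Sketch: narrow operand 0 of every PHI fed by the relocated load's def
// (e.g. %6 = PHI %29, ... above) so that PHI elimination and Machine Copy
// Propagation cannot rename the GOTPCREL load's destination to an EGPR.
static void suppressEGPRInPhiDefs(Register RelocDef, MachineRegisterInfo &MRI,
                                  const X86RegisterInfo &TRI) {
  for (MachineInstr &UseMI : MRI.use_instructions(RelocDef)) {
    if (!UseMI.isPHI())
      continue;
    Register PhiDef = UseMI.getOperand(0).getReg(); // the PHI result
    if (const TargetRegisterClass *NewRC =
            TRI.constrainRegClassToNonRex2(MRI.getRegClass(PhiDef)))
      MRI.setRegClass(PhiDef, NewRC);
  }
}
```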
It looks like a general problem to me. We have some instructions that are not promoted to APX encoding, e.g., BSR/BSF. They should have the same problem. @KanRobert is it a design defect or something wrong somewhere?
IIRC, for the existing optimizations and codegen, the common register class is used, i.e., gr64 + gr64_norex2 -> gr64_norex2. If we see this problem here, it was probably introduced by the APX-suppressing patches.
Thanks @fzou1 and @KanRobert. Then I think it's better to limit this to PHI instructions only for now.
No idea why it failed with a PCH size change. I guess it's not related to my PR. The error message is in the log.
LGTM except for one suggestion.
I think we will remove all the code about APX suppressing from llvm trunk at some point in the future, right? It adds considerable complexity. If so, can we mention the previous PRs in the description of each PR so that we can find them quickly?
Yes, good suggestion, thank you. I'll mention the PRs in later PRs if there are any. BTW, all of the code is guarded by the X86EnableAPXForRelocation option. That's another way to find all the related code.