
[BOLT][AArch64] Patch functions targeted by optional relocs #138750


Open · wants to merge 1 commit into main

Conversation

Contributor

@maksfb maksfb commented May 6, 2025

On AArch64, we create optional/weak relocations that may not be processed due to relocated-value overflow. When an overflow happened, we used to enforce patching for all functions in the binary via the --force-patch option. This PR relaxes that requirement and enforces patching only for functions that are targets of optional relocations. Moreover, if the compact code model is used, relocation overflow is guaranteed not to happen and patching is skipped.

@llvmbot
Member

llvmbot commented May 6, 2025

@llvm/pr-subscribers-bolt

Author: Maksim Panchenko (maksfb)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/138750.diff

9 Files Affected:

  • (modified) bolt/include/bolt/Core/BinaryFunction.h (+16)
  • (modified) bolt/include/bolt/Utils/CommandLineOpts.h (+1)
  • (modified) bolt/lib/Core/BinaryFunction.cpp (+19-2)
  • (modified) bolt/lib/Core/BinarySection.cpp (-10)
  • (modified) bolt/lib/Passes/LongJmp.cpp (+1-5)
  • (modified) bolt/lib/Passes/PatchEntries.cpp (+3-2)
  • (modified) bolt/lib/Utils/CommandLineOpts.cpp (+5)
  • (modified) bolt/test/AArch64/lite-mode.s (+29-8)
  • (modified) bolt/unittests/Core/BinaryContext.cpp (-30)
diff --git a/bolt/include/bolt/Core/BinaryFunction.h b/bolt/include/bolt/Core/BinaryFunction.h
index a52998564ee1b..7b3201594390f 100644
--- a/bolt/include/bolt/Core/BinaryFunction.h
+++ b/bolt/include/bolt/Core/BinaryFunction.h
@@ -360,6 +360,11 @@ class BinaryFunction {
   /// True if the function is used for patching code at a fixed address.
   bool IsPatch{false};
 
+  /// True if the original entry point of the function may get called, but the
+  /// original body cannot be executed and needs to be patched with code that
+  /// redirects execution to the new function body.
+  bool NeedsPatch{false};
+
   /// True if the function should not have an associated symbol table entry.
   bool IsAnonymous{false};
 
@@ -1372,9 +1377,17 @@ class BinaryFunction {
   /// Return true if this function is used for patching existing code.
   bool isPatch() const { return IsPatch; }
 
+  /// Return true if the function requires a patch.
+  bool needsPatch() const { return NeedsPatch; }
+
   /// Return true if the function should not have associated symbol table entry.
   bool isAnonymous() const { return IsAnonymous; }
 
+  /// Return true if we can allow the execution of the original body of the
+  /// function and its rewritten copy. This means, e.g., that metadata
+  /// associated with the function can be duplicated/cloned.
+  bool canClone() const;
+
   /// If this function was folded, return the function it was folded into.
   BinaryFunction *getFoldedIntoFunction() const { return FoldedIntoFunction; }
 
@@ -1757,6 +1770,9 @@ class BinaryFunction {
     IsPatch = V;
   }
 
+  /// Mark the function for patching.
+  void setNeedsPatch(bool V) { NeedsPatch = V; }
+
   /// Indicate if the function should have a name in the symbol table.
   void setAnonymous(bool V) {
     assert(isInjected() && "Only injected functions could be anonymous");
diff --git a/bolt/include/bolt/Utils/CommandLineOpts.h b/bolt/include/bolt/Utils/CommandLineOpts.h
index 3de945f6a1507..fbb2614ba85f3 100644
--- a/bolt/include/bolt/Utils/CommandLineOpts.h
+++ b/bolt/include/bolt/Utils/CommandLineOpts.h
@@ -34,6 +34,7 @@ extern llvm::cl::opt<unsigned> AlignText;
 extern llvm::cl::opt<unsigned> AlignFunctions;
 extern llvm::cl::opt<bool> AggregateOnly;
 extern llvm::cl::opt<unsigned> BucketsPerLine;
+extern llvm::cl::opt<bool> CompactCodeModel;
 extern llvm::cl::opt<bool> DiffOnly;
 extern llvm::cl::opt<bool> EnableBAT;
 extern llvm::cl::opt<bool> EqualizeBBCounts;
diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index 9773e21aa7522..fa88b27ff292e 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -1797,8 +1797,6 @@ bool BinaryFunction::scanExternalRefs() {
     // Create relocation for every fixup.
     for (const MCFixup &Fixup : Fixups) {
       std::optional<Relocation> Rel = BC.MIB->createRelocation(Fixup, *BC.MAB);
-      // Can be skipped in case of overlow during relocation value encoding.
-      Rel->setOptional();
       if (!Rel) {
         Success = false;
         continue;
@@ -1814,6 +1812,17 @@ bool BinaryFunction::scanExternalRefs() {
         Success = false;
         continue;
       }
+
+      if (BC.isAArch64()) {
+        // Allow the relocation to be skipped in case of the overflow during the
+        // relocation value encoding.
+        Rel->setOptional();
+
+        if (!opts::CompactCodeModel)
+          if (BinaryFunction *TargetBF = BC.getFunctionForSymbol(Rel->Symbol))
+            TargetBF->setNeedsPatch(true);
+      }
+
       Rel->Offset += getAddress() - getOriginSection()->getAddress() + Offset;
       FunctionRelocations.push_back(*Rel);
     }
@@ -3744,6 +3753,14 @@ void BinaryFunction::postProcessBranches() {
   assert(validateCFG() && "invalid CFG");
 }
 
+bool BinaryFunction::canClone() const {
+  if (opts::Instrument)
+    return false;
+
+  // Check for the presence of metadata that cannot be duplicated.
+  return !hasEHRanges() && !hasSDTMarker() && !hasPseudoProbe() && !hasORC();
+}
+
 MCSymbol *BinaryFunction::addEntryPointAtOffset(uint64_t Offset) {
   assert(Offset && "cannot add primary entry point");
   assert(CurrentState == State::Empty || CurrentState == State::Disassembled);
diff --git a/bolt/lib/Core/BinarySection.cpp b/bolt/lib/Core/BinarySection.cpp
index e5def7547a187..6f07017c26060 100644
--- a/bolt/lib/Core/BinarySection.cpp
+++ b/bolt/lib/Core/BinarySection.cpp
@@ -186,16 +186,6 @@ void BinarySection::flushPendingRelocations(raw_pwrite_stream &OS,
         !Relocation::canEncodeValue(Reloc.Type, Value,
                                     SectionAddress + Reloc.Offset)) {
 
-      // A successful run of 'scanExternalRefs' means that all pending
-      // relocations are flushed. Otherwise, PatchEntries should run.
-      if (!opts::ForcePatch) {
-        BC.errs()
-            << "BOLT-ERROR: cannot encode relocation for symbol "
-            << Reloc.Symbol->getName()
-            << " as it is out-of-range. To proceed must use -force-patch\n";
-        exit(1);
-      }
-
       ++SkippedPendingRelocations;
       continue;
     }
diff --git a/bolt/lib/Passes/LongJmp.cpp b/bolt/lib/Passes/LongJmp.cpp
index e6bd417705e6f..4dade161cc232 100644
--- a/bolt/lib/Passes/LongJmp.cpp
+++ b/bolt/lib/Passes/LongJmp.cpp
@@ -12,6 +12,7 @@
 
 #include "bolt/Passes/LongJmp.h"
 #include "bolt/Core/ParallelUtilities.h"
+#include "bolt/Utils/CommandLineOpts.h"
 #include "llvm/Support/MathExtras.h"
 
 #define DEBUG_TYPE "longjmp"
@@ -26,11 +27,6 @@ extern cl::opt<unsigned> AlignFunctions;
 extern cl::opt<bool> UseOldText;
 extern cl::opt<bool> HotFunctionsAtEnd;
 
-static cl::opt<bool>
-    CompactCodeModel("compact-code-model",
-                     cl::desc("generate code for binaries <128MB on AArch64"),
-                     cl::init(false), cl::cat(BoltCategory));
-
 static cl::opt<bool> GroupStubs("group-stubs",
                                 cl::desc("share stubs across functions"),
                                 cl::init(true), cl::cat(BoltOptCategory));
diff --git a/bolt/lib/Passes/PatchEntries.cpp b/bolt/lib/Passes/PatchEntries.cpp
index 4877e7dd8fdf3..4b6dd831a6946 100644
--- a/bolt/lib/Passes/PatchEntries.cpp
+++ b/bolt/lib/Passes/PatchEntries.cpp
@@ -39,7 +39,8 @@ Error PatchEntries::runOnFunctions(BinaryContext &BC) {
     bool NeedsPatching = llvm::any_of(
         llvm::make_second_range(BC.getBinaryFunctions()),
         [&](BinaryFunction &BF) {
-          return !BC.shouldEmit(BF) && !BF.hasExternalRefRelocations();
+          return (!BC.shouldEmit(BF) && !BF.hasExternalRefRelocations()) ||
+                 BF.needsPatch();
         });
 
     if (!NeedsPatching)
@@ -65,7 +66,7 @@ Error PatchEntries::runOnFunctions(BinaryContext &BC) {
       continue;
 
     // Check if we can skip patching the function.
-    if (!opts::ForcePatch && !Function.hasEHRanges() &&
+    if (!opts::ForcePatch && !Function.needsPatch() && Function.canClone() &&
         Function.getSize() < PatchThreshold)
       continue;
 
diff --git a/bolt/lib/Utils/CommandLineOpts.cpp b/bolt/lib/Utils/CommandLineOpts.cpp
index ad714371436e0..2d1d697919712 100644
--- a/bolt/lib/Utils/CommandLineOpts.cpp
+++ b/bolt/lib/Utils/CommandLineOpts.cpp
@@ -61,6 +61,11 @@ cl::opt<unsigned>
                    cl::desc("number of entries per line (default 256)"),
                    cl::init(256), cl::Optional, cl::cat(HeatmapCategory));
 
+cl::opt<bool>
+    CompactCodeModel("compact-code-model",
+                     cl::desc("generate code for binaries <128MB on AArch64"),
+                     cl::init(false), cl::cat(BoltCategory));
+
 cl::opt<bool>
 DiffOnly("diff-only",
   cl::desc("stop processing once we have enough to compare two binaries"),
diff --git a/bolt/test/AArch64/lite-mode.s b/bolt/test/AArch64/lite-mode.s
index a71edbe034669..7ff1b719ea5d9 100644
--- a/bolt/test/AArch64/lite-mode.s
+++ b/bolt/test/AArch64/lite-mode.s
@@ -10,13 +10,17 @@
 # RUN:   | FileCheck %s --check-prefix=CHECK-INPUT
 # RUN: llvm-objdump -d --disassemble-symbols=cold_function %t.bolt \
 # RUN:   | FileCheck %s
+# RUN: llvm-objdump -d --disassemble-symbols=_start.org.0 %t.bolt \
+# RUN:   | FileCheck %s --check-prefix=CHECK-PATCH
 
 
 ## Verify that the number of FDEs matches the number of functions in the output
 ## binary. There are three original functions and two optimized.
+## NOTE: at the moment we are emitting extra FDEs for patched functions, thus
+## there is one more FDE for _start.
 # RUN: llvm-readelf -u %t.bolt | grep -wc FDE \
 # RUN:   | FileCheck --check-prefix=CHECK-FDE %s
-# CHECK-FDE: 5
+# CHECK-FDE: 6
 
 ## In lite mode, optimized code will be separated from the original .text by
 ## over 128MB, making it impossible for call/bl instructions in cold functions
@@ -28,15 +32,21 @@
 _start:
 # FDATA: 0 [unknown] 0 1 _start 0 0 100
   .cfi_startproc
+
+## Check that the code at the orignal location is converted into a veneer/thunk.
+# CHECK-PATCH-LABEL: <_start.org.0>
+# CHECK-PATCH-NEXT: adrp x16
+# CHECK-PATCH-NEXT: add x16, x16,
+# CHECK-PATCH-NEXT: br x16
   cmp  x0, 1
   b.eq  .L0
   bl cold_function
 .L0:
   ret  x30
   .cfi_endproc
-.size _start, .-_start
+  .size _start, .-_start
 
-## Cold non-optimized function with a reference to a hot function (_start).
+## Cold non-optimized function with references to hot functions.
 # CHECK: Disassembly of section .bolt.org.text:
 # CHECK-LABEL: <cold_function>
   .globl cold_function
@@ -97,12 +107,24 @@ cold_function:
 # CHECK-NEXT:       nop
 # CHECK-NEXT:       ldr x5
 
+## Since _start is relocated further than 128MB from the call site, we check
+## that the call is converted into a call to its original version. That original
+## version should contain a veneer/thunk code that we check separately.
+  bl      _start
+# CHECK-INPUT-NEXT: bl {{.*}} <_start>
+# CHECK-NEXT:       bl {{.*}} <_start.org.0>
+
+## Same as above, but the instruction is a tail call.
+  b       _start
+# CHECK-INPUT-NEXT: b {{.*}} <_start>
+# CHECK-NEXT:       b {{.*}} <_start.org.0>
+
   .cfi_endproc
-.size cold_function, .-cold_function
+  .size cold_function, .-cold_function
 
-## Reserve 1MB of space to make functions that follow unreachable by ADRs in
+## Reserve 128MB of space to make functions that follow unreachable by ADRs in
 ## code that precedes this gap.
-.space 0x100000
+.space 0x8000000
 
   .globl far_func
   .type far_func, %function
@@ -111,5 +133,4 @@ far_func:
   .cfi_startproc
   ret  x30
   .cfi_endproc
-.size far_func, .-far_func
-
+  .size far_func, .-far_func
diff --git a/bolt/unittests/Core/BinaryContext.cpp b/bolt/unittests/Core/BinaryContext.cpp
index 377517adf03db..ba3e4ce099347 100644
--- a/bolt/unittests/Core/BinaryContext.cpp
+++ b/bolt/unittests/Core/BinaryContext.cpp
@@ -162,36 +162,6 @@ TEST_P(BinaryContextTester, FlushPendingRelocJUMP26) {
       << "Wrong forward branch value\n";
 }
 
-TEST_P(BinaryContextTester,
-       FlushOptionalOutOfRangePendingRelocCALL26_ForcePatchOff) {
-  if (GetParam() != Triple::aarch64)
-    GTEST_SKIP();
-
-  // Tests that flushPendingRelocations exits if any pending relocation is out
-  // of range and PatchEntries hasn't run. Pending relocations are added by
-  // scanExternalRefs, so this ensures that either all scanExternalRefs
-  // relocations were flushed or PatchEntries ran.
-
-  BinarySection &BS = BC->registerOrUpdateSection(
-      ".text", ELF::SHT_PROGBITS, ELF::SHF_EXECINSTR | ELF::SHF_ALLOC);
-  // Create symbol 'Func0x4'
-  MCSymbol *RelSymbol = BC->getOrCreateGlobalSymbol(4, "Func");
-  ASSERT_TRUE(RelSymbol);
-  Relocation Reloc{8, RelSymbol, ELF::R_AARCH64_CALL26, 0, 0};
-  Reloc.setOptional();
-  BS.addPendingRelocation(Reloc);
-
-  SmallVector<char> Vect;
-  raw_svector_ostream OS(Vect);
-
-  // Resolve relocation symbol to a high value so encoding will be out of range.
-  EXPECT_EXIT(BS.flushPendingRelocations(
-                  OS, [&](const MCSymbol *S) { return 0x800000F; }),
-              ::testing::ExitedWithCode(1),
-              "BOLT-ERROR: cannot encode relocation for symbol Func0x4 as it is"
-              " out-of-range. To proceed must use -force-patch");
-}
-
 TEST_P(BinaryContextTester,
        FlushOptionalOutOfRangePendingRelocCALL26_ForcePatchOn) {
   if (GetParam() != Triple::aarch64)

@maksfb
Contributor Author

maksfb commented May 6, 2025

The next step will be to enable lite mode by default for AArch64.

Member

@yota9 yota9 left a comment


LG thanks

@maksfb
Contributor Author

maksfb commented May 6, 2025

I moved canClone() out into a separate PR: #138771.

Member

@paschalis-mpeis paschalis-mpeis left a comment


Hey Maks,

Looks good, thanks! One thing to consider:

Given that --compact-code-model now avoids patching, it may be good to add a test for it.

As-is, the test would fail as expected because its size requirement is not met.
However, with a smaller artificial gap in the hot section, we could verify that cold_function (which remains in the original section) points directly to the hot section, without needing an indirection through the original _start (which should remain unpatched, i.e., not a thunk).

Comment on lines +36 to +38
## Check that the code at the orignal location is converted into a veneer/thunk.
# CHECK-PATCH-LABEL: <_start.org.0>
# CHECK-PATCH-NEXT: adrp x16

nit: typo

Could also show that the veneer is pointing back to the hot code.

Suggested change
## Check that the code at the orignal location is converted into a veneer/thunk.
# CHECK-PATCH-LABEL: <_start.org.0>
# CHECK-PATCH-NEXT: adrp x16
## Check that the code at the original location is converted into a veneer/thunk.
# CHECK-PATCH-LABEL: <_start.org.0>
# CHECK-PATCH-NEXT: adrp x16, {{.*}} <_start>
