[AArch64] Refactor @plt, @gotpcrel, and @AUTH to use parseDataExpr #134202


MaskRay
Member

@MaskRay MaskRay commented Apr 3, 2025

Following PR #132569 (RISC-V), which added parseDataExpr for parsing
expressions in data directives (e.g., .word), this PR migrates AArch64
@plt, @gotpcrel, and @AUTH from the parsePrimaryExpr workaround
to parseDataExpr. The goal is to align with the GNU assembler model,
where relocation specifiers apply to the entire operand rather than
individual terms. This reduces complexity, most visibly in @AUTH
parsing.

Note: AArch64 ELF lacks an official syntax for data directives
(#132570). A prefix notation might be a preferable future direction.
I recommend %specifier(expr).

AsmParser's @specifier parsing is suboptimal, necessitating lexer
workarounds. @ might appear multiple times in an operand.
We should not use @ beyond the existing AArch64 Mach-O instruction
operands.

In the test elf-reloc-ptrauth.s, many errors are now reported at parse
time.

Created using spr 1.3.5-bogner
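
The operand-level model described above can be illustrated with a minimal standalone sketch. It is a toy, not the LLVM implementation: it does not use any MC classes, `DataOperand` and `parseDataOperand` are hypothetical names, only `plt` and `gotpcrel` are modelled (no `@AUTH`), and trailing terms are restricted to integers. The diagnostics mirror the wording used in this PR's diff.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Toy operand shape: <symbol or integer> [@specifier] { (+|-) integer }.
// The specifier belongs to the operand as a whole, not to one term.
struct DataOperand {
  std::string symbol;    // empty when the operand starts with an integer
  std::string specifier; // lower-cased; empty when no @specifier is present
  long addend = 0;
};

static void skipSpaces(const std::string &s, std::size_t &i) {
  while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i])))
    ++i;
}

// Reads a run of [A-Za-z0-9_] characters starting at position i.
static std::string readWord(const std::string &s, std::size_t &i) {
  std::size_t start = i;
  while (i < s.size() &&
         (std::isalnum(static_cast<unsigned char>(s[i])) || s[i] == '_'))
    ++i;
  return s.substr(start, i - start);
}

static bool isAllDigits(const std::string &w) {
  if (w.empty())
    return false;
  for (char c : w)
    if (!std::isdigit(static_cast<unsigned char>(c)))
      return false;
  return true;
}

// Returns the parsed operand, or std::nullopt with `error` set.
std::optional<DataOperand> parseDataOperand(const std::string &text,
                                            std::string &error) {
  std::size_t i = 0;
  DataOperand op;
  auto fail = [&error](const char *msg) -> std::optional<DataOperand> {
    error = msg;
    return std::nullopt;
  };

  skipSpaces(text, i);
  std::string head = readWord(text, i);
  if (head.empty())
    return fail("expected expression");
  // Simplification: any word that is not all digits is treated as a symbol.
  bool headIsInteger = isAllDigits(head);
  if (headIsInteger)
    op.addend = std::stol(head);
  else
    op.symbol = head;

  skipSpaces(text, i);
  if (i < text.size() && text[i] == '@') {
    ++i;
    skipSpaces(text, i);
    std::string spec = readWord(text, i);
    for (char &c : spec)
      c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    if (spec.empty())
      return fail("expected relocation specifier");
    if (spec != "plt" && spec != "gotpcrel")
      return fail("invalid relocation specifier");
    if (headIsInteger)
      return fail("@ specifier only allowed after a symbol");
    op.specifier = spec;
  }

  // Trailing "+ N" / "- N" terms are folded into the addend.
  for (;;) {
    skipSpaces(text, i);
    if (i >= text.size())
      break;
    char sign = text[i];
    if (sign != '+' && sign != '-')
      return fail("unexpected token");
    ++i;
    skipSpaces(text, i);
    std::string term = readWord(text, i);
    if (!isAllDigits(term))
      return fail("expected integer term");
    long value = std::stol(term);
    op.addend += (sign == '+') ? value : -value;
  }
  return op;
}
```

With this sketch, `extern@GOTPCREL-5` yields symbol `extern`, specifier `gotpcrel`, and addend `-5`, while `3@plt` is rejected because the specifier does not follow a symbol.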
@llvmbot llvmbot added backend:AArch64 mc Machine (object) code labels Apr 3, 2025
@llvmbot
Member

llvmbot commented Apr 3, 2025

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-aarch64

Author: Fangrui Song (MaskRay)

Full diff: https://github.com/llvm/llvm-project/pull/134202.diff

7 Files Affected:

  • (modified) llvm/include/llvm/MC/MCParser/MCAsmParser.h (+3)
  • (modified) llvm/lib/MC/MCParser/AsmParser.cpp (+20-6)
  • (modified) llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp (+73-54)
  • (modified) llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCAsmInfo.cpp (+2)
  • (modified) llvm/test/MC/AArch64/data-directive-specifier.s (+13-4)
  • (modified) llvm/test/MC/AArch64/elf-reloc-ptrauth.s (+18-56)
  • (modified) llvm/test/MC/AArch64/label-arithmetic-diags-darwin.s (+9-9)
diff --git a/llvm/include/llvm/MC/MCParser/MCAsmParser.h b/llvm/include/llvm/MC/MCParser/MCAsmParser.h
index c65a38c944eea..bbe6d1f2a0082 100644
--- a/llvm/include/llvm/MC/MCParser/MCAsmParser.h
+++ b/llvm/include/llvm/MC/MCParser/MCAsmParser.h
@@ -333,6 +333,9 @@ class MCAsmParser {
 
   /// Parse a .gnu_attribute.
   bool parseGNUAttribute(SMLoc L, int64_t &Tag, int64_t &IntegerValue);
+
+  bool parseAtSpecifier(const MCExpr *&Res, SMLoc &EndLoc);
+  const MCExpr *applySpecifier(const MCExpr *E, uint32_t Variant);
 };
 
 /// Create an MCAsmParser instance for parsing assembly similar to gas syntax
diff --git a/llvm/lib/MC/MCParser/AsmParser.cpp b/llvm/lib/MC/MCParser/AsmParser.cpp
index 71f2bdbdf0b16..14faea7c48af3 100644
--- a/llvm/lib/MC/MCParser/AsmParser.cpp
+++ b/llvm/lib/MC/MCParser/AsmParser.cpp
@@ -670,8 +670,6 @@ class AsmParser : public MCAsmParser {
   bool parseEscapedString(std::string &Data) override;
   bool parseAngleBracketString(std::string &Data) override;
 
-  const MCExpr *applySpecifier(const MCExpr *E, uint32_t Variant);
-
   // Macro-like directives
   MCAsmMacro *parseMacroLikeBody(SMLoc DirectiveLoc);
   void instantiateMacroLikeBody(MCAsmMacro *M, SMLoc DirectiveLoc,
@@ -1194,7 +1192,7 @@ bool AsmParser::parsePrimaryExpr(const MCExpr *&Res, SMLoc &EndLoc,
 
           Split = std::make_pair(Identifier, VName);
         }
-      } else {
+      } else if (Lexer.getAllowAtInIdentifier()) {
         Split = Identifier.split('@');
       }
     } else if (MAI.useParensForSpecifier() &&
@@ -1342,7 +1340,7 @@ bool AsmParser::parseExpression(const MCExpr *&Res) {
   return parseExpression(Res, EndLoc);
 }
 
-const MCExpr *AsmParser::applySpecifier(const MCExpr *E, uint32_t Spec) {
+const MCExpr *MCAsmParser::applySpecifier(const MCExpr *E, uint32_t Spec) {
   // Ask the target implementation about this expression first.
   const MCExpr *NewE = getTargetParser().applySpecifier(E, Spec, Ctx);
   if (NewE)
@@ -1433,6 +1431,23 @@ static std::string angleBracketString(StringRef AltMacroStr) {
   return Res;
 }
 
+bool MCAsmParser::parseAtSpecifier(const MCExpr *&Res, SMLoc &EndLoc) {
+  if (parseOptionalToken(AsmToken::At)) {
+    if (getLexer().isNot(AsmToken::Identifier))
+      return TokError("expected specifier following '@'");
+
+    auto Spec = MAI.getSpecifierForName(getTok().getIdentifier());
+    if (!Spec)
+      return TokError("invalid specifier '@" + getTok().getIdentifier() + "'");
+
+    const MCExpr *ModifiedRes = applySpecifier(Res, *Spec);
+    if (ModifiedRes)
+      Res = ModifiedRes;
+    Lex();
+  }
+  return false;
+}
+
 /// Parse an expression and return it.
 ///
 ///  expr ::= expr &&,|| expr               -> lowest.
@@ -1453,8 +1468,7 @@ bool AsmParser::parseExpression(const MCExpr *&Res, SMLoc &EndLoc) {
   // As a special case, we support 'a op b @ modifier' by rewriting the
   // expression to include the modifier. This is inefficient, but in general we
   // expect users to use 'a@modifier op b'.
-  if (Ctx.getAsmInfo()->useAtForSpecifier() &&
-      parseOptionalToken(AsmToken::At)) {
+  if (Lexer.getAllowAtInIdentifier() && parseOptionalToken(AsmToken::At)) {
     if (Lexer.isNot(AsmToken::Identifier))
       return TokError("unexpected symbol modifier following '@'");
 
diff --git a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
index 28b4cbb5efed8..894be7565fabe 100644
--- a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+++ b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
@@ -25,6 +25,7 @@
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/StringSwitch.h"
 #include "llvm/ADT/Twine.h"
+#include "llvm/MC/MCAsmInfo.h"
 #include "llvm/MC/MCContext.h"
 #include "llvm/MC/MCExpr.h"
 #include "llvm/MC/MCInst.h"
@@ -180,6 +181,7 @@ class AArch64AsmParser : public MCTargetAsmParser {
   bool showMatchError(SMLoc Loc, unsigned ErrCode, uint64_t ErrorInfo,
                       OperandVector &Operands);
 
+  bool parseDataExpr(const MCExpr *&Res) override;
   bool parseAuthExpr(const MCExpr *&Res, SMLoc &EndLoc);
 
   bool parseDirectiveArch(SMLoc L);
@@ -335,8 +337,6 @@ class AArch64AsmParser : public MCTargetAsmParser {
   unsigned validateTargetOperandClass(MCParsedAsmOperand &Op,
                                       unsigned Kind) override;
 
-  bool parsePrimaryExpr(const MCExpr *&Res, SMLoc &EndLoc) override;
-
   static bool classifySymbolRef(const MCExpr *Expr,
                                 AArch64MCExpr::Specifier &ELFSpec,
                                 MCSymbolRefExpr::VariantKind &DarwinRefKind,
@@ -4478,6 +4478,19 @@ bool AArch64AsmParser::parseSymbolicImmVal(const MCExpr *&ImmVal) {
   if (HasELFModifier)
     ImmVal = AArch64MCExpr::create(ImmVal, RefKind, getContext());
 
+  SMLoc EndLoc;
+  if (getContext().getAsmInfo()->hasSubsectionsViaSymbols()) {
+    if (getParser().parseAtSpecifier(ImmVal, EndLoc))
+      return true;
+    const MCExpr *Term;
+    if (parseOptionalToken(AsmToken::Plus)) {
+      if (getParser().parseExpression(Term, EndLoc))
+        return true;
+      ImmVal =
+          MCBinaryExpr::create(MCBinaryExpr::Add, ImmVal, Term, getContext());
+    }
+  }
+
   return false;
 }
 
@@ -5007,11 +5020,18 @@ bool AArch64AsmParser::parseOperand(OperandVector &Operands, bool isCondCode,
 
     // This was not a register so parse other operands that start with an
     // identifier (like labels) as expressions and create them as immediates.
-    const MCExpr *IdVal;
+    const MCExpr *IdVal, *Term;
     S = getLoc();
     if (getParser().parseExpression(IdVal))
       return true;
-    E = SMLoc::getFromPointer(getLoc().getPointer() - 1);
+    if (getParser().parseAtSpecifier(IdVal, E))
+      return true;
+    if (parseOptionalToken(AsmToken::Plus)) {
+      if (getParser().parseExpression(Term, E))
+        return true;
+      IdVal =
+          MCBinaryExpr::create(MCBinaryExpr::Add, IdVal, Term, getContext());
+    }
     Operands.push_back(AArch64Operand::CreateImm(IdVal, S, E, getContext()));
 
     // Parse an optional shift/extend modifier.
@@ -8086,11 +8106,56 @@ bool AArch64AsmParser::parseDirectiveAeabiAArch64Attr(SMLoc L) {
   return false;
 }
 
-bool AArch64AsmParser::parsePrimaryExpr(const MCExpr *&Res, SMLoc &EndLoc) {
-  // Try @AUTH expressions: they're more complex than the usual symbol variants.
-  if (!parseAuthExpr(Res, EndLoc))
+bool AArch64AsmParser::parseDataExpr(const MCExpr *&Res) {
+  SMLoc EndLoc;
+
+  if (getParser().parseExpression(Res))
+    return true;
+  MCAsmParser &Parser = getParser();
+  if (!parseOptionalToken(AsmToken::At))
     return false;
-  return getParser().parsePrimaryExpr(Res, EndLoc, nullptr);
+  if (getLexer().getKind() != AsmToken::Identifier)
+    return Error(getLoc(), "expected relocation specifier");
+
+  std::string Identifier = Parser.getTok().getIdentifier().lower();
+  SMLoc Loc = getLoc();
+  Lex();
+  if (Identifier == "auth")
+    return parseAuthExpr(Res, EndLoc);
+
+  auto Spec = MCSymbolRefExpr::VK_None;
+  if (STI->getTargetTriple().isOSBinFormatMachO()) {
+    if (Identifier == "got")
+      Spec = MCSymbolRefExpr::VK_GOT;
+  } else {
+    // Unofficial, experimental syntax that will be changed.
+    if (Identifier == "gotpcrel")
+      Spec = MCSymbolRefExpr::VK_GOTPCREL;
+    else if (Identifier == "plt")
+      Spec = MCSymbolRefExpr::VK_PLT;
+  }
+  if (Spec == MCSymbolRefExpr::VK_None)
+    return Error(Loc, "invalid relocation specifier");
+  if (auto *SRE = dyn_cast<MCSymbolRefExpr>(Res))
+    Res = MCSymbolRefExpr::create(&SRE->getSymbol(), Spec, getContext(),
+                                  SRE->getLoc());
+  else
+    return Error(Loc, "@ specifier only allowed after a symbol");
+
+  for (;;) {
+    std::optional<MCBinaryExpr::Opcode> Opcode;
+    if (parseOptionalToken(AsmToken::Plus))
+      Opcode = MCBinaryExpr::Add;
+    else if (parseOptionalToken(AsmToken::Minus))
+      Opcode = MCBinaryExpr::Sub;
+    else
+      break;
+    const MCExpr *Term;
+    if (getParser().parsePrimaryExpr(Term, EndLoc, nullptr))
+      return true;
+    Res = MCBinaryExpr::create(*Opcode, Res, Term, getContext());
+  }
+  return false;
 }
 
 ///  parseAuthExpr
@@ -8100,54 +8165,8 @@ bool AArch64AsmParser::parsePrimaryExpr(const MCExpr *&Res, SMLoc &EndLoc) {
 bool AArch64AsmParser::parseAuthExpr(const MCExpr *&Res, SMLoc &EndLoc) {
   MCAsmParser &Parser = getParser();
   MCContext &Ctx = getContext();
-
   AsmToken Tok = Parser.getTok();
 
-  // Look for '_sym@AUTH' ...
-  if (Tok.is(AsmToken::Identifier) && Tok.getIdentifier().ends_with("@AUTH")) {
-    StringRef SymName = Tok.getIdentifier().drop_back(strlen("@AUTH"));
-    if (SymName.contains('@'))
-      return TokError(
-          "combination of @AUTH with other modifiers not supported");
-    Res = MCSymbolRefExpr::create(Ctx.getOrCreateSymbol(SymName), Ctx);
-
-    Parser.Lex(); // Eat the identifier.
-  } else {
-    // ... or look for a more complex symbol reference, such as ...
-    SmallVector<AsmToken, 6> Tokens;
-
-    // ... '"_long sym"@AUTH' ...
-    if (Tok.is(AsmToken::String))
-      Tokens.resize(2);
-    // ... or '(_sym + 5)@AUTH'.
-    else if (Tok.is(AsmToken::LParen))
-      Tokens.resize(6);
-    else
-      return true;
-
-    if (Parser.getLexer().peekTokens(Tokens) != Tokens.size())
-      return true;
-
-    // In either case, the expression ends with '@' 'AUTH'.
-    if (Tokens[Tokens.size() - 2].isNot(AsmToken::At) ||
-        Tokens[Tokens.size() - 1].isNot(AsmToken::Identifier) ||
-        Tokens[Tokens.size() - 1].getIdentifier() != "AUTH")
-      return true;
-
-    if (Tok.is(AsmToken::String)) {
-      StringRef SymName;
-      if (Parser.parseIdentifier(SymName))
-        return true;
-      Res = MCSymbolRefExpr::create(Ctx.getOrCreateSymbol(SymName), Ctx);
-    } else {
-      if (Parser.parsePrimaryExpr(Res, EndLoc, nullptr))
-        return true;
-    }
-
-    Parser.Lex(); // '@'
-    Parser.Lex(); // 'AUTH'
-  }
-
   // At this point, we encountered "<id>@AUTH". There is no fallback anymore.
   if (parseToken(AsmToken::LParen, "expected '('"))
     return true;
diff --git a/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCAsmInfo.cpp b/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCAsmInfo.cpp
index 9ff53631a995e..4bc84ce9b8e80 100644
--- a/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCAsmInfo.cpp
+++ b/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCAsmInfo.cpp
@@ -61,6 +61,7 @@ AArch64MCAsmInfoDarwin::AArch64MCAsmInfoDarwin(bool IsILP32) {
   UsesELFSectionDirectiveForBSS = true;
   SupportsDebugInformation = true;
   UseDataRegionDirectives = true;
+  UseAtForSpecifier = false;
 
   ExceptionsType = ExceptionHandling::DwarfCFI;
 
@@ -105,6 +106,7 @@ AArch64MCAsmInfoELF::AArch64MCAsmInfoELF(const Triple &T) {
   Data64bitsDirective = "\t.xword\t";
 
   UseDataRegionDirectives = false;
+  UseAtForSpecifier = false;
 
   WeakRefDirective = "\t.weak\t";
 
diff --git a/llvm/test/MC/AArch64/data-directive-specifier.s b/llvm/test/MC/AArch64/data-directive-specifier.s
index 3a8665126097a..5410bcb4a4211 100644
--- a/llvm/test/MC/AArch64/data-directive-specifier.s
+++ b/llvm/test/MC/AArch64/data-directive-specifier.s
@@ -1,5 +1,6 @@
 # RUN: llvm-mc -triple=aarch64 -filetype=obj %s | llvm-readobj -r - | FileCheck %s
-# RUN: not llvm-mc -triple=aarch64 -filetype=obj %s --defsym ERR=1 -o /dev/null 2>&1 | FileCheck %s --check-prefix=ERR --implicit-check-not=error:
+# RUN: not llvm-mc -triple=aarch64 %s --defsym ERR=1 -o /dev/null 2>&1 | FileCheck %s --check-prefix=ERR --implicit-check-not=error:
+# RUN: not llvm-mc -triple=aarch64 -filetype=obj %s --defsym OBJERR=1 -o /dev/null 2>&1 | FileCheck %s --check-prefix=OBJERR --implicit-check-not=error:
 
 .globl g
 g:
@@ -32,13 +33,21 @@ data1:
 .word extern@GOTPCREL-5
 
 .ifdef ERR
-# ERR: [[#@LINE+1]]:7: error: symbol 'und' can not be undefined in a subtraction expression
-.word extern@plt - und
+# ERR: [[#@LINE+1]]:9: error: @ specifier only allowed after a symbol
+.quad 3@plt - .
+
+# ERR: [[#@LINE+1]]:9: error: expected ')'
+.quad (l@plt - .)
+.endif
 
+.ifdef OBJERR
 .quad g@plt - .
 
 .word extern@gotpcrel - .
 
-# ERR: [[#@LINE+1]]:7: error: symbol 'und' can not be undefined in a subtraction expression
+# OBJERR: [[#@LINE+1]]:7: error: symbol 'und' can not be undefined in a subtraction expression
+.word extern@plt - und
+
+# OBJERR: [[#@LINE+1]]:7: error: symbol 'und' can not be undefined in a subtraction expression
 .word extern@gotpcrel - und
 .endif
diff --git a/llvm/test/MC/AArch64/elf-reloc-ptrauth.s b/llvm/test/MC/AArch64/elf-reloc-ptrauth.s
index bed85bcc5798b..263ed91ec8e99 100644
--- a/llvm/test/MC/AArch64/elf-reloc-ptrauth.s
+++ b/llvm/test/MC/AArch64/elf-reloc-ptrauth.s
@@ -1,4 +1,4 @@
-// RUN: llvm-mc -triple=aarch64 %s --defsym=ASMONLY=1 | FileCheck %s --check-prefix=ASM
+// RUN: llvm-mc -triple=aarch64 %s | FileCheck %s --check-prefix=ASM
 
 // RUN: llvm-mc -triple=aarch64 -filetype=obj %s | \
 // RUN:   llvm-readelf -S -r -x .test - | FileCheck %s --check-prefix=RELOC
@@ -41,8 +41,6 @@
 // RELOC-NEXT: 70 00000000 10000000
 //                         ^^^^ discriminator
 //                               ^^ 0 no addr diversity 0 reserved 00 ia key 0000 reserved
-// RELOC-NEXT: 80 04000000 00000000
-// Folded to constant 4 bytes difference between _g9 and _g8
 
 .section    .helper
 .local "_g 6"
@@ -63,12 +61,12 @@ _g9:
 .quad _g0@AUTH(ia,42)
 .quad 0
 
-// ASM:          .xword _g1@AUTH(ib,0)
-.quad _g1@AUTH(ib,0)
+// ASM:          .xword (+_g1)@AUTH(ib,0)
+.quad +_g1@AUTH(ib,0)
 .quad 0
 
 // ASM:          .xword _g2@AUTH(da,5,addr)
-.quad _g2@AUTH(da,5,addr)
+.quad _g2 @ AUTH(da,5,addr)
 .quad 0
 
 // ASM:          .xword _g3@AUTH(db,65535,addr)
@@ -91,33 +89,20 @@ _g9:
 .quad ("_g 7" + 7)@AUTH(ia,16)
 .quad 0
 
-// ASM:          .xword _g9@AUTH(ia,42)-_g8@AUTH(ia,42)
-.quad _g9@AUTH(ia,42) - _g8@AUTH(ia,42)
-.quad 0
+// RUN: not llvm-mc -triple=aarch64 --defsym=ERR=1 %s 2>&1 | \
+// RUN:   FileCheck %s --check-prefix=ERR
 
-.ifdef ASMONLY
+.ifdef ERR
 
-// ASM:          .xword _g10@AUTH(ia,42)+1
 .quad _g10@AUTH(ia,42) + 1
 
-// ASM:          .xword 1+_g11@AUTH(ia,42)
 .quad 1 + _g11@AUTH(ia,42)
 
-// ASM:          .xword 1+_g12@AUTH(ia,42)+1
 .quad 1 + _g12@AUTH(ia,42) + 1
 
-// ASM:          .xword _g13@AUTH(ia,42)+_g14@AUTH(ia,42)
 .quad _g13@AUTH(ia,42) + _g14@AUTH(ia,42)
 
-// ASM:          .xword _g9@AUTH(ia,42)-_g8
 .quad _g9@AUTH(ia,42) - _g8
-.quad 0
-
-.endif // ASMONLY
-
-.ifdef ERR
-// RUN: not llvm-mc -triple=aarch64 --defsym=ERR=1 %s 2>&1 | \
-// RUN:   FileCheck %s --check-prefix=ERR
 
 // ERR: :[[#@LINE+1]]:15: error: expected '('
 .quad sym@AUTH)ia,42)
@@ -143,51 +128,28 @@ _g9:
 // ERR: :[[#@LINE+1]]:21: error: expected ')'
 .quad sym@AUTH(ia,42(
 
-// ERR: :[[#@LINE+1]]:7: error: combination of @AUTH with other modifiers not supported
+// ERR: :[[#@LINE+1]]:14: error: unexpected token
 .quad sym@PLT@AUTH(ia,42)
 
-// ERR: :[[#@LINE+1]]:11: error: invalid variant 'AUTH@GOT'
+// ERR: :[[#@LINE+1]]:15: error: expected '('
 .quad sym@AUTH@GOT(ia,42)
 
-// ERR: :[[#@LINE+1]]:18: error: invalid variant 'TLSDESC@AUTH'
-.quad "long sym"@TLSDESC@AUTH(ia,42)
-
-// ERR: :[[#@LINE+1]]:18: error: invalid variant 'AUTH@PLT'
+// ERR: :[[#@LINE+1]]:22: error: expected '('
 .quad "long sym"@AUTH@PLT(ia,42)
 
-// ERR: :[[#@LINE+1]]:17: error: invalid variant 'GOT@AUTH'
+// ERR: :[[#@LINE+1]]:17: error: invalid relocation specifier
 .quad (sym - 5)@GOT@AUTH(ia,42)
 
-// ERR: :[[#@LINE+1]]:17: error: invalid variant 'AUTH@TLSDESC'
-.quad (sym + 5)@AUTH@TLSDESC(ia,42)
-
-// ERR: :[[#@LINE+1]]:12: error: invalid variant 'AUTH'
-.quad +sym@AUTH(ia,42)
-
-.endif // ERR
-
-.ifdef ERROBJ
-// RUN: not llvm-mc -triple=aarch64 -filetype=obj --defsym=ERROBJ=1 %s -o /dev/null 2>&1 | \
-// RUN:   FileCheck %s --check-prefix=ERROBJ
-
-// ERROBJ: :[[#@LINE+1]]:7: error: expected relocatable expression
+// ERR: :[[#@LINE+1]]:23: error: unexpected token
 .quad sym@AUTH(ia,42) + 1
 
-// ERROBJ: :[[#@LINE+1]]:7: error: expected relocatable expression
-.quad 1 + sym@AUTH(ia,42)
-
-// ERROBJ: :[[#@LINE+1]]:7: error: expected relocatable expression
+// ERR: :[[#@LINE+1]]:27: error: unexpected token
 .quad 1 + sym@AUTH(ia,42) + 1
 
-// ERROBJ: :[[#@LINE+1]]:7: error: expected relocatable expression
-.quad sym@AUTH(ia,42) + sym@AUTH(ia,42)
-
-// TODO: do we really want to emit an error here? It might not be important
-// whether a symbol has an AUTH modifier or not since the compile-time computed
-// distance remains the same. Leave it in such state as for now since it
-// makes code simpler: subtraction of a non-AUTH symbol and of a constant
-// are handled identically.
-// ERROBJ: :[[#@LINE+1]]:7: error: expected relocatable expression
+/// @AUTH applies to the whole operand instead of an individual term.
+/// Trailing expression parts are not allowed even if the logical subtraction
+/// result might make sense.
+// ERR: :[[#@LINE+1]]:23: error: unexpected token
 .quad _g9@AUTH(ia,42) - _g8
 
-.endif // ERROBJ
+.endif // ERR
diff --git a/llvm/test/MC/AArch64/label-arithmetic-diags-darwin.s b/llvm/test/MC/AArch64/label-arithmetic-diags-darwin.s
index e32db7c125bb4..357e04a828f8e 100644
--- a/llvm/test/MC/AArch64/label-arithmetic-diags-darwin.s
+++ b/llvm/test/MC/AArch64/label-arithmetic-diags-darwin.s
@@ -1,9 +1,17 @@
+// RUN: not llvm-mc -triple aarch64-darwin -filetype=obj --defsym PARSE=1 %s -o /dev/null 2>&1 | FileCheck %s --check-prefix=ERR
 // RUN: not llvm-mc -triple aarch64-darwin -filetype=obj %s -o /dev/null 2>&1 | FileCheck %s
 // RUN: not llvm-mc -triple aarch64-ios -filetype=obj %s -o /dev/null 2>&1 | FileCheck %s
 
 Lstart:
   .space 8
 Lend:
+.ifdef PARSE
+  add w0, w1, #(Lend - var@TLVPPAGEOFF)
+  // ERR: [[#@LINE-1]]:27: error: expected ')'
+  cmp w0, #(Lend - var@TLVPPAGEOFF)
+  // ERR: [[#@LINE-1]]:23: error: expected ')'
+
+.else
   add w0, w1, #(Lend - external)
   cmp w0, #(Lend - external)
   // CHECK: error: unknown AArch64 fixup kind!
@@ -13,15 +21,6 @@ Lend:
   // CHECK-NEXT: cmp w0, #(Lend - external)
   // CHECK-NEXT: ^
 
-  add w0, w1, #(Lend - var@TLVPPAGEOFF)
-  cmp w0, #(Lend - var@TLVPPAGEOFF)
-  // CHECK: error: unsupported subtraction of qualified symbol
-  // CHECK-NEXT: add w0, w1, #(Lend - var@TLVPPAGEOFF)
-  // CHECK-NEXT: ^
-  // CHECK: error: unsupported subtraction of qualified symbol
-  // CHECK-NEXT: cmp w0, #(Lend - var@TLVPPAGEOFF)
-  // CHECK-NEXT: ^
-
   add w0, w1, #(Lstart - Lend)
   cmp w0, #(Lstart - Lend)
   // CHECK: error: fixup value out of range
@@ -66,3 +65,4 @@ Lend_across_sec:
   // CHECK: error: unknown AArch64 fixup kind!
   // CHECK-NEXT: cmp w0, #(Lend_across_sec - Lprivate2)
   // CHECK-NEXT: ^
+.endif

@MaskRay
Member Author

MaskRay commented Apr 3, 2025

I submitted issue #132569 and created this PR to address the syntax issues in AArch64AsmParser and reduce its dependence on the parsing workarounds found in llvm/lib/MC/MCParser/AsmParser.cpp. Currently, in data directives, @specifier is only utilized by -fexperimental-relative-c++-abi-vtables and PAuth.

I propose introducing a breaking change to eliminate future dependencies on @specifier. RISC-V made the breaking change. GNU Assembler doesn't support @specifier in data directives for either AArch64 or RISC-V.

While I’d prefer for Mach-O to phase out @specifier in data directives, I recognize that it’s used in Mach-O’s instruction operands and may already be part of some shipped arm64e syntax. That said, I strongly believe ELF would benefit from completely removing @ to streamline its implementation.

https://maskray.me/blog/2025-03-16-relocation-generation-in-assemblers

For new architectures, I'd suggest adopting %specifier(expr) and never using @specifier. The % symbol works seamlessly with data directives, and during operand parsing, the parser can simply peek at the first token to check for a relocation specifier.

I favor %specifier(expr) over %specifier expr because it provides clearer scoping, especially in data directives with multiple operands, such as .long %lo(a), %lo(b).
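
To make the scoping argument concrete, here is a small standalone sketch of how a prefix form could be parsed. It is an illustration under assumptions, not code from LLVM or GNU as: `PrefixOperand`, `parseOneOperand`, and `parsePrefixOperands` are hypothetical names, and expressions are kept as uninterpreted strings. The parser decides whether an operand carries a specifier by looking only at its first character, and the parentheses delimit exactly what the specifier covers.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

struct PrefixOperand {
  std::string specifier; // empty when the operand has no %specifier prefix
  std::string expr;      // expression text, left uninterpreted in this sketch
};

static std::string trimmed(const std::string &s) {
  std::size_t b = s.find_first_not_of(" \t");
  if (b == std::string::npos)
    return "";
  std::size_t e = s.find_last_not_of(" \t");
  return s.substr(b, e - b + 1);
}

// Parses one operand: either "%spec(expr)" or a plain expression.
static std::optional<PrefixOperand> parseOneOperand(const std::string &piece) {
  if (piece.empty())
    return std::nullopt;
  // Peek at the first character to decide whether a specifier is present.
  if (piece[0] != '%')
    return PrefixOperand{"", piece};
  std::size_t lparen = piece.find('(');
  if (lparen == std::string::npos || lparen == 1)
    return std::nullopt; // missing '(' or empty specifier name
  int depth = 0;
  for (std::size_t k = lparen; k < piece.size(); ++k) {
    if (piece[k] == '(') {
      ++depth;
    } else if (piece[k] == ')') {
      --depth;
      if (depth == 0 && k + 1 != piece.size())
        return std::nullopt; // text after the closing parenthesis
    }
  }
  if (depth != 0)
    return std::nullopt; // unbalanced parentheses
  return PrefixOperand{
      piece.substr(1, lparen - 1),
      trimmed(piece.substr(lparen + 1, piece.size() - lparen - 2))};
}

// Splits a directive's operand list on top-level commas and parses each part.
std::optional<std::vector<PrefixOperand>>
parsePrefixOperands(const std::string &list) {
  std::vector<PrefixOperand> out;
  int depth = 0;
  std::size_t start = 0;
  for (std::size_t i = 0; i <= list.size(); ++i) {
    bool atEnd = (i == list.size());
    if (!atEnd && list[i] == '(')
      ++depth;
    if (!atEnd && list[i] == ')')
      --depth;
    if (atEnd || (list[i] == ',' && depth == 0)) {
      auto op = parseOneOperand(trimmed(list.substr(start, i - start)));
      if (!op)
        return std::nullopt;
      out.push_back(*op);
      start = i + 1;
    }
  }
  return out;
}
```

For the operand list `%lo(a), %lo(b)` this returns two operands, each with specifier `lo`, which is the unambiguous scoping that the unparenthesized `%specifier expr` form would not give as directly.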

Collaborator

@smithp35 smithp35 left a comment

Overall looks good to me.

As far as I can infer from reading the code, this doesn't limit or affect any data directive that is legal today.

It is a bit of a shame that we lose some of the context from the error messages, although I can't easily see how to put that back at the time the error is detected.

MaskRay added 3 commits April 5, 2025 21:24
Created using spr 1.3.5-bogner
Created using spr 1.3.5-bogner
Created using spr 1.3.5-bogner
Collaborator

@smithp35 smithp35 left a comment

Thanks for answering the questions. This looks good to me, and should only have an effect on incorrect inputs.

As with the other changes in this series, please leave a day or two for other reviewers to comment.

@kovdan01
Contributor

kovdan01 commented Apr 7, 2025

LGTM as long as there are no other objections, thanks!

@MaskRay MaskRay merged commit 26475f5 into main Apr 8, 2025
11 checks passed
@MaskRay MaskRay deleted the users/MaskRay/spr/aarch64-refactor-plt-gotpcrel-and-auth-to-use-parsedataexpr branch April 8, 2025 16:09
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 8, 2025
…DataExpr

Pull Request: llvm/llvm-project#134202
@DKLoehr
Contributor

DKLoehr commented Apr 8, 2025

I'm not sure, because I haven't been able to reproduce this locally (cross-compiling), but I suspect this change might be causing failures when building Chromium. Apologies if this ends up being the wrong change. We get the following output (example failed build):

../../third_party/boringssl/src/gen/bcm/p256-armv8-asm-apple.S:1160:25: error: unexpected token in argument list
 adrp x23,Lone_mont@PAGE-64
                        ^
../../third_party/boringssl/src/gen/bcm/p256-armv8-asm-apple.S:1161:31: error: unexpected token in argument list
 add x23,x23,Lone_mont@PAGEOFF-64
                              ^

The source file in question can be viewed here. I'm not familiar with this code, but it seems like it's complaining about the -. Is this supposed to be valid?

@davidben
Contributor

davidben commented Apr 8, 2025

The relevant source file is actually synthesized from a perl script (don't ask; it's OpenSSL's fault) that transforms ELF-style :pg_hi21:foo into Apple-style foo@PAGE and whatnot. Looks like the regex did not capture the -64 and put the @PAGE suffix in the middle of the term. It is a little odd that it got emitted that way, but I guess it worked and no one noticed?

I take it Lone_mont-64@PAGE would be more correct? After someone confirms that is indeed the right syntax (been trying to find an official reference and not succeeding), I can fix BoringSSL to emit that.

That said, it looks like OpenSSL (which has a similar file) does the same thing, so I suspect LLVM will need to support Lone_mont@PAGE-64, as ridiculous as this looks. (You won't find it in OpenSSL's source tree because they run the Perl scripts as part of the build.)
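
How a translation step could produce the interleaved form can be sketched as follows. The actual Perl script is not shown in this thread, so both regular expressions below are hypothetical, and the function names `translateSymbolOnly` and `translateWholeExpr` are made up for illustration. The sketch only handles `:pg_hi21:` and a single `+N` or `-N` addend, and it does not settle which output form is the canonical one.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical rewrite that captures only the symbol name. Any addend that
// follows the symbol is left where it was, so it ends up after "@PAGE".
std::string translateSymbolOnly(const std::string &operand) {
  static const std::regex re(R"(:pg_hi21:([A-Za-z_.][A-Za-z0-9_.]*))");
  return std::regex_replace(operand, re, "$1@PAGE");
}

// Hypothetical rewrite that also captures an optional +N / -N addend, so the
// "@PAGE" suffix lands after the whole expression.
std::string translateWholeExpr(const std::string &operand) {
  static const std::regex re(
      R"(:pg_hi21:([A-Za-z_.][A-Za-z0-9_.]*(?:[+-][0-9]+)?))");
  return std::regex_replace(operand, re, "$1@PAGE");
}
```

Given the input `:pg_hi21:Lone_mont-64`, the first function yields `Lone_mont@PAGE-64` and the second yields `Lone_mont-64@PAGE`.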

@davidben
Contributor

davidben commented Apr 9, 2025

I'm now not so sure this syntax is invalid. Here's what Xcode's clang outputs:

% clang --version
Apple clang version 17.0.0 (clang-1700.0.13.3)
Target: arm64-apple-darwin24.3.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
% cat foo.c
static int x[20];

int *get_ptr() {
    return x + 5;
}
% clang -S -O2 foo.c
% cat foo.s
	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 15, 0	sdk_version 15, 4
	.globl	_get_ptr                        ; -- Begin function get_ptr
	.p2align	2
_get_ptr:                               ; @get_ptr
	.cfi_startproc
; %bb.0:
Lloh0:
	adrp	x0, _x@PAGE+20
Lloh1:
	add	x0, x0, _x@PAGEOFF+20
	ret
	.loh AdrpAdd	Lloh0, Lloh1
	.cfi_endproc
                                        ; -- End function
.zerofill __DATA,__bss,_x,80,2          ; @x
.subsections_via_symbols

That is definitely not the syntax I would have expected, given that @PAGE and @PAGEOFF are presumably intended to modify the whole argument, but it seems Clang also prefers to emit this funny interleaved version.

MaskRay added a commit that referenced this pull request Apr 10, 2025
#134202 removed support for
`sym@page-offset` in instruction operands. This change is generally
reasonable since subtracting an offset from a symbol typically doesn’t
make sense for Mach-O due to its .subsections_via_symbols mechanism, which treats
them as separate atoms.

However, BoringSSL relies on a temporary symbol with a negative offset,
which can be meaningful when the symbol and the referenced location are
within the same atom.
```
../../third_party/boringssl/src/gen/bcm/p256-armv8-asm-apple.S:1160:25: error: unexpected token in argument list
 adrp x23,Lone_mont@PAGE-64
```

It's worth noting that expressions involving @ can be complex and
brittle in MCParser, and much of the Mach-O @ offset handling remains
under-tested.

* Allow default argument for parsePrimaryExpr. The argument, used by the niche llvm-ml,
  should not require other targets to adapt.
@MaskRay
Member Author

MaskRay commented Apr 10, 2025

While add x0, x0, _x@PAGEOFF+1 is valid in Mach-O, add x0, x0, _x@PAGEOFF-1 is not, as the referenced location and _x belong to different atoms. The linker might reorder _x so that the referenced location and _x are no longer 1 byte apart.

However, add x23,x23,Lone_mont@PAGEOFF-64 looks valid. L... defines a temporary symbol, which does not create a new atom. Lone_mont@PAGEOFF-64 and Lone_mont@PAGEOFF could belong to the same subsection.

Restored the support in 3fd0d22. Sorry for the trouble!
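
The atom reasoning above can be modelled with a short standalone sketch. It is a deliberate simplification and not how the assembler or linker is implemented: it assumes every symbol whose name does not start with `L` begins a new atom, that an atom extends up to the next such symbol, and that the symbol list is sorted by address. `LabelInfo`, `atomIndexFor`, and `staysInSameAtom` are hypothetical names, and the addresses in the usage note are invented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct LabelInfo {
  std::string name;
  std::uint64_t address;
};

// Assumption: names starting with 'L' are temporary and do not start an atom.
static bool startsAtom(const LabelInfo &label) {
  return label.name.empty() || label.name[0] != 'L';
}

// Index of the atom-starting symbol that covers `address`, or -1 if none.
// `labels` must be sorted by ascending address.
static int atomIndexFor(const std::vector<LabelInfo> &labels,
                        std::uint64_t address) {
  int owner = -1;
  for (std::size_t i = 0; i < labels.size(); ++i) {
    if (labels[i].address > address)
      break;
    if (startsAtom(labels[i]))
      owner = static_cast<int>(i);
  }
  return owner;
}

// True when `name + offset` lands in the same atom as `name` itself.
bool staysInSameAtom(const std::vector<LabelInfo> &labels,
                     const std::string &name, std::int64_t offset) {
  for (const LabelInfo &label : labels) {
    if (label.name != name)
      continue;
    std::uint64_t target = label.address;
    if (offset < 0) {
      // Unsigned negation gives the magnitude without signed overflow.
      std::uint64_t magnitude = 0 - static_cast<std::uint64_t>(offset);
      if (magnitude > label.address)
        return false; // would point before address 0
      target -= magnitude;
    } else {
      target += static_cast<std::uint64_t>(offset);
    }
    int base = atomIndexFor(labels, label.address);
    return base >= 0 && base == atomIndexFor(labels, target);
  }
  return false; // unknown symbol
}
```

With `_f` at 0, `Lone_mont` at 64, and `_x` at 128, the model accepts `_x+1` and `Lone_mont-64` but rejects `_x-1`, matching the distinction drawn in the comment above.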

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 10, 2025
@davidben
Contributor

Ah yeah, our asm files also don't yet set .subsections_via_symbols (or emit -ffunction-sections-like code) because the asm is derived from OpenSSL, which likes to play tricks like this. We have a bug open to change this, but it would require a bit of work to make sure it won't break: https://issues.chromium.org/issues/42290614

var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025