[AArch64] Initial compiler support for SVE unwind on Windows. #138609
Conversation
Most bits of this are straightforward: when we emit SVE instructions in the prologue/epilogue, emit corresponding opcodes. The unfortunately nasty bit is the handling of the frame pointer in functions that use the SVE calling convention. If we have SVE callee saves, and need to restore the stack pointer from the frame pointer, it's impossible to encode callee saves that happen after the frame pointer. So this patch rearranges the stack to put SVE callee saves first. This isn't really that complicated on its own, but it leads to a lot of tricky conditionals (see FPAfterSVECalleeSaves).
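To make the reordering concrete, here is an illustrative prologue under the SVE-callee-saves-first layout this patch introduces (a sketch, not actual compiler output; register counts and offsets are made up, with the corresponding Windows unwind directives shown on the right):

```
; Hypothetical prologue with FPAfterSVECalleeSaves:
    sub    sp, sp, #16            ; fixed-object area    -> .seh_stackalloc 16
    addvl  sp, sp, #-18           ; SVE callee-save area -> .seh_allocz 18
    str    p4, [sp]               ; predicate saves      -> .seh_save_preg p4, 0
    str    z8, [sp, #2, mul vl]   ; Z-register saves     -> .seh_save_zreg z8, 2
    stp    x29, x30, [sp, #-16]!  ; GPR saves            -> .seh_save_fplr_x 16
    mov    x29, sp                ; set FP last          -> .seh_set_fp
```

Because the frame-pointer setup is the last prologue step, the unwinder can restore SP from FP without needing to describe any callee saves that happen "after" the frame pointer.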
@llvm/pr-subscribers-backend-aarch64
Author: Eli Friedman (efriedma-quic)
Patch is 68.40 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/138609.diff
7 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
index 870df4c387ca4..47bc44dab9d90 100644
--- a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+++ b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
@@ -3294,6 +3294,32 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
-MI->getOperand(2).getImm());
return;
+ case AArch64::SEH_AllocZ:
+ assert(MI->getOperand(0).getImm() >= 0 &&
+ "AllocZ SEH opcode offset must be non-negative");
+ assert(MI->getOperand(0).getImm() <= 255 &&
+ "AllocZ SEH opcode offset must fit into 8 bits");
+ TS->emitARM64WinCFIAllocZ(MI->getOperand(0).getImm());
+ return;
+
+ case AArch64::SEH_SaveZReg:
+ assert(MI->getOperand(1).getImm() >= 0 &&
+ "SaveZReg SEH opcode offset must be non-negative");
+ assert(MI->getOperand(1).getImm() <= 255 &&
+ "SaveZReg SEH opcode offset must fit into 8 bits");
+ TS->emitARM64WinCFISaveZReg(MI->getOperand(0).getImm(),
+ MI->getOperand(1).getImm());
+ return;
+
+ case AArch64::SEH_SavePReg:
+ assert(MI->getOperand(1).getImm() >= 0 &&
+ "SavePReg SEH opcode offset must be non-negative");
+ assert(MI->getOperand(1).getImm() <= 255 &&
+ "SavePReg SEH opcode offset must fit into 8 bits");
+ TS->emitARM64WinCFISavePReg(MI->getOperand(0).getImm(),
+ MI->getOperand(1).getImm());
+ return;
+
case AArch64::BLR:
case AArch64::BR: {
recordIfImportCall(MI);
diff --git a/llvm/lib/Target/AArch64/AArch64CallingConvention.td b/llvm/lib/Target/AArch64/AArch64CallingConvention.td
index 7cca6d9bc6b9c..287bbbce95bd9 100644
--- a/llvm/lib/Target/AArch64/AArch64CallingConvention.td
+++ b/llvm/lib/Target/AArch64/AArch64CallingConvention.td
@@ -606,6 +606,9 @@ def CSR_Win_AArch64_Arm64EC_Thunk : CalleeSavedRegs<(add (sequence "Q%u", 6, 15)
def CSR_AArch64_AAVPCS : CalleeSavedRegs<(add X19, X20, X21, X22, X23, X24,
X25, X26, X27, X28, LR, FP,
(sequence "Q%u", 8, 23))>;
+def CSR_Win_AArch64_AAVPCS : CalleeSavedRegs<(add X19, X20, X21, X22, X23, X24,
+ X25, X26, X27, X28, FP, LR,
+ (sequence "Q%u", 8, 23))>;
// Functions taking SVE arguments or returning an SVE type
// must (additionally) preserve full Z8-Z23 and predicate registers P4-P15
@@ -619,6 +622,11 @@ def CSR_Darwin_AArch64_SVE_AAPCS : CalleeSavedRegs<(add (sequence "Z%u", 8, 23),
LR, FP, X19, X20, X21, X22,
X23, X24, X25, X26, X27, X28)>;
+def CSR_Win_AArch64_SVE_AAPCS : CalleeSavedRegs<(add (sequence "P%u", 4, 11),
+ (sequence "Z%u", 8, 23),
+ X19, X20, X21, X22, X23, X24,
+ X25, X26, X27, X28, FP, LR)>;
+
// SME ABI support routines such as __arm_tpidr2_save/restore preserve most registers.
def CSR_AArch64_SME_ABI_Support_Routines_PreserveMost_From_X0
: CalleeSavedRegs<(add (sequence "Z%u", 0, 31),
diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index 78ac57e3e92a6..6b7e494b2c59b 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -1200,7 +1200,25 @@ static MachineBasicBlock::iterator InsertSEH(MachineBasicBlock::iterator MBBI,
switch (Opc) {
default:
- llvm_unreachable("No SEH Opcode for this instruction");
+ report_fatal_error("No SEH Opcode for this instruction");
+ case AArch64::STR_ZXI:
+ case AArch64::LDR_ZXI: {
+ unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
+ MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveZReg))
+ .addImm(Reg0)
+ .addImm(Imm)
+ .setMIFlag(Flag);
+ break;
+ }
+ case AArch64::STR_PXI:
+ case AArch64::LDR_PXI: {
+ unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
+ MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SavePReg))
+ .addImm(Reg0)
+ .addImm(Imm)
+ .setMIFlag(Flag);
+ break;
+ }
case AArch64::LDPDpost:
Imm = -Imm;
[[fallthrough]];
@@ -1592,6 +1610,9 @@ static bool IsSVECalleeSave(MachineBasicBlock::iterator I) {
case AArch64::CMPNE_PPzZI_B:
return I->getFlag(MachineInstr::FrameSetup) ||
I->getFlag(MachineInstr::FrameDestroy);
+ case AArch64::SEH_SavePReg:
+ case AArch64::SEH_SaveZReg:
+ return true;
}
}
@@ -1874,12 +1895,48 @@ void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
+ // Windows unwind can't represent the required stack adjustments if we have
+ // both SVE callee-saves and dynamic stack allocations, and the frame
+ // pointer is before the SVE spills. The allocation of the frame pointer
+ // must be the last instruction in the prologue so the unwinder can restore
+ // the stack pointer correctly. (And there isn't any unwind opcode for
+ // `addvl sp, x29, -17`.)
+ //
+ // Because of this, we do spills in the opposite order on Windows: first SVE,
+ // then GPRs. The main side-effect of this is that it makes accessing
+ // parameters passed on the stack more expensive.
+ //
+ // We could consider rearranging the spills for simpler cases.
+ bool FPAfterSVECalleeSaves =
+ Subtarget.isTargetWindows() && AFI->getSVECalleeSavedStackSize();
+
auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
// All of the remaining stack allocations are for locals.
AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);
bool HomPrologEpilog = homogeneousPrologEpilog(MF);
- if (CombineSPBump) {
+ if (FPAfterSVECalleeSaves) {
+ // If we're doing SVE saves first, we need to immediately allocate space
+ // for fixed objects, then space for the SVE callee saves.
+ //
+ // Windows unwind requires that the scalable size is a multiple of 16;
+ // that's handled when the callee-saved size is computed.
+ auto SaveSize =
+ StackOffset::getScalable(AFI->getSVECalleeSavedStackSize()) +
+ StackOffset::getFixed(FixedObject);
+ allocateStackSpace(MBB, MBBI, 0, SaveSize, NeedsWinCFI, &HasWinCFI,
+ /*EmitCFI=*/false, StackOffset{},
+ /*FollowupAllocs=*/true);
+ NumBytes -= FixedObject;
+
+ // Now allocate space for the GPR callee saves.
+ while (MBBI != End && IsSVECalleeSave(MBBI))
+ ++MBBI;
+ MBBI = convertCalleeSaveRestoreToSPPrePostIncDec(
+ MBB, MBBI, DL, TII, -AFI->getCalleeSavedStackSize(), NeedsWinCFI,
+ &HasWinCFI, EmitAsyncCFI);
+ NumBytes -= AFI->getCalleeSavedStackSize();
+ } else if (CombineSPBump) {
assert(!SVEStackSize && "Cannot combine SP bump with SVE");
emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
StackOffset::getFixed(-NumBytes), TII,
@@ -1982,6 +2039,8 @@ void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
: 0;
if (windowsRequiresStackProbe(MF, NumBytes + RealignmentPadding)) {
+ if (AFI->getSVECalleeSavedStackSize())
+ report_fatal_error("SVE callee saves not yet supported");
uint64_t NumWords = (NumBytes + RealignmentPadding) >> 4;
if (NeedsWinCFI) {
HasWinCFI = true;
@@ -2116,9 +2175,11 @@ void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
<< "\n");
// Find callee save instructions in frame.
CalleeSavesBegin = MBBI;
- assert(IsSVECalleeSave(CalleeSavesBegin) && "Unexpected instruction");
- while (IsSVECalleeSave(MBBI) && MBBI != MBB.getFirstTerminator())
- ++MBBI;
+ if (!FPAfterSVECalleeSaves) {
+ assert(IsSVECalleeSave(CalleeSavesBegin) && "Unexpected instruction");
+ while (IsSVECalleeSave(MBBI) && MBBI != MBB.getFirstTerminator())
+ ++MBBI;
+ }
CalleeSavesEnd = MBBI;
SVECalleeSavesSize = StackOffset::getScalable(CalleeSavedSize);
@@ -2129,9 +2190,11 @@ void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
StackOffset CFAOffset =
StackOffset::getFixed((int64_t)MFI.getStackSize() - NumBytes);
StackOffset LocalsSize = SVELocalsSize + StackOffset::getFixed(NumBytes);
- allocateStackSpace(MBB, CalleeSavesBegin, 0, SVECalleeSavesSize, false,
- nullptr, EmitAsyncCFI && !HasFP, CFAOffset,
- MFI.hasVarSizedObjects() || LocalsSize);
+ if (!FPAfterSVECalleeSaves) {
+ allocateStackSpace(MBB, CalleeSavesBegin, 0, SVECalleeSavesSize, false,
+ nullptr, EmitAsyncCFI && !HasFP, CFAOffset,
+ MFI.hasVarSizedObjects() || LocalsSize);
+ }
CFAOffset += SVECalleeSavesSize;
if (EmitAsyncCFI)
@@ -2303,10 +2366,16 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
assert(AfterCSRPopSize == 0);
return;
}
+
+ bool FPAfterSVECalleeSaves =
+ Subtarget.isTargetWindows() && AFI->getSVECalleeSavedStackSize();
+
bool CombineSPBump = shouldCombineCSRLocalStackBumpInEpilogue(MBB, NumBytes);
// Assume we can't combine the last pop with the sp restore.
bool CombineAfterCSRBump = false;
- if (!CombineSPBump && PrologueSaveSize != 0) {
+ if (FPAfterSVECalleeSaves) {
+ AfterCSRPopSize = FixedObject;
+ } else if (!CombineSPBump && PrologueSaveSize != 0) {
MachineBasicBlock::iterator Pop = std::prev(MBB.getFirstTerminator());
while (Pop->getOpcode() == TargetOpcode::CFI_INSTRUCTION ||
AArch64InstrInfo::isSEHInstruction(*Pop))
@@ -2339,7 +2408,7 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
while (LastPopI != Begin) {
--LastPopI;
if (!LastPopI->getFlag(MachineInstr::FrameDestroy) ||
- IsSVECalleeSave(LastPopI)) {
+ (!FPAfterSVECalleeSaves && IsSVECalleeSave(LastPopI))) {
++LastPopI;
break;
} else if (CombineSPBump)
@@ -2415,6 +2484,9 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
StackOffset DeallocateBefore = {}, DeallocateAfter = SVEStackSize;
MachineBasicBlock::iterator RestoreBegin = LastPopI, RestoreEnd = LastPopI;
if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
+ if (FPAfterSVECalleeSaves)
+ RestoreEnd = MBB.getFirstTerminator();
+
RestoreBegin = std::prev(RestoreEnd);
while (RestoreBegin != MBB.begin() &&
IsSVECalleeSave(std::prev(RestoreBegin)))
@@ -2430,7 +2502,31 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
}
// Deallocate the SVE area.
- if (SVEStackSize) {
+ if (FPAfterSVECalleeSaves) {
+ // If the callee-save area is before FP, restoring the FP implicitly
+ // deallocates non-callee-save SVE allocations. Otherwise, deallocate
+ // them explicitly.
+ if (!AFI->isStackRealigned() && !MFI.hasVarSizedObjects()) {
+ emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
+ DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
+ NeedsWinCFI, &HasWinCFI);
+ }
+
+ // Deallocate callee-save non-SVE registers.
+ emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
+ StackOffset::getFixed(AFI->getCalleeSavedStackSize()), TII,
+ MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
+
+ // Deallocate fixed objects.
+ emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
+ StackOffset::getFixed(FixedObject), TII,
+ MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
+
+ // Deallocate callee-save SVE registers.
+ emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
+ DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
+ NeedsWinCFI, &HasWinCFI);
+ } else if (SVEStackSize) {
// If we have stack realignment or variable sized objects on the stack,
// restore the stack pointer from the frame pointer prior to SVE CSR
// restoration.
@@ -2450,20 +2546,20 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
emitFrameOffset(
MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
StackOffset::getFixed(NumBytes), TII, MachineInstr::FrameDestroy,
- false, false, nullptr, EmitCFI && !hasFP(MF),
+ false, NeedsWinCFI, &HasWinCFI, EmitCFI && !hasFP(MF),
SVEStackSize + StackOffset::getFixed(NumBytes + PrologueSaveSize));
NumBytes = 0;
}
emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
- false, nullptr, EmitCFI && !hasFP(MF),
+ NeedsWinCFI, &HasWinCFI, EmitCFI && !hasFP(MF),
SVEStackSize +
StackOffset::getFixed(NumBytes + PrologueSaveSize));
emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
- false, nullptr, EmitCFI && !hasFP(MF),
+ NeedsWinCFI, &HasWinCFI, EmitCFI && !hasFP(MF),
DeallocateAfter +
StackOffset::getFixed(NumBytes + PrologueSaveSize));
}
@@ -2757,10 +2853,27 @@ StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
}
StackOffset ScalableOffset = {};
- if (UseFP && !(isFixed || isCSR))
- ScalableOffset = -SVEStackSize;
- if (!UseFP && (isFixed || isCSR))
- ScalableOffset = SVEStackSize;
+ bool FPAfterSVECalleeSaves =
+ isTargetWindows(MF) && AFI->getSVECalleeSavedStackSize();
+ if (FPAfterSVECalleeSaves) {
+ // In this stack layout, the FP is in between the callee saves and other
+ // SVE allocations.
+ StackOffset SVECalleeSavedStack =
+ StackOffset::getScalable(AFI->getSVECalleeSavedStackSize());
+ if (UseFP) {
+ if (!(isFixed || isCSR))
+ ScalableOffset = SVECalleeSavedStack - SVEStackSize;
+ else
+ ScalableOffset = SVECalleeSavedStack;
+ } else if (!UseFP && (isFixed || isCSR)) {
+ ScalableOffset = SVEStackSize;
+ }
+ } else {
+ if (UseFP && !(isFixed || isCSR))
+ ScalableOffset = -SVEStackSize;
+ if (!UseFP && (isFixed || isCSR))
+ ScalableOffset = SVEStackSize;
+ }
if (UseFP) {
FrameReg = RegInfo->getFrameRegister(MF);
@@ -2934,7 +3047,9 @@ static void computeCalleeSaveRegisterPairs(
RegInc = -1;
FirstReg = Count - 1;
}
- int ScalableByteOffset = AFI->getSVECalleeSavedStackSize();
+ bool FPAfterSVECalleeSaves = IsWindows && AFI->getSVECalleeSavedStackSize();
+ int ScalableByteOffset =
+ FPAfterSVECalleeSaves ? 0 : AFI->getSVECalleeSavedStackSize();
bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
Register LastReg = 0;
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 1a13adc300d2b..c1ac18fb09180 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -1176,6 +1176,9 @@ bool AArch64InstrInfo::isSEHInstruction(const MachineInstr &MI) {
case AArch64::SEH_PACSignLR:
case AArch64::SEH_SaveAnyRegQP:
case AArch64::SEH_SaveAnyRegQPX:
+ case AArch64::SEH_AllocZ:
+ case AArch64::SEH_SaveZReg:
+ case AArch64::SEH_SavePReg:
return true;
}
}
@@ -5988,10 +5991,16 @@ static void emitFrameOffsetAdj(MachineBasicBlock &MBB,
}
if (NeedsWinCFI) {
- assert(Sign == 1 && "SEH directives should always have a positive sign");
int Imm = (int)(ThisVal << LocalShiftSize);
- if ((DestReg == AArch64::FP && SrcReg == AArch64::SP) ||
- (SrcReg == AArch64::FP && DestReg == AArch64::SP)) {
+ if (VScale != 1 && DestReg == AArch64::SP) {
+ if (HasWinCFI)
+ *HasWinCFI = true;
+ BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_AllocZ))
+ .addImm(ThisVal)
+ .setMIFlag(Flag);
+ } else if ((DestReg == AArch64::FP && SrcReg == AArch64::SP) ||
+ (SrcReg == AArch64::FP && DestReg == AArch64::SP)) {
+ assert(VScale == 1 && "Expected non-scalable operation");
if (HasWinCFI)
*HasWinCFI = true;
if (Imm == 0)
@@ -6003,6 +6012,7 @@ static void emitFrameOffsetAdj(MachineBasicBlock &MBB,
assert(Offset == 0 && "Expected remaining offset to be zero to "
"emit a single SEH directive");
} else if (DestReg == AArch64::SP) {
+ assert(VScale == 1 && "Expected non-scalable operation");
if (HasWinCFI)
*HasWinCFI = true;
assert(SrcReg == AArch64::SP && "Unexpected SrcReg for SEH_StackAlloc");
@@ -6057,14 +6067,14 @@ void llvm::emitFrameOffset(MachineBasicBlock &MBB,
assert(!(SetNZCV && (NumPredicateVectors || NumDataVectors)) &&
"SetNZCV not supported with SVE vectors");
- assert(!(NeedsWinCFI && (NumPredicateVectors || NumDataVectors)) &&
- "WinCFI not supported with SVE vectors");
+ assert(!(NeedsWinCFI && NumPredicateVectors) &&
+ "WinCFI can't allocate fractions of an SVE data vector");
if (NumDataVectors) {
emitFrameOffsetAdj(MBB, MBBI, DL, DestReg, SrcReg, NumDataVectors,
- UseSVL ? AArch64::ADDSVL_XXI : AArch64::ADDVL_XXI,
- TII, Flag, NeedsWinCFI, nullptr, EmitCFAOffset,
- CFAOffset, FrameReg);
+ UseSVL ? AArch64::ADDSVL_XXI : AArch64::ADDVL_XXI, TII,
+ Flag, NeedsWinCFI, HasWinCFI, EmitCFAOffset, CFAOffset,
+ FrameReg);
CFAOffset += StackOffset::getScalable(-NumDataVectors * 16);
SrcReg = DestReg;
}
@@ -6072,9 +6082,9 @@ void llvm::emitFrameOffset(MachineBasicBlock &MBB,
if (NumPredicateVectors) {
assert(DestReg != AArch64::SP && "Unaligned access to SP");
emitFrameOffsetAdj(MBB, MBBI, DL, DestReg, SrcReg, NumPredicateVectors,
- UseSVL ? AArch64::ADDSPL_XXI : AArch64::ADDPL_XXI,
- TII, Flag, NeedsWinCFI, nullptr, EmitCFAOffset,
- CFAOffset, FrameReg);
+ UseSVL ? AArch64::ADDSPL_XXI : AArch64::ADDPL_XXI, TII,
+ Flag, NeedsWinCFI, HasWinCFI, EmitCFAOffset, CFAOffset,
+ FrameReg);
}
}
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 3962c7eba5833..6165a1ac3e079 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -5408,6 +5408,9 @@ let isPseudo = 1 in {
def SEH_PACSignLR : Pseudo<(outs), (ins), []>, Sched<[]>;
def SEH_SaveAnyRegQP : Pseudo<(outs), (ins i32imm:$reg0, i32imm:$reg1, i32imm:$offs), []>, Sched<[]>;
def SEH_SaveAnyRegQPX : Pseudo<(outs), (ins i32imm:$reg0, i32imm:$reg1, i32imm:$offs), []>, Sched<[]>;
+ def SEH_AllocZ : Pseudo<(outs), (ins i32imm:$offs), []>, Sched<[]>;
+ def SEH_SaveZReg : Pseudo<(outs), (ins i32imm:$reg, i32imm:$offs), []>, Sched<[]>;
+ def SEH_SavePReg : Pseudo<(outs), (ins i32imm:$reg, i32imm:$offs), []>, Sched<[]>;
}
// Pseudo instructions for Windows EH
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
index 52b362875b4ef..e9a1b558b2dfe 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -98,6 +98,13 @@ AArch64RegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {
return CSR_Win_AArch64_AAPCS_SwiftError_SaveList;
if (MF->getFunction().getCallingConv() == CallingConv::SwiftTail)
return CSR_Win_AArch64_AAPCS_SwiftTail_SaveList;
+ if (MF->getFunction(...
[truncated]
Looking good overall, thanks! A couple minor inline comments.
-  assert(!(NeedsWinCFI && (NumPredicateVectors || NumDataVectors)) &&
-         "WinCFI not supported with SVE vectors");
+  assert(!(NeedsWinCFI && NumPredicateVectors) &&
+         "WinCFI can't allocate fractions of an SVE data vector");
Due to this limitation, the change below on lines 6085-6087 for passing `HasWinCFI` instead of `nullptr` is essentially impossible to ever be used, right? But it's of course good to pass `HasWinCFI` in both cases for consistency anyway.
Also, it's not clear what/where is being done, so that when we have one or more predicate vectors to back up, we allocate space for them in the form of full data vectors. For the general calling convention, all of p4-p11 are backed up I presume, but what if we'd be clobbering only one of them in a function where we don't need to back them all up? How is that handled on e.g. Linux today?
The SVE stack size is always rounded up to a 16-byte multiple (implicitly multiplied by vscale), so we always allocate some number of full SVE vectors (which also ensures the stack stays 16-byte aligned). I don't think predicate-sized allocations can occur in frame lowering.
Right... What I'm asking is, is it possible to hit this assert? Looking at `decomposeStackOffsetForFrameOffsets`, it does seem like it would be possible to end up with a nonzero `NumPredicateVectors` here?
In `decomposeStackOffsetForFrameOffsets()`, if `NumPredicateVectors % 8 == 0` (which will be the case if the allocation has been aligned to 16 bytes), it'll change the allocation to be `NumDataVectors = NumPredicateVectors / 8` (and no predicate vectors). So I don't think this can be hit from frame lowering. I think it may be used for some other stack allocations (but I think `NeedsWinCFI` is false for those).
Right, I see - thanks, that answers my question!
; CHECK-NEXT: .seh_allocz 17
; CHECK-NEXT: .seh_endepilogue
; CHECK-NEXT: ret
  i64 %n5, i64 %n6, i64 %n7, i64 %n8, i64 %n9) personality ptr @__CxxFrameHandler3 {
This IR line has been split with the CHECK lines in the middle of the line - this is quite confusing. Is it possible to join the line without the update script turning it back into this form again?
; CHECK-NEXT: .seh_endproc
  call void asm "", "~{d8}"()
  ret void
}
Can you add a testcase with variable arguments (forcing space for homed arguments on the stack) together with SVE registers? If I understood some code comment correctly, this is taken into account somehow, but it would be good to cover it with tests.
@@ -1982,6 +2039,8 @@ void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
         : 0;

   if (windowsRequiresStackProbe(MF, NumBytes + RealignmentPadding)) {
+    if (AFI->getSVECalleeSavedStackSize())
+      report_fatal_error("SVE callee saves not yet supported");
Can we make the error message a little more descriptive here? SVE callee saves are supported, but not in conjunction with stack probes, right?
A few initial comments:
    ScalableOffset = -SVEStackSize;
  if (!UseFP && (isFixed || isCSR))
    ScalableOffset = SVEStackSize;
  bool FPAfterSVECalleeSaves =
`getFrameIndexReferenceFromSP` also needs to be updated to handle this layout (currently `-pass-remarks-analysis=stack-frame-layout` reports the normal layout).
I think `resolveFrameOffsetReference` will also be incorrect for SVE CSs; not sure if that's an issue, but I think there should at least be an assertion.
    ScalableOffset = -SVEStackSize;
  if (!UseFP && (isFixed || isCSR))
    ScalableOffset = SVEStackSize;
  bool FPAfterSVECalleeSaves =
With `"frame-pointer"="all"`, I think the addresses of SVE locals are incorrect. For example, the `f4` test function resolves the offset as:

  sub x1, x29, #8
  addvl x1, x1, #-18

Which is subtracting the size of the SVE callee saves that are above the FP.