1//===- AMDGPURegisterBankInfo.cpp -------------------------------*- C++ -*-==//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8/// \file
9/// This file implements the targeting of the RegisterBankInfo class for
10/// AMDGPU.
11///
12/// \par
13///
14/// AMDGPU has unique register bank constraints that require special high level
15/// strategies to deal with. There are two main true physical register banks
16/// VGPR (vector), and SGPR (scalar). Additionally the VCC register bank is a
17/// sort of pseudo-register bank needed to represent SGPRs used in a vector
18/// boolean context. There is also the AGPR bank, which is a special purpose
19/// physical register bank present on some subtargets.
20///
21/// Copying from VGPR to SGPR is generally illegal, unless the value is known to
22/// be uniform. It is generally not valid to legalize operands by inserting
23/// copies as on other targets. Operations which require uniform, SGPR operands
24/// generally require scalarization by repeatedly executing the instruction,
25/// activating each set of lanes using a unique set of input values. This is
26/// referred to as a waterfall loop.
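///
/// Schematically (illustrative only; see executeInWaterfallLoop below for the
/// real construction), a waterfall loop over a divergent value %v that must be
/// uniform looks like:
///
///   saved_exec = EXEC
///   remaining  = EXEC
///   loop:
///     cur  = readfirstlane(%v)        ; candidate value for this iteration
///     mask = remaining & (cur == %v)  ; lanes holding the same value
///     EXEC = mask
///     <execute the instruction with cur as the SGPR operand>
///     remaining = remaining & ~mask   ; retire the lanes just handled
///     EXEC = remaining
///     if (EXEC != 0) goto loop
///   EXEC = saved_exec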
27///
28/// \par Booleans
29///
30/// Booleans (s1 values) require special consideration. A vector compare result
31/// is naturally a bitmask with one bit per lane, in a 32 or 64-bit
32/// register. These are represented with the VCC bank. During selection, we need
33/// to be able to unambiguously go back from a register class to a register
34/// bank. To distinguish whether an SGPR should use the SGPR or VCC register
35/// bank, we need to know the use context type. An SGPR s1 value always means a
36/// VCC bank value, otherwise it will be the SGPR bank. A scalar compare sets
37/// SCC, which is a 1-bit unaddressable register. This will need to be copied to
38/// a 32-bit virtual register. Taken together, this means we need to adjust the
39/// type of boolean operations to be regbank legal. All SALU booleans need to be
40/// widened to 32-bits, and all VALU booleans need to be s1 values.
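///
/// As a small illustration (schematic gMIR, register names invented):
///
///   %cv:vcc(s1)   = G_ICMP intpred(eq), %a:vgpr(s32), %b:vgpr(s32) ; divergent
///   %r:vgpr(s32)  = G_SELECT %cv:vcc(s1), %x:vgpr(s32), %y:vgpr(s32)
///
///   %cs:sgpr(s32) = G_ICMP intpred(eq), %c:sgpr(s32), %d:sgpr(s32) ; uniform,
///                                              ; widened to a 32-bit 0/1 value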
41///
42/// A noteworthy exception to the s1-means-vcc rule is for legalization artifact
43/// casts. G_TRUNC s1 results, and G_SEXT/G_ZEXT/G_ANYEXT sources are never vcc
44/// bank. A non-boolean source (such as a truncate from a 1-bit load from
45/// memory) will require a copy to the VCC bank which will require clearing the
46/// high bits and inserting a compare.
47///
48/// \par Constant bus restriction
49///
50/// VALU instructions have a limitation known as the constant bus
51/// restriction. Most VALU instructions can use SGPR operands, but may read at
52/// most 1 SGPR or constant literal value (this increases to 2 on gfx10 for most
53/// instructions). This is one unique SGPR, so the same SGPR may be used for
54/// multiple operands. From a register bank perspective, any combination of
55/// operands should be legal as an SGPR, but this is contextually dependent on
56/// the SGPR operands all being the same register. It is therefore optimal to
57/// choose the SGPR with the most uses to minimize the number of copies.
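///
/// For example (pre-gfx10 rules, syntax purely illustrative):
///   v_add_f32 v0, s1, s1   ; one unique SGPR read - encodable
///   v_add_f32 v0, s1, s2   ; two unique SGPR reads - violates the constant bus
///                          ; limit; one operand must be copied to a VGPR first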
58///
59/// We avoid trying to solve this problem in RegBankSelect. Any VALU G_*
60/// operation should have its source operands all mapped to VGPRs (except for
61/// VCC), inserting copies from any SGPR operands. This is the most trivial legal
62/// mapping. Anything beyond the simplest 1:1 instruction selection would be too
63/// complicated to solve here. Every optimization pattern or instruction
64/// selected to multiple outputs would have to enforce this rule, and there
65/// would be additional complexity in tracking this rule for every G_*
66/// operation. By forcing all inputs to VGPRs, it also simplifies the task of
67/// picking the optimal operand combination from a post-isel optimization pass.
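///
/// Concretely (schematic gMIR, names invented), a divergent add with one SGPR
/// input is mapped by simply inserting a cross-bank copy for that input:
///
///   %t:vgpr(s32) = COPY %b:sgpr(s32)
///   %d:vgpr(s32) = G_ADD %a:vgpr(s32), %t:vgpr(s32)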
68///
69//===----------------------------------------------------------------------===//
70
72#include "AMDGPURegisterBankInfo.h"
73#include "AMDGPU.h"
75#include "AMDGPUInstrInfo.h"
76#include "AMDGPULaneMaskUtils.h"
77#include "GCNSubtarget.h"
79#include "SIRegisterInfo.h"
85#include "llvm/IR/IntrinsicsAMDGPU.h"
86
87#define GET_TARGET_REGBANK_IMPL
88#include "AMDGPUGenRegisterBank.inc"
89
90// This file will be TableGen'ed at some point.
91#include "AMDGPUGenRegisterBankInfo.def"
92
93using namespace llvm;
94using namespace MIPatternMatch;
95
96namespace {
97
98// Observer to apply a register bank to new registers created by LegalizerHelper.
99class ApplyRegBankMapping final : public GISelChangeObserver {
100private:
101 MachineIRBuilder &B;
102 const AMDGPURegisterBankInfo &RBI;
103 MachineRegisterInfo &MRI;
104 const RegisterBank *NewBank;
105 SmallVector<MachineInstr *, 4> NewInsts;
106
107public:
108 ApplyRegBankMapping(MachineIRBuilder &B, const AMDGPURegisterBankInfo &RBI_,
109 MachineRegisterInfo &MRI_, const RegisterBank *RB)
110 : B(B), RBI(RBI_), MRI(MRI_), NewBank(RB) {
111 assert(!B.isObservingChanges());
112 B.setChangeObserver(*this);
113 }
114
115 ~ApplyRegBankMapping() override {
116 for (MachineInstr *MI : NewInsts)
117 applyBank(*MI);
118
119 B.stopObservingChanges();
120 }
121
122 /// Set any registers that don't have a set register class or bank to SALU.
123 void applyBank(MachineInstr &MI) {
124 const unsigned Opc = MI.getOpcode();
125 if (Opc == AMDGPU::G_ANYEXT || Opc == AMDGPU::G_ZEXT ||
126 Opc == AMDGPU::G_SEXT) {
127 // LegalizerHelper wants to use the basic legalization artifacts when
128 // widening etc. We don't handle selection with vcc in artifact sources,
129 // so we need to use a select instead to handle these properly.
130 Register DstReg = MI.getOperand(0).getReg();
131 Register SrcReg = MI.getOperand(1).getReg();
132 const RegisterBank *SrcBank = RBI.getRegBank(SrcReg, MRI, *RBI.TRI);
133 if (SrcBank == &AMDGPU::VCCRegBank) {
134 const LLT S32 = LLT::scalar(32);
135 assert(MRI.getType(SrcReg) == LLT::scalar(1));
136 assert(MRI.getType(DstReg) == S32);
137 assert(NewBank == &AMDGPU::VGPRRegBank);
138
139 // Replace the extension with a select, which really uses the boolean
140 // source.
141 B.setInsertPt(*MI.getParent(), MI);
142
143 auto True = B.buildConstant(S32, Opc == AMDGPU::G_SEXT ? -1 : 1);
144 auto False = B.buildConstant(S32, 0);
145 B.buildSelect(DstReg, SrcReg, True, False);
146 MRI.setRegBank(True.getReg(0), *NewBank);
147 MRI.setRegBank(False.getReg(0), *NewBank);
148 MI.eraseFromParent();
149 }
150
151 assert(!MRI.getRegClassOrRegBank(DstReg));
152 MRI.setRegBank(DstReg, *NewBank);
153 return;
154 }
155
156#ifndef NDEBUG
157 if (Opc == AMDGPU::G_TRUNC) {
158 Register DstReg = MI.getOperand(0).getReg();
159 const RegisterBank *DstBank = RBI.getRegBank(DstReg, MRI, *RBI.TRI);
160 assert(DstBank != &AMDGPU::VCCRegBank);
161 }
162#endif
163
164 for (MachineOperand &Op : MI.operands()) {
165 if (!Op.isReg())
166 continue;
167
168 // We may see physical registers if building a real MI
169 Register Reg = Op.getReg();
170 if (Reg.isPhysical() || MRI.getRegClassOrRegBank(Reg))
171 continue;
172
173 const RegisterBank *RB = NewBank;
174 if (MRI.getType(Reg) == LLT::scalar(1)) {
175 assert(NewBank == &AMDGPU::VGPRRegBank &&
176 "s1 operands should only be used for vector bools");
177 assert((MI.getOpcode() != AMDGPU::G_TRUNC &&
178 MI.getOpcode() != AMDGPU::G_ANYEXT) &&
179 "not expecting legalization artifacts here");
180 RB = &AMDGPU::VCCRegBank;
181 }
182
183 MRI.setRegBank(Reg, *RB);
184 }
185 }
186
187 void erasingInstr(MachineInstr &MI) override {}
188
189 void createdInstr(MachineInstr &MI) override {
190 // At this point, the instruction was just inserted and has no operands.
191 NewInsts.push_back(&MI);
192 }
193
194 void changingInstr(MachineInstr &MI) override {}
195 void changedInstr(MachineInstr &MI) override {
196 // FIXME: In principle we should probably add the instruction to NewInsts,
197 // but the way the LegalizerHelper uses the observer, we will always see the
198 // registers we need to set the regbank on also referenced in a new
199 // instruction.
200 }
201};
202
203} // anonymous namespace
204
205AMDGPURegisterBankInfo::AMDGPURegisterBankInfo(const GCNSubtarget &ST)
206 : Subtarget(ST), TRI(Subtarget.getRegisterInfo()),
207 TII(Subtarget.getInstrInfo()) {
208
209 // HACK: Until this is fully tablegen'd.
210 static llvm::once_flag InitializeRegisterBankFlag;
211
212 static auto InitializeRegisterBankOnce = [this]() {
213 assert(&getRegBank(AMDGPU::SGPRRegBankID) == &AMDGPU::SGPRRegBank &&
214 &getRegBank(AMDGPU::VGPRRegBankID) == &AMDGPU::VGPRRegBank &&
215 &getRegBank(AMDGPU::AGPRRegBankID) == &AMDGPU::AGPRRegBank);
216 (void)this;
217 };
218
219 llvm::call_once(InitializeRegisterBankFlag, InitializeRegisterBankOnce);
220}
221
222static bool isVectorRegisterBank(const RegisterBank &Bank) {
223 unsigned BankID = Bank.getID();
224 return BankID == AMDGPU::VGPRRegBankID || BankID == AMDGPU::AGPRRegBankID;
225}
226
227bool AMDGPURegisterBankInfo::isDivergentRegBank(const RegisterBank *RB) const {
228 return RB != &AMDGPU::SGPRRegBank;
229}
230
231unsigned AMDGPURegisterBankInfo::copyCost(const RegisterBank &Dst,
232 const RegisterBank &Src,
233 TypeSize Size) const {
234 // TODO: Should there be a UniformVGPRRegBank which can use readfirstlane?
235 if (Dst.getID() == AMDGPU::SGPRRegBankID &&
236 (isVectorRegisterBank(Src) || Src.getID() == AMDGPU::VCCRegBankID)) {
237 return std::numeric_limits<unsigned>::max();
238 }
239
240 // Bool values are tricky, because the meaning is based on context. The SCC
241 // and VCC banks are for the natural scalar and vector conditions produced by
242 // a compare.
243 //
244 // Legalization doesn't know about the necessary context, so an s1 use may
245 // have been a truncate from an arbitrary value, in which case a copy (lowered
246 // as a compare with 0) needs to be inserted.
247 if (Size == 1 &&
248 (Dst.getID() == AMDGPU::SGPRRegBankID) &&
249 (isVectorRegisterBank(Src) ||
250 Src.getID() == AMDGPU::SGPRRegBankID ||
251 Src.getID() == AMDGPU::VCCRegBankID))
252 return std::numeric_limits<unsigned>::max();
253
254 // There is no direct copy between AGPRs.
255 if (Dst.getID() == AMDGPU::AGPRRegBankID &&
256 Src.getID() == AMDGPU::AGPRRegBankID)
257 return 4;
258
259 return RegisterBankInfo::copyCost(Dst, Src, Size);
260}
261
262unsigned AMDGPURegisterBankInfo::getBreakDownCost(
263 const ValueMapping &ValMapping,
264 const RegisterBank *CurBank) const {
265 // Check if this is a breakdown for G_LOAD to move the pointer from SGPR to
266 // VGPR.
267 // FIXME: Is there a better way to do this?
268 if (ValMapping.NumBreakDowns >= 2 || ValMapping.BreakDown[0].Length >= 64)
269 return 10; // This is expensive.
270
271 assert(ValMapping.NumBreakDowns == 2 &&
272 ValMapping.BreakDown[0].Length == 32 &&
273 ValMapping.BreakDown[0].StartIdx == 0 &&
274 ValMapping.BreakDown[1].Length == 32 &&
275 ValMapping.BreakDown[1].StartIdx == 32 &&
276 ValMapping.BreakDown[0].RegBank == ValMapping.BreakDown[1].RegBank);
277
278 // 32-bit extract of a 64-bit value is just access of a subregister, so free.
279 // TODO: Cost of 0 hits assert, though it's not clear it's what we really
280 // want.
281
282 // TODO: 32-bit insert to a 64-bit SGPR may incur a non-free copy due to SGPR
283 // alignment restrictions, but this probably isn't important.
284 return 1;
285}
286
287const RegisterBank &
288AMDGPURegisterBankInfo::getRegBankFromRegClass(const TargetRegisterClass &RC,
289 LLT Ty) const {
290 if (&RC == &AMDGPU::SReg_1RegClass)
291 return AMDGPU::VCCRegBank;
292
293 // We promote real scalar booleans to SReg_32. Any SGPR using s1 is really a
294 // VCC-like use.
295 if (TRI->isSGPRClass(&RC)) {
296 // FIXME: This probably came from a copy from a physical register, which
297 // should be inferable from the copied to-type. We don't have many boolean
298 // physical register constraints so just assume a normal SGPR for now.
299 if (!Ty.isValid())
300 return AMDGPU::SGPRRegBank;
301
302 return Ty == LLT::scalar(1) ? AMDGPU::VCCRegBank : AMDGPU::SGPRRegBank;
303 }
304
305 return TRI->isAGPRClass(&RC) ? AMDGPU::AGPRRegBank : AMDGPU::VGPRRegBank;
306}
307
308template <unsigned NumOps>
309RegisterBankInfo::InstructionMappings
310AMDGPURegisterBankInfo::addMappingFromTable(
311 const MachineInstr &MI, const MachineRegisterInfo &MRI,
312 const std::array<unsigned, NumOps> RegSrcOpIdx,
313 ArrayRef<OpRegBankEntry<NumOps>> Table) const {
314
315 InstructionMappings AltMappings;
316
317 SmallVector<const ValueMapping *, 10> Operands(MI.getNumOperands());
318
319 unsigned Sizes[NumOps];
320 for (unsigned I = 0; I < NumOps; ++I) {
321 Register Reg = MI.getOperand(RegSrcOpIdx[I]).getReg();
322 Sizes[I] = getSizeInBits(Reg, MRI, *TRI);
323 }
324
325 for (unsigned I = 0, E = MI.getNumExplicitDefs(); I != E; ++I) {
326 unsigned SizeI = getSizeInBits(MI.getOperand(I).getReg(), MRI, *TRI);
327 Operands[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SizeI);
328 }
329
330 // getInstrMapping's default mapping uses ID 1, so start at 2.
331 unsigned MappingID = 2;
332 for (const auto &Entry : Table) {
333 for (unsigned I = 0; I < NumOps; ++I) {
334 int OpIdx = RegSrcOpIdx[I];
335 Operands[OpIdx] = AMDGPU::getValueMapping(Entry.RegBanks[I], Sizes[I]);
336 }
337
338 AltMappings.push_back(&getInstructionMapping(MappingID++, Entry.Cost,
339 getOperandsMapping(Operands),
340 Operands.size()));
341 }
342
343 return AltMappings;
344}
345
346RegisterBankInfo::InstructionMappings
347AMDGPURegisterBankInfo::getInstrAlternativeMappingsIntrinsic(
348 const MachineInstr &MI, const MachineRegisterInfo &MRI) const {
349 switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
350 case Intrinsic::amdgcn_readlane: {
351 static const OpRegBankEntry<3> Table[2] = {
352 // Perfectly legal.
353 { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID }, 1 },
354
355 // Need a readfirstlane for the index.
356 { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 }
357 };
358
359 const std::array<unsigned, 3> RegSrcOpIdx = { { 0, 2, 3 } };
360 return addMappingFromTable<3>(MI, MRI, RegSrcOpIdx, Table);
361 }
362 case Intrinsic::amdgcn_writelane: {
363 static const OpRegBankEntry<4> Table[4] = {
364 // Perfectly legal.
365 { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 1 },
366
367 // Need readfirstlane of first op
368 { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 },
369
370 // Need readfirstlane of second op
371 { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 },
372
373 // Need readfirstlane of both ops
374 { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 3 }
375 };
376
377 // dst, value, lane, old value
378 const std::array<unsigned, 4> RegSrcOpIdx = { { 0, 2, 3, 4 } };
379 return addMappingFromTable<4>(MI, MRI, RegSrcOpIdx, Table);
380 }
381 default:
382 return RegisterBankInfo::getInstrAlternativeMappings(MI);
383 }
384}
385
386RegisterBankInfo::InstructionMappings
387AMDGPURegisterBankInfo::getInstrAlternativeMappingsIntrinsicWSideEffects(
388 const MachineInstr &MI, const MachineRegisterInfo &MRI) const {
389
390 switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
391 case Intrinsic::amdgcn_s_buffer_load: {
392 static const OpRegBankEntry<2> Table[4] = {
393 // Perfectly legal.
394 { { AMDGPU::SGPRRegBankID, AMDGPU::SGPRRegBankID }, 1 },
395
396 // Only need 1 register in loop
397 { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 300 },
398
399 // Have to waterfall the resource.
400 { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID }, 1000 },
401
402 // Have to waterfall the resource, and the offset.
403 { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 1500 }
404 };
405
406 // rsrc, offset
407 const std::array<unsigned, 2> RegSrcOpIdx = { { 2, 3 } };
408 return addMappingFromTable<2>(MI, MRI, RegSrcOpIdx, Table);
409 }
410 case Intrinsic::amdgcn_ds_ordered_add:
411 case Intrinsic::amdgcn_ds_ordered_swap: {
412 // VGPR = M0, VGPR
413 static const OpRegBankEntry<3> Table[2] = {
414 // Perfectly legal.
415 { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 1 },
416
417 // Need a readfirstlane for m0
418 { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 }
419 };
420
421 const std::array<unsigned, 3> RegSrcOpIdx = { { 0, 2, 3 } };
422 return addMappingFromTable<3>(MI, MRI, RegSrcOpIdx, Table);
423 }
424 case Intrinsic::amdgcn_s_sendmsg:
425 case Intrinsic::amdgcn_s_sendmsghalt: {
426 // FIXME: Should have no register for immediate
427 static const OpRegBankEntry<1> Table[2] = {
428 // Perfectly legal.
429 { { AMDGPU::SGPRRegBankID }, 1 },
430
431 // Need readlane
432 { { AMDGPU::VGPRRegBankID }, 3 }
433 };
434
435 const std::array<unsigned, 1> RegSrcOpIdx = { { 2 } };
436 return addMappingFromTable<1>(MI, MRI, RegSrcOpIdx, Table);
437 }
438 default:
439 return RegisterBankInfo::getInstrAlternativeMappings(MI);
440 }
441}
442
443// FIXME: Returns uniform if there's no source value information. This is
444// probably wrong.
445bool AMDGPURegisterBankInfo::isScalarLoadLegal(const MachineInstr &MI) const {
446 if (!MI.hasOneMemOperand())
447 return false;
448
449 const MachineMemOperand *MMO = *MI.memoperands_begin();
450 const unsigned AS = MMO->getAddrSpace();
451 const bool IsConst = AS == AMDGPUAS::CONSTANT_ADDRESS ||
452 AS == AMDGPUAS::CONSTANT_ADDRESS_32BIT;
453 const unsigned MemSize = 8 * MMO->getSize().getValue();
454
455 // Require 4-byte alignment.
456 return (MMO->getAlign() >= Align(4) ||
457 (Subtarget.hasScalarSubwordLoads() &&
458 ((MemSize == 16 && MMO->getAlign() >= Align(2)) ||
459 (MemSize == 8 && MMO->getAlign() >= Align(1))))) &&
460 // Can't do a scalar atomic load.
461 !MMO->isAtomic() &&
462 // Don't use scalar loads for volatile accesses to non-constant address
463 // spaces.
464 (IsConst || !MMO->isVolatile()) &&
465 // Memory must be known constant, or not written before this load.
466 (IsConst || MMO->isInvariant() || (MMO->getFlags() & MONoClobber)) &&
468}
469
470RegisterBankInfo::InstructionMappings
471AMDGPURegisterBankInfo::getInstrAlternativeMappings(
472 const MachineInstr &MI) const {
473
474 const MachineFunction &MF = *MI.getParent()->getParent();
475 const MachineRegisterInfo &MRI = MF.getRegInfo();
476
477
478 InstructionMappings AltMappings;
479 switch (MI.getOpcode()) {
480 case TargetOpcode::G_CONSTANT:
481 case TargetOpcode::G_IMPLICIT_DEF: {
482 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
483 if (Size == 1) {
484 static const OpRegBankEntry<1> Table[3] = {
485 { { AMDGPU::VGPRRegBankID }, 1 },
486 { { AMDGPU::SGPRRegBankID }, 1 },
487 { { AMDGPU::VCCRegBankID }, 1 }
488 };
489
490 return addMappingFromTable<1>(MI, MRI, {{ 0 }}, Table);
491 }
492
493 [[fallthrough]];
494 }
495 case TargetOpcode::G_FCONSTANT:
496 case TargetOpcode::G_FRAME_INDEX:
497 case TargetOpcode::G_GLOBAL_VALUE: {
498 static const OpRegBankEntry<1> Table[2] = {
499 { { AMDGPU::VGPRRegBankID }, 1 },
500 { { AMDGPU::SGPRRegBankID }, 1 }
501 };
502
503 return addMappingFromTable<1>(MI, MRI, {{ 0 }}, Table);
504 }
505 case TargetOpcode::G_AND:
506 case TargetOpcode::G_OR:
507 case TargetOpcode::G_XOR: {
508 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
509
510 if (Size == 1) {
511 // s_{and|or|xor}_b32 set scc when the result of the 32-bit op is not 0.
512 const InstructionMapping &SCCMapping = getInstructionMapping(
513 1, 1, getOperandsMapping(
514 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32),
515 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32),
516 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32)}),
517 3); // Num Operands
518 AltMappings.push_back(&SCCMapping);
519
520 const InstructionMapping &VCCMapping0 = getInstructionMapping(
521 2, 1, getOperandsMapping(
522 {AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size),
523 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size),
524 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size)}),
525 3); // Num Operands
526 AltMappings.push_back(&VCCMapping0);
527 return AltMappings;
528 }
529
530 if (Size != 64)
531 break;
532
533 const InstructionMapping &SSMapping = getInstructionMapping(
534 1, 1, getOperandsMapping(
535 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
536 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
537 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size)}),
538 3); // Num Operands
539 AltMappings.push_back(&SSMapping);
540
541 const InstructionMapping &VVMapping = getInstructionMapping(
542 2, 2, getOperandsMapping(
543 {AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
544 AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
545 AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size)}),
546 3); // Num Operands
547 AltMappings.push_back(&VVMapping);
548 break;
549 }
550 case TargetOpcode::G_LOAD:
551 case TargetOpcode::G_ZEXTLOAD:
552 case TargetOpcode::G_SEXTLOAD: {
553 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
554 LLT PtrTy = MRI.getType(MI.getOperand(1).getReg());
555 unsigned PtrSize = PtrTy.getSizeInBits();
556 unsigned AS = PtrTy.getAddressSpace();
557
558 if ((AS != AMDGPUAS::LOCAL_ADDRESS && AS != AMDGPUAS::REGION_ADDRESS &&
559 AS != AMDGPUAS::PRIVATE_ADDRESS) &&
560 isScalarLoadLegal(MI)) {
561 const InstructionMapping &SSMapping = getInstructionMapping(
562 1, 1, getOperandsMapping(
563 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
564 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, PtrSize)}),
565 2); // Num Operands
566 AltMappings.push_back(&SSMapping);
567 }
568
569 const InstructionMapping &VVMapping = getInstructionMapping(
570 2, 1,
571 getOperandsMapping(
572 {AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
573 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, PtrSize)}),
574 2); // Num Operands
575 AltMappings.push_back(&VVMapping);
576
577 // It may be possible to have a vgpr = load sgpr mapping here, because
578 // the mubuf instructions support this kind of load, but probably for only
579 // gfx7 and older. However, the addressing mode matching in the instruction
580 // selector should be able to do a better job of detecting and selecting
581 // these kinds of loads from the vgpr = load vgpr mapping.
582
583 return AltMappings;
584
585 }
586 case TargetOpcode::G_SELECT: {
587 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
588 const InstructionMapping &SSMapping = getInstructionMapping(1, 1,
589 getOperandsMapping({AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
590 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1),
591 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
592 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size)}),
593 4); // Num Operands
594 AltMappings.push_back(&SSMapping);
595
596 const InstructionMapping &VVMapping = getInstructionMapping(2, 1,
597 getOperandsMapping({AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
598 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1),
599 AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
600 AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size)}),
601 4); // Num Operands
602 AltMappings.push_back(&VVMapping);
603
604 return AltMappings;
605 }
606 case TargetOpcode::G_UADDE:
607 case TargetOpcode::G_USUBE:
608 case TargetOpcode::G_SADDE:
609 case TargetOpcode::G_SSUBE: {
610 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
611 const InstructionMapping &SSMapping = getInstructionMapping(1, 1,
612 getOperandsMapping(
613 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
614 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1),
615 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
616 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
617 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1)}),
618 5); // Num Operands
619 AltMappings.push_back(&SSMapping);
620
621 const InstructionMapping &VVMapping = getInstructionMapping(2, 1,
622 getOperandsMapping({AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
623 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1),
624 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
625 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
626 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1)}),
627 5); // Num Operands
628 AltMappings.push_back(&VVMapping);
629 return AltMappings;
630 }
631 case AMDGPU::G_BRCOND: {
632 assert(MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 1);
633
634 // TODO: Change type to 32 for scalar
635 const InstructionMapping &SMapping = getInstructionMapping(
636 1, 1, getOperandsMapping(
637 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1), nullptr}),
638 2); // Num Operands
639 AltMappings.push_back(&SMapping);
640
642 1, 1, getOperandsMapping(
643 {AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1), nullptr }),
644 2); // Num Operands
645 AltMappings.push_back(&VMapping);
646 return AltMappings;
647 }
648 case AMDGPU::G_INTRINSIC:
649 case AMDGPU::G_INTRINSIC_CONVERGENT:
650 return getInstrAlternativeMappingsIntrinsic(MI, MRI);
651 case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS:
652 case AMDGPU::G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS:
653 return getInstrAlternativeMappingsIntrinsicWSideEffects(MI, MRI);
654 default:
655 break;
656 }
657 return RegisterBankInfo::getInstrAlternativeMappings(MI);
658}
659
663 LLT HalfTy,
664 Register Reg) const {
665 assert(HalfTy.getSizeInBits() == 32);
666 MachineRegisterInfo *MRI = B.getMRI();
667 Register LoLHS = MRI->createGenericVirtualRegister(HalfTy);
668 Register HiLHS = MRI->createGenericVirtualRegister(HalfTy);
669 const RegisterBank *Bank = getRegBank(Reg, *MRI, *TRI);
670 MRI->setRegBank(LoLHS, *Bank);
671 MRI->setRegBank(HiLHS, *Bank);
672
673 Regs.push_back(LoLHS);
674 Regs.push_back(HiLHS);
675
676 B.buildInstr(AMDGPU::G_UNMERGE_VALUES)
677 .addDef(LoLHS)
678 .addDef(HiLHS)
679 .addUse(Reg);
680}
681
682/// Replace the current type each register in \p Regs has with \p NewTy
684 LLT NewTy) {
685 for (Register Reg : Regs) {
686 assert(MRI.getType(Reg).getSizeInBits() == NewTy.getSizeInBits());
687 MRI.setType(Reg, NewTy);
688 }
689}
690
692 if (Ty.isVector()) {
693 assert(Ty.getElementCount().isKnownMultipleOf(2));
694 return LLT::scalarOrVector(Ty.getElementCount().divideCoefficientBy(2),
695 Ty.getElementType());
696 }
697
698 assert(Ty.getScalarSizeInBits() % 2 == 0);
699 return LLT::scalar(Ty.getScalarSizeInBits() / 2);
700}
701
702// Build one or more V_READFIRSTLANE_B32 instructions to move the given vector
703// source value into a scalar register.
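//
// For example, a 64-bit VGPR source is handled roughly as (names illustrative):
//   %lo, %hi       = G_UNMERGE_VALUES %src:vgpr(s64)
//   %slo:sgpr(s32) = V_READFIRSTLANE_B32 %lo
//   %shi:sgpr(s32) = V_READFIRSTLANE_B32 %hi
//   %dst:sgpr(s64) = G_MERGE_VALUES %slo, %shi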
704Register AMDGPURegisterBankInfo::buildReadFirstLane(MachineIRBuilder &B,
705 MachineRegisterInfo &MRI,
706 Register Src) const {
707 LLT Ty = MRI.getType(Src);
708 const RegisterBank *Bank = getRegBank(Src, MRI, *TRI);
709
710 if (Bank == &AMDGPU::SGPRRegBank)
711 return Src;
712
713 unsigned Bits = Ty.getSizeInBits();
714 assert(Bits % 32 == 0);
715
716 if (Bank != &AMDGPU::VGPRRegBank) {
717 // We need to copy from AGPR to VGPR
718 Src = B.buildCopy(Ty, Src).getReg(0);
719 MRI.setRegBank(Src, AMDGPU::VGPRRegBank);
720 }
721
722 LLT S32 = LLT::scalar(32);
723 unsigned NumParts = Bits / 32;
724 SmallVector<Register, 8> SrcParts;
725 SmallVector<Register, 8> DstParts;
726
727 if (Bits == 32) {
728 SrcParts.push_back(Src);
729 } else {
730 auto Unmerge = B.buildUnmerge(S32, Src);
731 for (unsigned i = 0; i < NumParts; ++i)
732 SrcParts.push_back(Unmerge.getReg(i));
733 }
734
735 for (unsigned i = 0; i < NumParts; ++i) {
736 Register SrcPart = SrcParts[i];
737 Register DstPart = MRI.createVirtualRegister(&AMDGPU::SReg_32_XM0RegClass);
738 MRI.setType(DstPart, NumParts == 1 ? Ty : S32);
739
740 const TargetRegisterClass *Constrained =
741 constrainGenericRegister(SrcPart, AMDGPU::VGPR_32RegClass, MRI);
742 (void)Constrained;
743 assert(Constrained && "Failed to constrain readfirstlane src reg");
744
745 B.buildInstr(AMDGPU::V_READFIRSTLANE_B32, {DstPart}, {SrcPart});
746
747 DstParts.push_back(DstPart);
748 }
749
750 if (Bits == 32)
751 return DstParts[0];
752
753 Register Dst = B.buildMergeLikeInstr(Ty, DstParts).getReg(0);
754 MRI.setRegBank(Dst, AMDGPU::SGPRRegBank);
755 return Dst;
756}
757
758/// Legalize instruction \p MI where operands in \p OpIndices must be SGPRs. If
759/// any of the required SGPR operands are VGPRs, perform a waterfall loop to
760/// execute the instruction for each unique combination of values in all lanes
761/// in the wave. The block will be split such that the rest of the instructions are
762/// moved to a new block.
763///
764/// Essentially performs this loop:
765//
766/// Save Execution Mask
767/// For (Lane : Wavefront) {
768/// Enable Lane, Disable all other lanes
769/// SGPR = read SGPR value for current lane from VGPR
770/// VGPRResult[Lane] = use_op SGPR
771/// }
772/// Restore Execution Mask
773///
774/// There is additional complexity in trying to compare values to identify the
775/// unique values used.
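///
/// The control flow built below is roughly:
///
///   MBB ----> LoopBB (exec phi, readfirstlane + compares, and-saveexec)
///               |  ^
///               v  |   loop while any lanes remain
///            BodyBB (the moved instruction(s), xor-term of exec)
///               |
///               v
///        RestoreExecBB (restore the saved exec) ----> RemainderBB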
776bool AMDGPURegisterBankInfo::executeInWaterfallLoop(
777 MachineIRBuilder &B, iterator_range<MachineBasicBlock::iterator> Range,
778 SmallSet<Register, 4> &SGPROperandRegs) const {
779 // Track use registers which have already been expanded with a readfirstlane
780 // sequence. This may have multiple uses if moving a sequence.
781 DenseMap<Register, Register> WaterfalledRegMap;
782
783 MachineBasicBlock &MBB = B.getMBB();
784 MachineFunction *MF = &B.getMF();
785
786 const TargetRegisterClass *WaveRC = TRI->getWaveMaskRegClass();
787 const AMDGPU::LaneMaskConstants &LMC =
789
790#ifndef NDEBUG
791 const int OrigRangeSize = std::distance(Range.begin(), Range.end());
792#endif
793
794 MachineRegisterInfo &MRI = *B.getMRI();
795 Register SaveExecReg = MRI.createVirtualRegister(WaveRC);
796 Register InitSaveExecReg = MRI.createVirtualRegister(WaveRC);
797
798 // Don't bother using generic instructions/registers for the exec mask.
799 B.buildInstr(TargetOpcode::IMPLICIT_DEF)
800 .addDef(InitSaveExecReg);
801
802 Register PhiExec = MRI.createVirtualRegister(WaveRC);
803 Register NewExec = MRI.createVirtualRegister(WaveRC);
804
805 // To insert the loop we need to split the block. Move everything before this
806 // point to a new block, and insert a new empty block before this instruction.
807 MachineBasicBlock *LoopBB = MF->CreateMachineBasicBlock();
808 MachineBasicBlock *BodyBB = MF->CreateMachineBasicBlock();
809 MachineBasicBlock *RemainderBB = MF->CreateMachineBasicBlock();
810 MachineBasicBlock *RestoreExecBB = MF->CreateMachineBasicBlock();
811 MachineFunction::iterator MBBI(MBB);
812 ++MBBI;
813 MF->insert(MBBI, LoopBB);
814 MF->insert(MBBI, BodyBB);
815 MF->insert(MBBI, RestoreExecBB);
816 MF->insert(MBBI, RemainderBB);
817
818 LoopBB->addSuccessor(BodyBB);
819 BodyBB->addSuccessor(RestoreExecBB);
820 BodyBB->addSuccessor(LoopBB);
821
822 // Move the rest of the block into a new block.
823 RemainderBB->transferSuccessorsAndUpdatePHIs(&MBB);
824 RemainderBB->splice(RemainderBB->begin(), &MBB, Range.end(), MBB.end());
825
826 MBB.addSuccessor(LoopBB);
827 RestoreExecBB->addSuccessor(RemainderBB);
828
829 B.setInsertPt(*LoopBB, LoopBB->end());
830
831 B.buildInstr(TargetOpcode::PHI)
832 .addDef(PhiExec)
833 .addReg(InitSaveExecReg)
834 .addMBB(&MBB)
835 .addReg(NewExec)
836 .addMBB(BodyBB);
837
838 const DebugLoc &DL = B.getDL();
839
840 MachineInstr &FirstInst = *Range.begin();
841
842 // Move the instruction into the loop body. Note we moved everything after
843 // Range.end() already into a new block, so Range.end() is no longer valid.
844 BodyBB->splice(BodyBB->end(), &MBB, Range.begin(), MBB.end());
845
846 // Figure out the iterator range after splicing the instructions.
847 MachineBasicBlock::iterator NewBegin = FirstInst.getIterator();
848 auto NewEnd = BodyBB->end();
849
850 B.setMBB(*LoopBB);
851
852 LLT S1 = LLT::scalar(1);
853 Register CondReg;
854
855 assert(std::distance(NewBegin, NewEnd) == OrigRangeSize);
856
857 for (MachineInstr &MI : make_range(NewBegin, NewEnd)) {
858 for (MachineOperand &Op : MI.all_uses()) {
859 Register OldReg = Op.getReg();
860 if (!SGPROperandRegs.count(OldReg))
861 continue;
862
863 // See if we already processed this register in another instruction in the
864 // sequence.
865 auto OldVal = WaterfalledRegMap.find(OldReg);
866 if (OldVal != WaterfalledRegMap.end()) {
867 Op.setReg(OldVal->second);
868 continue;
869 }
870
871 Register OpReg = Op.getReg();
872 LLT OpTy = MRI.getType(OpReg);
873
874 const RegisterBank *OpBank = getRegBank(OpReg, MRI, *TRI);
875 if (OpBank != &AMDGPU::VGPRRegBank) {
876 // Insert copy from AGPR to VGPR before the loop.
877 B.setMBB(MBB);
878 OpReg = B.buildCopy(OpTy, OpReg).getReg(0);
879 MRI.setRegBank(OpReg, AMDGPU::VGPRRegBank);
880 B.setMBB(*LoopBB);
881 }
882
883 Register CurrentLaneReg = buildReadFirstLane(B, MRI, OpReg);
884
885 // Build the comparison(s).
886 unsigned OpSize = OpTy.getSizeInBits();
887 bool Is64 = OpSize % 64 == 0;
888 unsigned PartSize = Is64 ? 64 : 32;
889 LLT PartTy = LLT::scalar(PartSize);
890 unsigned NumParts = OpSize / PartSize;
891 SmallVector<Register, 8> OpParts;
892 SmallVector<Register, 8> CurrentLaneParts;
893
894 if (NumParts == 1) {
895 OpParts.push_back(OpReg);
896 CurrentLaneParts.push_back(CurrentLaneReg);
897 } else {
898 auto UnmergeOp = B.buildUnmerge(PartTy, OpReg);
899 auto UnmergeCurrentLane = B.buildUnmerge(PartTy, CurrentLaneReg);
900 for (unsigned i = 0; i < NumParts; ++i) {
901 OpParts.push_back(UnmergeOp.getReg(i));
902 CurrentLaneParts.push_back(UnmergeCurrentLane.getReg(i));
903 MRI.setRegBank(OpParts[i], AMDGPU::VGPRRegBank);
904 MRI.setRegBank(CurrentLaneParts[i], AMDGPU::SGPRRegBank);
905 }
906 }
907
908 for (unsigned i = 0; i < NumParts; ++i) {
909 auto CmpReg = B.buildICmp(CmpInst::ICMP_EQ, S1, CurrentLaneParts[i],
910 OpParts[i]).getReg(0);
911 MRI.setRegBank(CmpReg, AMDGPU::VCCRegBank);
912
913 if (!CondReg) {
914 CondReg = CmpReg;
915 } else {
916 CondReg = B.buildAnd(S1, CondReg, CmpReg).getReg(0);
917 MRI.setRegBank(CondReg, AMDGPU::VCCRegBank);
918 }
919 }
920
921 Op.setReg(CurrentLaneReg);
922
923 // Make sure we don't re-process this register again.
924 WaterfalledRegMap.insert(std::pair(OldReg, Op.getReg()));
925 }
926 }
927
928 // The ballot becomes a no-op during instruction selection.
929 CondReg = B.buildIntrinsic(Intrinsic::amdgcn_ballot,
930 {LLT::scalar(Subtarget.isWave32() ? 32 : 64)})
931 .addReg(CondReg)
932 .getReg(0);
933 MRI.setRegClass(CondReg, WaveRC);
934
935 // Update EXEC, save the original EXEC value to VCC.
936 B.buildInstr(LMC.AndSaveExecOpc)
937 .addDef(NewExec)
938 .addReg(CondReg, RegState::Kill);
939
940 MRI.setSimpleHint(NewExec, CondReg);
941
942 B.setInsertPt(*BodyBB, BodyBB->end());
943
944 // Update EXEC, switch all done bits to 0 and all todo bits to 1.
945 B.buildInstr(LMC.XorTermOpc)
946 .addDef(LMC.ExecReg)
947 .addReg(LMC.ExecReg)
948 .addReg(NewExec);
949
950 // XXX - s_xor_b64 sets scc to 1 if the result is nonzero, so can we use
951 // s_cbranch_scc0?
952
953 // Loop back to V_READFIRSTLANE_B32 if there are still variants to cover.
954 B.buildInstr(AMDGPU::SI_WATERFALL_LOOP).addMBB(LoopBB);
955
956 // Save the EXEC mask before the loop.
957 BuildMI(MBB, MBB.end(), DL, TII->get(LMC.MovOpc), SaveExecReg)
958 .addReg(LMC.ExecReg);
959
960 // Restore the EXEC mask after the loop.
961 B.setMBB(*RestoreExecBB);
962 B.buildInstr(LMC.MovTermOpc).addDef(LMC.ExecReg).addReg(SaveExecReg);
963
964 // Set the insert point after the original instruction, so any new
965 // instructions will be in the remainder.
966 B.setInsertPt(*RemainderBB, RemainderBB->begin());
967
968 return true;
969}
970
971// Return any unique registers used by \p MI at \p OpIndices that need to be
972// handled in a waterfall loop. Returns these registers in \p
973// SGPROperandRegs. Returns true if there are any operands to handle and a
974// waterfall loop is necessary.
976 SmallSet<Register, 4> &SGPROperandRegs, MachineInstr &MI,
977 MachineRegisterInfo &MRI, ArrayRef<unsigned> OpIndices) const {
978 for (unsigned Op : OpIndices) {
979 assert(MI.getOperand(Op).isUse());
980 Register Reg = MI.getOperand(Op).getReg();
981 const RegisterBank *OpBank = getRegBank(Reg, MRI, *TRI);
982 if (OpBank->getID() != AMDGPU::SGPRRegBankID)
983 SGPROperandRegs.insert(Reg);
984 }
985
986 // No operands need to be replaced, so no need to loop.
987 return !SGPROperandRegs.empty();
988}
989
992 // Use a set to avoid extra readfirstlanes in the case where multiple operands
993 // are the same register.
994 SmallSet<Register, 4> SGPROperandRegs;
995
996 if (!collectWaterfallOperands(SGPROperandRegs, MI, *B.getMRI(), OpIndices))
997 return false;
998
999 MachineBasicBlock::iterator I = MI.getIterator();
1000 return executeInWaterfallLoop(B, make_range(I, std::next(I)),
1001 SGPROperandRegs);
1002}
1003
1004// Legalize an operand that must be an SGPR by inserting a readfirstlane.
1006 MachineIRBuilder &B, MachineInstr &MI, unsigned OpIdx) const {
1007 Register Reg = MI.getOperand(OpIdx).getReg();
1008 MachineRegisterInfo &MRI = *B.getMRI();
1009 const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
1010 if (Bank == &AMDGPU::SGPRRegBank)
1011 return;
1012
1013 Reg = buildReadFirstLane(B, MRI, Reg);
1014 MI.getOperand(OpIdx).setReg(Reg);
1015}
1016
1017/// Split \p Ty into 2 pieces. The first will have \p FirstSize bits, and the
1018/// rest will be in the remainder.
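/// For example, splitUnequalType(s96, 64) gives {s64, s32}, and
/// splitUnequalType(<3 x s32>, 64) gives {<2 x s32>, s32}.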
1019static std::pair<LLT, LLT> splitUnequalType(LLT Ty, unsigned FirstSize) {
1020 unsigned TotalSize = Ty.getSizeInBits();
1021 if (!Ty.isVector())
1022 return {LLT::scalar(FirstSize), LLT::scalar(TotalSize - FirstSize)};
1023
1024 LLT EltTy = Ty.getElementType();
1025 unsigned EltSize = EltTy.getSizeInBits();
1026 assert(FirstSize % EltSize == 0);
1027
1028 unsigned FirstPartNumElts = FirstSize / EltSize;
1029 unsigned RemainderElts = (TotalSize - FirstSize) / EltSize;
1030
1031 return {LLT::scalarOrVector(ElementCount::getFixed(FirstPartNumElts), EltTy),
1032 LLT::scalarOrVector(ElementCount::getFixed(RemainderElts), EltTy)};
1033}
1034
1036 if (!Ty.isVector())
1037 return LLT::scalar(128);
1038
1039 LLT EltTy = Ty.getElementType();
1040 assert(128 % EltTy.getSizeInBits() == 0);
1041 return LLT::fixed_vector(128 / EltTy.getSizeInBits(), EltTy);
1042}
1043
1047 MachineInstr &MI) const {
1048 MachineRegisterInfo &MRI = *B.getMRI();
1049 Register DstReg = MI.getOperand(0).getReg();
1050 const LLT LoadTy = MRI.getType(DstReg);
1051 unsigned LoadSize = LoadTy.getSizeInBits();
1052 MachineMemOperand *MMO = *MI.memoperands_begin();
1053 const unsigned MaxNonSmrdLoadSize = 128;
1054
1055 const RegisterBank *DstBank =
1056 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
1057 if (DstBank == &AMDGPU::SGPRRegBank) {
1058 // There are some special cases that we need to look at for 32 bit and 96
1059 // bit SGPR loads; otherwise we have nothing to do.
1060 if (LoadSize != 32 && (LoadSize != 96 || Subtarget.hasScalarDwordx3Loads()))
1061 return false;
1062
1063 const unsigned MemSize = 8 * MMO->getSize().getValue();
1064 // Scalar loads of size 8 or 16 bit with proper alignment may be widened to
1065 // 32 bit. Check to see if we need to widen the memory access: 8 or 16 bit
1066 // scalar loads should have a load size of 32 but a memory access size of less
1067 // than 32.
1068 if (LoadSize == 32 &&
1069 (MemSize == 32 || LoadTy.isVector() || !isScalarLoadLegal(MI)))
1070 return false;
1071
1072 if (LoadSize == 32 &&
1073 ((MemSize == 8 && MMO->getAlign() >= Align(1)) ||
1074 (MemSize == 16 && MMO->getAlign() >= Align(2))) &&
1075 isScalarLoadLegal(MI) &&
1076 Subtarget.getGeneration() >= AMDGPUSubtarget::GFX12)
1077 return false;
1078
1079 Register PtrReg = MI.getOperand(1).getReg();
1080
1081 ApplyRegBankMapping ApplyBank(B, *this, MRI, DstBank);
1082
1083 if (LoadSize == 32) {
1084 // This is an extending load from a sub-dword size. Widen the memory
1085 // access size to 4 bytes and clear the extra high bits appropriately
1086 const LLT S32 = LLT::scalar(32);
1087 if (MI.getOpcode() == AMDGPU::G_SEXTLOAD) {
1088 // Must extend the sign bit into higher bits for a G_SEXTLOAD
1089 auto WideLoad = B.buildLoadFromOffset(S32, PtrReg, *MMO, 0);
1090 B.buildSExtInReg(MI.getOperand(0), WideLoad, MemSize);
1091 } else if (MI.getOpcode() == AMDGPU::G_ZEXTLOAD) {
1092 // Must extend zero into higher bits with an AND for a G_ZEXTLOAD
1093 auto WideLoad = B.buildLoadFromOffset(S32, PtrReg, *MMO, 0);
1094 B.buildZExtInReg(MI.getOperand(0), WideLoad, MemSize);
1095 } else
1096 // We do not need to touch the higher bits for regular loads.
1097 B.buildLoadFromOffset(MI.getOperand(0), PtrReg, *MMO, 0);
1098 } else {
1099 // 96-bit loads are only available for vector loads. We need to split this
1100 // into a 64-bit part and a 32-bit part (unless we can widen to a 128-bit load).
1101 if (MMO->getAlign() < Align(16)) {
1102 LegalizerHelper Helper(B.getMF(), ApplyBank, B);
1103 LLT Part64, Part32;
1104 std::tie(Part64, Part32) = splitUnequalType(LoadTy, 64);
1105 if (Helper.reduceLoadStoreWidth(cast<GAnyLoad>(MI), 0, Part64) !=
1107 return false;
1108 return true;
1109 }
1110 LLT WiderTy = widen96To128(LoadTy);
1111 auto WideLoad = B.buildLoadFromOffset(WiderTy, PtrReg, *MMO, 0);
1112 if (WiderTy.isScalar()) {
1113 B.buildTrunc(MI.getOperand(0), WideLoad);
1114 } else {
1115 B.buildDeleteTrailingVectorElements(MI.getOperand(0).getReg(),
1116 WideLoad);
1117 }
1118 }
1119
1120 MI.eraseFromParent();
1121 return true;
1122 }
1123
1124 // 128-bit loads are supported for all instruction types.
1125 if (LoadSize <= MaxNonSmrdLoadSize)
1126 return false;
1127
1128 SmallVector<Register, 1> SrcRegs(OpdMapper.getVRegs(1));
1129
1130 if (SrcRegs.empty())
1131 SrcRegs.push_back(MI.getOperand(1).getReg());
1132
1133 // RegBankSelect only emits scalar types, so we need to reset the pointer
1134 // operand to a pointer type.
1135 Register BasePtrReg = SrcRegs[0];
1136 LLT PtrTy = MRI.getType(MI.getOperand(1).getReg());
1137 MRI.setType(BasePtrReg, PtrTy);
1138
1139 // The following are loads that were not split enough during legalization
1140 // because it was not clear whether they are smem or vmem loads.
1143 assert(LoadSize % MaxNonSmrdLoadSize == 0);
1144 unsigned NumSplitParts = LoadTy.getSizeInBits() / MaxNonSmrdLoadSize;
1145 const LLT LoadSplitTy = LoadTy.divide(NumSplitParts);
1146 ApplyRegBankMapping O(B, *this, MRI, &AMDGPU::VGPRRegBank);
1147 LegalizerHelper Helper(B.getMF(), O, B);
1148 if (LoadTy.isVector()) {
1149 if (Helper.fewerElementsVector(MI, 0, LoadSplitTy) !=
1151 return false;
1152 } else {
1153 if (Helper.narrowScalar(MI, 0, LoadSplitTy) != LegalizerHelper::Legalized)
1154 return false;
1155 }
1156 }
1157
1158 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
1159 return true;
1160}
1161
1165 MachineInstr &MI) const {
1166 MachineRegisterInfo &MRI = *B.getMRI();
1167 const MachineFunction &MF = B.getMF();
1168 const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
1169 const auto &TFI = *ST.getFrameLowering();
1170
1171 // Guard in case the stack growth direction ever changes with scratch
1172 // instructions.
1173 assert(TFI.getStackGrowthDirection() == TargetFrameLowering::StackGrowsUp &&
1174 "Stack grows upwards for AMDGPU");
1175
1176 Register Dst = MI.getOperand(0).getReg();
1177 Register AllocSize = MI.getOperand(1).getReg();
1178 Align Alignment = assumeAligned(MI.getOperand(2).getImm());
1179
1180 const RegisterBank *SizeBank = getRegBank(AllocSize, MRI, *TRI);
1181
1182 if (SizeBank != &AMDGPU::SGPRRegBank) {
1183 auto WaveReduction =
1184 B.buildIntrinsic(Intrinsic::amdgcn_wave_reduce_umax, {LLT::scalar(32)})
1185 .addUse(AllocSize)
1186 .addImm(0);
1187 AllocSize = WaveReduction.getReg(0);
1188 }
1189
1190 LLT PtrTy = MRI.getType(Dst);
1191 LLT IntPtrTy = LLT::scalar(PtrTy.getSizeInBits());
1192
1193 const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
1194 Register SPReg = Info->getStackPtrOffsetReg();
1195 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank);
1196
1197 auto WaveSize = B.buildConstant(LLT::scalar(32), ST.getWavefrontSizeLog2());
1198 auto ScaledSize = B.buildShl(IntPtrTy, AllocSize, WaveSize);
1199
1200 auto OldSP = B.buildCopy(PtrTy, SPReg);
1201 if (Alignment > TFI.getStackAlign()) {
1202 auto StackAlignMask = (Alignment.value() << ST.getWavefrontSizeLog2()) - 1;
1203 auto Tmp1 = B.buildPtrAdd(PtrTy, OldSP,
1204 B.buildConstant(LLT::scalar(32), StackAlignMask));
1205 B.buildMaskLowPtrBits(Dst, Tmp1,
1206 Log2(Alignment) + ST.getWavefrontSizeLog2());
1207 } else {
1208 B.buildCopy(Dst, OldSP);
1209 }
1210 auto PtrAdd = B.buildPtrAdd(PtrTy, Dst, ScaledSize);
1211 B.buildCopy(SPReg, PtrAdd);
1212 MI.eraseFromParent();
1213 return true;
1214}
1215
1219 int RsrcIdx) const {
1220 const int NumDefs = MI.getNumExplicitDefs();
1221
1222 // The reported argument index is relative to the IR intrinsic call arguments,
1223 // so we need to shift by the number of defs and the intrinsic ID.
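 // For example, with one def and the intrinsic ID in the next operand, IR
 // argument i lives at machine operand i + 2, so an incoming RsrcIdx of 1
 // becomes operand 3 after the adjustment below.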
1224 RsrcIdx += NumDefs + 1;
1225
1226 // Insert copies to VGPR arguments.
1227 applyDefaultMapping(OpdMapper);
1228
1229 // Fixup any SGPR arguments.
1230 SmallVector<unsigned, 4> SGPRIndexes;
1231 for (int I = NumDefs, NumOps = MI.getNumOperands(); I != NumOps; ++I) {
1232 if (!MI.getOperand(I).isReg())
1233 continue;
1234
1235 // If this intrinsic has a sampler, it immediately follows rsrc.
1236 if (I == RsrcIdx || I == RsrcIdx + 1)
1237 SGPRIndexes.push_back(I);
1238 }
1239
1240 executeInWaterfallLoop(B, MI, SGPRIndexes);
1241 return true;
1242}
1243
1244// Analyze a combined offset from an llvm.amdgcn.s.buffer intrinsic and store
1245// the three offsets (voffset, soffset and instoffset)
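// For example (illustrative numbers): a uniform combined offset of 4100 may be
// split into soffset = 4096 with instoffset = 4 and voffset = 0, while a
// divergent combined offset is placed entirely in voffset with soffset = 0.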
1246unsigned AMDGPURegisterBankInfo::setBufferOffsets(
1247 MachineIRBuilder &B, Register CombinedOffset, Register &VOffsetReg,
1248 Register &SOffsetReg, int64_t &InstOffsetVal, Align Alignment) const {
1249 const LLT S32 = LLT::scalar(32);
1250 MachineRegisterInfo *MRI = B.getMRI();
1251
1252 if (std::optional<int64_t> Imm =
1253 getIConstantVRegSExtVal(CombinedOffset, *MRI)) {
1254 uint32_t SOffset, ImmOffset;
1255 if (TII->splitMUBUFOffset(*Imm, SOffset, ImmOffset, Alignment)) {
1256 VOffsetReg = B.buildConstant(S32, 0).getReg(0);
1257 SOffsetReg = B.buildConstant(S32, SOffset).getReg(0);
1258 InstOffsetVal = ImmOffset;
1259
1260 B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1261 B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1262 return SOffset + ImmOffset;
1263 }
1264 }
1265
1266 Register Base;
1267 unsigned Offset;
1268
1269 std::tie(Base, Offset) =
1270 AMDGPU::getBaseWithConstantOffset(*MRI, CombinedOffset);
1271
1272 uint32_t SOffset, ImmOffset;
1273 if ((int)Offset > 0 &&
1274 TII->splitMUBUFOffset(Offset, SOffset, ImmOffset, Alignment)) {
1275 if (getRegBank(Base, *MRI, *TRI) == &AMDGPU::VGPRRegBank) {
1276 VOffsetReg = Base;
1277 SOffsetReg = B.buildConstant(S32, SOffset).getReg(0);
1278 B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1279 InstOffsetVal = ImmOffset;
1280 return 0; // XXX - Why is this 0?
1281 }
1282
1283 // If we have SGPR base, we can use it for soffset.
1284 if (SOffset == 0) {
1285 VOffsetReg = B.buildConstant(S32, 0).getReg(0);
1286 B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1287 SOffsetReg = Base;
1288 InstOffsetVal = ImmOffset;
1289 return 0; // XXX - Why is this 0?
1290 }
1291 }
1292
1293 // Handle the variable sgpr + vgpr case.
1294 MachineInstr *Add = getOpcodeDef(AMDGPU::G_ADD, CombinedOffset, *MRI);
1295 if (Add && (int)Offset >= 0) {
1296 Register Src0 = getSrcRegIgnoringCopies(Add->getOperand(1).getReg(), *MRI);
1297 Register Src1 = getSrcRegIgnoringCopies(Add->getOperand(2).getReg(), *MRI);
1298
1299 const RegisterBank *Src0Bank = getRegBank(Src0, *MRI, *TRI);
1300 const RegisterBank *Src1Bank = getRegBank(Src1, *MRI, *TRI);
1301
1302 if (Src0Bank == &AMDGPU::VGPRRegBank && Src1Bank == &AMDGPU::SGPRRegBank) {
1303 VOffsetReg = Src0;
1304 SOffsetReg = Src1;
1305 return 0;
1306 }
1307
1308 if (Src0Bank == &AMDGPU::SGPRRegBank && Src1Bank == &AMDGPU::VGPRRegBank) {
1309 VOffsetReg = Src1;
1310 SOffsetReg = Src0;
1311 return 0;
1312 }
1313 }
1314
1315 // Ensure we have a VGPR for the combined offset. This could be an issue if we
1316 // have an SGPR offset and a VGPR resource.
1317 if (getRegBank(CombinedOffset, *MRI, *TRI) == &AMDGPU::VGPRRegBank) {
1318 VOffsetReg = CombinedOffset;
1319 } else {
1320 VOffsetReg = B.buildCopy(S32, CombinedOffset).getReg(0);
1321 B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1322 }
1323
1324 SOffsetReg = B.buildConstant(S32, 0).getReg(0);
1325 B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1326 return 0;
1327}
1328
1329static unsigned getSBufferLoadCorrespondingBufferLoadOpcode(unsigned Opc) {
1330 switch (Opc) {
1331 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD:
1332 return AMDGPU::G_AMDGPU_BUFFER_LOAD;
1333 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_UBYTE:
1334 return AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE;
1335 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SBYTE:
1336 return AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE;
1337 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_USHORT:
1338 return AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT;
1339 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SSHORT:
1340 return AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT;
1341 default:
1342 break;
1343 }
1344 llvm_unreachable("Unexpected s_buffer_load opcode");
1345}
1346
1348 MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
1349 MachineInstr &MI = OpdMapper.getMI();
1350 MachineRegisterInfo &MRI = OpdMapper.getMRI();
1351
1352 const LLT S32 = LLT::scalar(32);
1353 Register Dst = MI.getOperand(0).getReg();
1354 LLT Ty = MRI.getType(Dst);
1355
1356 const RegisterBank *RSrcBank =
1357 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
1358 const RegisterBank *OffsetBank =
1359 OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
1360 if (RSrcBank == &AMDGPU::SGPRRegBank &&
1361 OffsetBank == &AMDGPU::SGPRRegBank)
1362 return true; // Legal mapping
1363
1364 // FIXME: 96-bit case was widened during legalize. We need to narrow it back
1365 // here but don't have an MMO.
1366
1367 unsigned LoadSize = Ty.getSizeInBits();
1368 int NumLoads = 1;
1369 if (LoadSize == 256 || LoadSize == 512) {
1370 NumLoads = LoadSize / 128;
1371 Ty = Ty.divide(NumLoads);
1372 }
1373
1374 // Use the alignment to ensure that the required offsets will fit into the
1375 // immediate offsets.
1376 const Align Alignment = NumLoads > 1 ? Align(16 * NumLoads) : Align(1);
1377
1378 MachineFunction &MF = B.getMF();
1379
1380 Register SOffset;
1381 Register VOffset;
1382 int64_t ImmOffset = 0;
1383
1384 unsigned MMOOffset = setBufferOffsets(B, MI.getOperand(2).getReg(), VOffset,
1385 SOffset, ImmOffset, Alignment);
1386
1387 // TODO: 96-bit loads were widened to 128-bit results. Shrink the result if we
1388 // can, but we need to track an MMO for that.
1389 const unsigned MemSize = (Ty.getSizeInBits() + 7) / 8;
1390 const Align MemAlign(4); // FIXME: ABI type alignment?
1391 MachineMemOperand *BaseMMO = MF.getMachineMemOperand(
1392 MachinePointerInfo(),
1393 MachineMemOperand::MOLoad | MachineMemOperand::MODereferenceable |
1394 MachineMemOperand::MOInvariant,
1395 MemSize, MemAlign);
1396 if (MMOOffset != 0)
1397 BaseMMO = MF.getMachineMemOperand(BaseMMO, MMOOffset, MemSize);
1398
1399 // If only the offset is divergent, emit a MUBUF buffer load instead. We can
1400 // assume that the buffer is unswizzled.
1401
1402 Register RSrc = MI.getOperand(1).getReg();
1403 Register VIndex = B.buildConstant(S32, 0).getReg(0);
1404 B.getMRI()->setRegBank(VIndex, AMDGPU::VGPRRegBank);
1405
1406 SmallVector<Register, 4> LoadParts(NumLoads);
1407
1408 MachineBasicBlock::iterator MII = MI.getIterator();
1409 MachineInstrSpan Span(MII, &B.getMBB());
1410
1411 for (int i = 0; i < NumLoads; ++i) {
1412 if (NumLoads == 1) {
1413 LoadParts[i] = Dst;
1414 } else {
1415 LoadParts[i] = MRI.createGenericVirtualRegister(Ty);
1416 MRI.setRegBank(LoadParts[i], AMDGPU::VGPRRegBank);
1417 }
1418
1419 MachineMemOperand *MMO = BaseMMO;
1420 if (i != 0)
1421 BaseMMO = MF.getMachineMemOperand(BaseMMO, MMOOffset + 16 * i, MemSize);
1422
1423 B.buildInstr(getSBufferLoadCorrespondingBufferLoadOpcode(MI.getOpcode()))
1424 .addDef(LoadParts[i]) // vdata
1425 .addUse(RSrc) // rsrc
1426 .addUse(VIndex) // vindex
1427 .addUse(VOffset) // voffset
1428 .addUse(SOffset) // soffset
1429 .addImm(ImmOffset + 16 * i) // offset(imm)
1430 .addImm(0) // cachepolicy, swizzled buffer(imm)
1431 .addImm(0) // idxen(imm)
1432 .addMemOperand(MMO);
1433 }
1434
1435 // TODO: If only the resource is a VGPR, it may be better to execute the
1436 // scalar load in the waterfall loop if the resource is expected to frequently
1437 // be dynamically uniform.
1438 if (RSrcBank != &AMDGPU::SGPRRegBank) {
1439 // Remove the original instruction to avoid potentially confusing the
1440 // waterfall loop logic.
1441 B.setInstr(*Span.begin());
1442 MI.eraseFromParent();
1443
1444 SmallSet<Register, 4> OpsToWaterfall;
1445
1446 OpsToWaterfall.insert(RSrc);
1447 executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
1448 OpsToWaterfall);
1449 }
1450
1451 if (NumLoads != 1) {
1452 if (Ty.isVector())
1453 B.buildConcatVectors(Dst, LoadParts);
1454 else
1455 B.buildMergeLikeInstr(Dst, LoadParts);
1456 }
1457
1458 // We removed the instruction earlier with a waterfall loop.
1459 if (RSrcBank == &AMDGPU::SGPRRegBank)
1460 MI.eraseFromParent();
1461
1462 return true;
1463}
1464
1466 const OperandsMapper &OpdMapper,
1467 bool Signed) const {
1468 MachineInstr &MI = OpdMapper.getMI();
1469 MachineRegisterInfo &MRI = OpdMapper.getMRI();
1470
1471 // Insert basic copies
1472 applyDefaultMapping(OpdMapper);
1473
1474 Register DstReg = MI.getOperand(0).getReg();
1475 LLT Ty = MRI.getType(DstReg);
1476
1477 const LLT S32 = LLT::scalar(32);
1478
1479 unsigned FirstOpnd = isa<GIntrinsic>(MI) ? 2 : 1;
1480 Register SrcReg = MI.getOperand(FirstOpnd).getReg();
1481 Register OffsetReg = MI.getOperand(FirstOpnd + 1).getReg();
1482 Register WidthReg = MI.getOperand(FirstOpnd + 2).getReg();
1483
1484 const RegisterBank *DstBank =
1485 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
1486 if (DstBank == &AMDGPU::VGPRRegBank) {
1487 if (Ty == S32)
1488 return true;
1489
1490 // There are no 64-bit vgpr bitfield extract instructions, so the operation
1491 // is expanded to a sequence of instructions that implement it.
1492 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank);
1493
1494 const LLT S64 = LLT::scalar(64);
1495 // Shift the source operand so that extracted bits start at bit 0.
1496 auto ShiftOffset = Signed ? B.buildAShr(S64, SrcReg, OffsetReg)
1497 : B.buildLShr(S64, SrcReg, OffsetReg);
1498 auto UnmergeSOffset = B.buildUnmerge({S32, S32}, ShiftOffset);
1499
1500 // A 64-bit bitfield extract uses the 32-bit bitfield extract instructions
1501 // if the width is a constant.
1502 if (auto ConstWidth = getIConstantVRegValWithLookThrough(WidthReg, MRI)) {
1503 // Use the 32-bit bitfield extract instruction if the width is a constant.
1504 // Depending on the width size, use either the low or high 32-bits.
1505 auto Zero = B.buildConstant(S32, 0);
1506 auto WidthImm = ConstWidth->Value.getZExtValue();
1507 if (WidthImm <= 32) {
1508 // Use bitfield extract on the lower 32-bit source, and then sign-extend
1509 // or clear the upper 32-bits.
1510 auto Extract =
1511 Signed ? B.buildSbfx(S32, UnmergeSOffset.getReg(0), Zero, WidthReg)
1512 : B.buildUbfx(S32, UnmergeSOffset.getReg(0), Zero, WidthReg);
1513 auto Extend =
1514 Signed ? B.buildAShr(S32, Extract, B.buildConstant(S32, 31)) : Zero;
1515 B.buildMergeLikeInstr(DstReg, {Extract, Extend});
1516 } else {
1517 // Use bitfield extract on upper 32-bit source, and combine with lower
1518 // 32-bit source.
1519 auto UpperWidth = B.buildConstant(S32, WidthImm - 32);
1520 auto Extract =
1521 Signed
1522 ? B.buildSbfx(S32, UnmergeSOffset.getReg(1), Zero, UpperWidth)
1523 : B.buildUbfx(S32, UnmergeSOffset.getReg(1), Zero, UpperWidth);
1524 B.buildMergeLikeInstr(DstReg, {UnmergeSOffset.getReg(0), Extract});
1525 }
1526 MI.eraseFromParent();
1527 return true;
1528 }
1529
1530 // Expand to Src >> Offset << (64 - Width) >> (64 - Width) using 64-bit
1531 // operations.
1532 auto ExtShift = B.buildSub(S32, B.buildConstant(S32, 64), WidthReg);
1533 auto SignBit = B.buildShl(S64, ShiftOffset, ExtShift);
1534 if (Signed)
1535 B.buildAShr(S64, SignBit, ExtShift);
1536 else
1537 B.buildLShr(S64, SignBit, ExtShift);
1538 MI.eraseFromParent();
1539 return true;
1540 }
1541
1542 // The scalar form packs the offset and width in a single operand.
1543
1544 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank);
1545
1546 // Ensure the high bits are clear to insert the offset.
1547 auto OffsetMask = B.buildConstant(S32, maskTrailingOnes<unsigned>(6));
1548 auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask);
1549
1550 // Zeros out the low bits, so don't bother clamping the input value.
1551 auto ShiftWidth = B.buildShl(S32, WidthReg, B.buildConstant(S32, 16));
1552
1553 // Transformation function, pack the offset and width of a BFE into
1554 // the format expected by the S_BFE_I32 / S_BFE_U32. In the second
1555 // source, bits [5:0] contain the offset and bits [22:16] the width.
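 // For example, extracting 16 bits starting at bit offset 8 packs to
 // (16 << 16) | 8 = 0x100008.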
1556 auto MergedInputs = B.buildOr(S32, ClampOffset, ShiftWidth);
1557
1558 // TODO: It might be worth using a pseudo here to avoid scc clobber and
1559 // register class constraints.
1560 unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) :
1561 (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64);
1562
1563 auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs});
1564 if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this))
1565 llvm_unreachable("failed to constrain BFE");
1566
1567 MI.eraseFromParent();
1568 return true;
1569}
1570
1572 MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
1573 MachineInstr &MI = OpdMapper.getMI();
1574 MachineRegisterInfo &MRI = OpdMapper.getMRI();
1575
1576 // Insert basic copies.
1577 applyDefaultMapping(OpdMapper);
1578
1579 Register Dst0 = MI.getOperand(0).getReg();
1580 Register Dst1 = MI.getOperand(1).getReg();
1581 Register Src0 = MI.getOperand(2).getReg();
1582 Register Src1 = MI.getOperand(3).getReg();
1583 Register Src2 = MI.getOperand(4).getReg();
1584
1585 if (MRI.getRegBankOrNull(Src0) == &AMDGPU::VGPRRegBank)
1586 return true;
1587
1588 bool IsUnsigned = MI.getOpcode() == AMDGPU::G_AMDGPU_MAD_U64_U32;
1589 LLT S1 = LLT::scalar(1);
1590 LLT S32 = LLT::scalar(32);
1591
1592 bool DstOnValu = MRI.getRegBankOrNull(Src2) == &AMDGPU::VGPRRegBank;
1593 bool Accumulate = true;
1594
1595 if (!DstOnValu) {
1596 if (mi_match(Src2, MRI, m_ZeroInt()))
1597 Accumulate = false;
1598 }
1599
1600 // Keep the multiplication on the SALU.
1601 Register DstHi;
1602 Register DstLo = B.buildMul(S32, Src0, Src1).getReg(0);
1603 bool MulHiInVgpr = false;
1604
1605 MRI.setRegBank(DstLo, AMDGPU::SGPRRegBank);
1606
1607 if (Subtarget.hasSMulHi()) {
1608 DstHi = IsUnsigned ? B.buildUMulH(S32, Src0, Src1).getReg(0)
1609 : B.buildSMulH(S32, Src0, Src1).getReg(0);
1610 MRI.setRegBank(DstHi, AMDGPU::SGPRRegBank);
1611 } else {
1612 Register VSrc0 = B.buildCopy(S32, Src0).getReg(0);
1613 Register VSrc1 = B.buildCopy(S32, Src1).getReg(0);
1614
1615 MRI.setRegBank(VSrc0, AMDGPU::VGPRRegBank);
1616 MRI.setRegBank(VSrc1, AMDGPU::VGPRRegBank);
1617
1618 DstHi = IsUnsigned ? B.buildUMulH(S32, VSrc0, VSrc1).getReg(0)
1619 : B.buildSMulH(S32, VSrc0, VSrc1).getReg(0);
1620 MRI.setRegBank(DstHi, AMDGPU::VGPRRegBank);
1621
1622 if (!DstOnValu) {
1623 DstHi = buildReadFirstLane(B, MRI, DstHi);
1624 } else {
1625 MulHiInVgpr = true;
1626 }
1627 }
1628
1629 // Accumulate and produce the "carry-out" bit.
1630 //
1631 // The "carry-out" is defined as bit 64 of the result when computed as a
1632 // big integer. For unsigned multiply-add, this matches the usual definition
1633 // of carry-out. For signed multiply-add, bit 64 is the sign bit of the
1634 // result, which is determined as:
1635 // sign(Src0 * Src1) + sign(Src2) + carry-out from unsigned 64-bit add
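  // where the additions are modulo 2, i.e. the XORs used by the code below.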
1636 LLT CarryType = DstOnValu ? S1 : S32;
1637 const RegisterBank &CarryBank =
1638 DstOnValu ? AMDGPU::VCCRegBank : AMDGPU::SGPRRegBank;
1639 const RegisterBank &DstBank =
1640 DstOnValu ? AMDGPU::VGPRRegBank : AMDGPU::SGPRRegBank;
1641 Register Carry;
1642 Register Zero;
1643
1644 if (!IsUnsigned) {
1645 Zero = B.buildConstant(S32, 0).getReg(0);
1646 MRI.setRegBank(Zero,
1647 MulHiInVgpr ? AMDGPU::VGPRRegBank : AMDGPU::SGPRRegBank);
1648
1649 Carry = B.buildICmp(CmpInst::ICMP_SLT, MulHiInVgpr ? S1 : S32, DstHi, Zero)
1650 .getReg(0);
1651 MRI.setRegBank(Carry, MulHiInVgpr ? AMDGPU::VCCRegBank
1652 : AMDGPU::SGPRRegBank);
1653
1654 if (DstOnValu && !MulHiInVgpr) {
1655 Carry = B.buildTrunc(S1, Carry).getReg(0);
1656 MRI.setRegBank(Carry, AMDGPU::VCCRegBank);
1657 }
1658 }
1659
1660 if (Accumulate) {
1661 if (DstOnValu) {
1662 DstLo = B.buildCopy(S32, DstLo).getReg(0);
1663 DstHi = B.buildCopy(S32, DstHi).getReg(0);
1664 MRI.setRegBank(DstLo, AMDGPU::VGPRRegBank);
1665 MRI.setRegBank(DstHi, AMDGPU::VGPRRegBank);
1666 }
1667
1668 auto Unmerge = B.buildUnmerge(S32, Src2);
1669 Register Src2Lo = Unmerge.getReg(0);
1670 Register Src2Hi = Unmerge.getReg(1);
1671 MRI.setRegBank(Src2Lo, DstBank);
1672 MRI.setRegBank(Src2Hi, DstBank);
1673
1674 if (!IsUnsigned) {
1675 auto Src2Sign = B.buildICmp(CmpInst::ICMP_SLT, CarryType, Src2Hi, Zero);
1676 MRI.setRegBank(Src2Sign.getReg(0), CarryBank);
1677
1678 Carry = B.buildXor(CarryType, Carry, Src2Sign).getReg(0);
1679 MRI.setRegBank(Carry, CarryBank);
1680 }
1681
1682 auto AddLo = B.buildUAddo(S32, CarryType, DstLo, Src2Lo);
1683 DstLo = AddLo.getReg(0);
1684 Register CarryLo = AddLo.getReg(1);
1685 MRI.setRegBank(DstLo, DstBank);
1686 MRI.setRegBank(CarryLo, CarryBank);
1687
1688 auto AddHi = B.buildUAdde(S32, CarryType, DstHi, Src2Hi, CarryLo);
1689 DstHi = AddHi.getReg(0);
1690 MRI.setRegBank(DstHi, DstBank);
1691
1692 Register CarryHi = AddHi.getReg(1);
1693 MRI.setRegBank(CarryHi, CarryBank);
1694
1695 if (IsUnsigned) {
1696 Carry = CarryHi;
1697 } else {
1698 Carry = B.buildXor(CarryType, Carry, CarryHi).getReg(0);
1699 MRI.setRegBank(Carry, CarryBank);
1700 }
1701 } else {
1702 if (IsUnsigned) {
1703 Carry = B.buildConstant(CarryType, 0).getReg(0);
1704 MRI.setRegBank(Carry, CarryBank);
1705 }
1706 }
1707
1708 B.buildMergeLikeInstr(Dst0, {DstLo, DstHi});
1709
1710 if (DstOnValu) {
1711 B.buildCopy(Dst1, Carry);
1712 } else {
1713 B.buildTrunc(Dst1, Carry);
1714 }
1715
1716 MI.eraseFromParent();
1717 return true;
1718}
1719
1720// Return a suitable opcode for extending the operands of Opc when widening.
1721static unsigned getExtendOp(unsigned Opc) {
1722 switch (Opc) {
1723 case TargetOpcode::G_ASHR:
1724 case TargetOpcode::G_SMIN:
1725 case TargetOpcode::G_SMAX:
1726 return TargetOpcode::G_SEXT;
1727 case TargetOpcode::G_LSHR:
1728 case TargetOpcode::G_UMIN:
1729 case TargetOpcode::G_UMAX:
1730 return TargetOpcode::G_ZEXT;
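  // The remaining operations (such as add, sub, mul and shl) produce low bits
  // that do not depend on the high bits of their inputs, so an any-extend is
  // sufficient.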
1731 default:
1732 return TargetOpcode::G_ANYEXT;
1733 }
1734}
1735
1736// Emit a legalized extension from <2 x s16> to 2 32-bit components, avoiding
1737// any illegal vector extend or unmerge operations.
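// For example, zero-extending a <2 x s16> value whose 32-bit bitcast is
// 0xAAAABBBB yields the pair (0x0000BBBB, 0x0000AAAA).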
1738static std::pair<Register, Register>
1739unpackV2S16ToS32(MachineIRBuilder &B, Register Src, unsigned ExtOpcode) {
1740 const LLT S32 = LLT::scalar(32);
1741 auto Bitcast = B.buildBitcast(S32, Src);
1742
1743 if (ExtOpcode == TargetOpcode::G_SEXT) {
1744 auto ExtLo = B.buildSExtInReg(S32, Bitcast, 16);
1745 auto ShiftHi = B.buildAShr(S32, Bitcast, B.buildConstant(S32, 16));
1746 return std::pair(ExtLo.getReg(0), ShiftHi.getReg(0));
1747 }
1748
1749 auto ShiftHi = B.buildLShr(S32, Bitcast, B.buildConstant(S32, 16));
1750 if (ExtOpcode == TargetOpcode::G_ZEXT) {
1751 auto ExtLo = B.buildAnd(S32, Bitcast, B.buildConstant(S32, 0xffff));
1752 return std::pair(ExtLo.getReg(0), ShiftHi.getReg(0));
1753 }
1754
1755 assert(ExtOpcode == TargetOpcode::G_ANYEXT);
1756 return std::pair(Bitcast.getReg(0), ShiftHi.getReg(0));
1757}
1758
1759 // For cases where only a single copy is inserted for matching register banks,
1760 // replace the register in the instruction operand.
1761 static bool substituteSimpleCopyRegs(
1762     const AMDGPURegisterBankInfo::OperandsMapper &OpdMapper, unsigned OpIdx) {
1763 SmallVector<unsigned, 1> SrcReg(OpdMapper.getVRegs(OpIdx));
1764 if (!SrcReg.empty()) {
1765 assert(SrcReg.size() == 1);
1766 OpdMapper.getMI().getOperand(OpIdx).setReg(SrcReg[0]);
1767 return true;
1768 }
1769
1770 return false;
1771}
1772
1773/// Handle register layout difference for f16 images for some subtargets.
1774 Register AMDGPURegisterBankInfo::handleD16VData(MachineIRBuilder &B,
1775                                                 MachineRegisterInfo &MRI,
1776                                                 Register Reg) const {
1777 if (!Subtarget.hasUnpackedD16VMem())
1778 return Reg;
1779
1780 const LLT S16 = LLT::scalar(16);
1781 LLT StoreVT = MRI.getType(Reg);
1782 if (!StoreVT.isVector() || StoreVT.getElementType() != S16)
1783 return Reg;
1784
1785 auto Unmerge = B.buildUnmerge(S16, Reg);
1786
1787
1788 SmallVector<Register, 4> WideRegs;
1789 for (int I = 0, E = Unmerge->getNumOperands() - 1; I != E; ++I)
1790 WideRegs.push_back(Unmerge.getReg(I));
1791
1792 const LLT S32 = LLT::scalar(32);
1793 int NumElts = StoreVT.getNumElements();
1794
1795 return B.buildMergeLikeInstr(LLT::fixed_vector(NumElts, S32), WideRegs)
1796 .getReg(0);
1797}
1798
1799static std::pair<Register, unsigned>
1800 getBaseWithConstantOffset(MachineRegisterInfo &MRI, Register Reg) {
1801   int64_t Const;
1802 if (mi_match(Reg, MRI, m_ICst(Const)))
1803 return std::pair(Register(), Const);
1804
1805 Register Base;
1806 if (mi_match(Reg, MRI, m_GAdd(m_Reg(Base), m_ICst(Const))))
1807 return std::pair(Base, Const);
1808
1809 // TODO: Handle G_OR used for add case
1810 return std::pair(Reg, 0);
1811}
1812
1813std::pair<Register, unsigned>
1814 AMDGPURegisterBankInfo::splitBufferOffsets(MachineIRBuilder &B,
1815                                            Register OrigOffset) const {
1816 const unsigned MaxImm = SIInstrInfo::getMaxMUBUFImmOffset(Subtarget);
1817 Register BaseReg;
1818 unsigned ImmOffset;
1819 const LLT S32 = LLT::scalar(32);
1820
1821 // TODO: Use AMDGPU::getBaseWithConstantOffset() instead.
1822 std::tie(BaseReg, ImmOffset) = getBaseWithConstantOffset(*B.getMRI(),
1823 OrigOffset);
1824
1825 unsigned C1 = 0;
1826 if (ImmOffset != 0) {
1827 // If the immediate value is too big for the immoffset field, put only bits
1828 // that would normally fit in the immoffset field. The remaining value that
1829 // is copied/added for the voffset field is a large power of 2, and it
1830 // stands more chance of being CSEd with the copy/add for another similar
1831 // load/store.
1832    // However, do not do that rounding down if the remaining value would be
1833    // negative, as it appears to be illegal to have a negative offset in the
1834    // vgpr, even if adding the immediate offset makes it positive.
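    // For example, assuming a maximum immediate offset of 4095, an incoming
    // constant offset of 4100 is split into an immediate of 4 plus a voffset
    // add of 4096.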
1835 unsigned Overflow = ImmOffset & ~MaxImm;
1836 ImmOffset -= Overflow;
1837 if ((int32_t)Overflow < 0) {
1838 Overflow += ImmOffset;
1839 ImmOffset = 0;
1840 }
1841
1842 C1 = ImmOffset;
1843 if (Overflow != 0) {
1844 if (!BaseReg)
1845 BaseReg = B.buildConstant(S32, Overflow).getReg(0);
1846 else {
1847 auto OverflowVal = B.buildConstant(S32, Overflow);
1848 BaseReg = B.buildAdd(S32, BaseReg, OverflowVal).getReg(0);
1849 }
1850 }
1851 }
1852
1853 if (!BaseReg)
1854 BaseReg = B.buildConstant(S32, 0).getReg(0);
1855
1856 return {BaseReg, C1};
1857}
1858
1859 bool AMDGPURegisterBankInfo::buildVCopy(MachineIRBuilder &B, Register DstReg,
1860                                         Register SrcReg) const {
1861 MachineRegisterInfo &MRI = *B.getMRI();
1862 LLT SrcTy = MRI.getType(SrcReg);
1863 if (SrcTy.getSizeInBits() == 32) {
1864 // Use a v_mov_b32 here to make the exec dependency explicit.
1865 B.buildInstr(AMDGPU::V_MOV_B32_e32)
1866 .addDef(DstReg)
1867 .addUse(SrcReg);
1868 return constrainGenericRegister(DstReg, AMDGPU::VGPR_32RegClass, MRI) &&
1869 constrainGenericRegister(SrcReg, AMDGPU::SReg_32RegClass, MRI);
1870 }
1871
1872 Register TmpReg0 = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
1873 Register TmpReg1 = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
1874
1875 B.buildInstr(AMDGPU::V_MOV_B32_e32)
1876 .addDef(TmpReg0)
1877 .addUse(SrcReg, 0, AMDGPU::sub0);
1878 B.buildInstr(AMDGPU::V_MOV_B32_e32)
1879 .addDef(TmpReg1)
1880 .addUse(SrcReg, 0, AMDGPU::sub1);
1881 B.buildInstr(AMDGPU::REG_SEQUENCE)
1882 .addDef(DstReg)
1883 .addUse(TmpReg0)
1884 .addImm(AMDGPU::sub0)
1885 .addUse(TmpReg1)
1886 .addImm(AMDGPU::sub1);
1887
1888 return constrainGenericRegister(SrcReg, AMDGPU::SReg_64RegClass, MRI) &&
1889 constrainGenericRegister(DstReg, AMDGPU::VReg_64RegClass, MRI);
1890}
1891
1892/// Utility function for pushing dynamic vector indexes with a constant offset
1893/// into waterfall loops.
1894 static void reinsertVectorIndexAdd(MachineIRBuilder &B,
1895                                    MachineInstr &IdxUseInstr,
1896 unsigned OpIdx,
1897 unsigned ConstOffset) {
1898 MachineRegisterInfo &MRI = *B.getMRI();
1899 const LLT S32 = LLT::scalar(32);
1900 Register WaterfallIdx = IdxUseInstr.getOperand(OpIdx).getReg();
1901 B.setInsertPt(*IdxUseInstr.getParent(), IdxUseInstr.getIterator());
1902
1903 auto MaterializedOffset = B.buildConstant(S32, ConstOffset);
1904
1905 auto Add = B.buildAdd(S32, WaterfallIdx, MaterializedOffset);
1906 MRI.setRegBank(MaterializedOffset.getReg(0), AMDGPU::SGPRRegBank);
1907 MRI.setRegBank(Add.getReg(0), AMDGPU::SGPRRegBank);
1908 IdxUseInstr.getOperand(OpIdx).setReg(Add.getReg(0));
1909}
1910
1911/// Implement extending a 32-bit value to a 64-bit value. \p Lo32Reg is the
1912/// original 32-bit source value (to be inserted in the low part of the combined
1913/// 64-bit result), and \p Hi32Reg is the high half of the combined 64-bit
1914/// value.
1915 static void extendLow32IntoHigh32(MachineIRBuilder &B,
1916                                   Register Hi32Reg, Register Lo32Reg,
1917 unsigned ExtOpc,
1918 const RegisterBank &RegBank,
1919 bool IsBooleanSrc = false) {
1920 if (ExtOpc == AMDGPU::G_ZEXT) {
1921 B.buildConstant(Hi32Reg, 0);
1922 } else if (ExtOpc == AMDGPU::G_SEXT) {
1923 if (IsBooleanSrc) {
1924 // If we know the original source was an s1, the high half is the same as
1925 // the low.
1926 B.buildCopy(Hi32Reg, Lo32Reg);
1927 } else {
1928 // Replicate sign bit from 32-bit extended part.
1929 auto ShiftAmt = B.buildConstant(LLT::scalar(32), 31);
1930 B.getMRI()->setRegBank(ShiftAmt.getReg(0), RegBank);
1931 B.buildAShr(Hi32Reg, Lo32Reg, ShiftAmt);
1932 }
1933 } else {
1934 assert(ExtOpc == AMDGPU::G_ANYEXT && "not an integer extension");
1935 B.buildUndef(Hi32Reg);
1936 }
1937}
1938
1939bool AMDGPURegisterBankInfo::foldExtractEltToCmpSelect(
1940     MachineIRBuilder &B, MachineInstr &MI,
1941     const OperandsMapper &OpdMapper) const {
1942 MachineRegisterInfo &MRI = *B.getMRI();
1943
1944 Register VecReg = MI.getOperand(1).getReg();
1945 Register Idx = MI.getOperand(2).getReg();
1946
1947 const RegisterBank &IdxBank =
1948 *OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
1949
1950 bool IsDivergentIdx = IdxBank != AMDGPU::SGPRRegBank;
1951
1952 LLT VecTy = MRI.getType(VecReg);
1953 unsigned EltSize = VecTy.getScalarSizeInBits();
1954 unsigned NumElem = VecTy.getNumElements();
1955
1956 if (!SITargetLowering::shouldExpandVectorDynExt(EltSize, NumElem,
1957 IsDivergentIdx, &Subtarget))
1958 return false;
1959
1960 LLT S32 = LLT::scalar(32);
1961
1962 const RegisterBank &DstBank =
1963 *OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
1964 const RegisterBank &SrcBank =
1965 *OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
1966
1967 const RegisterBank &CCBank =
1968 (DstBank == AMDGPU::SGPRRegBank &&
1969 SrcBank == AMDGPU::SGPRRegBank &&
1970 IdxBank == AMDGPU::SGPRRegBank) ? AMDGPU::SGPRRegBank
1971 : AMDGPU::VCCRegBank;
1972 LLT CCTy = (CCBank == AMDGPU::SGPRRegBank) ? S32 : LLT::scalar(1);
1973
1974 if (CCBank == AMDGPU::VCCRegBank && IdxBank == AMDGPU::SGPRRegBank) {
1975 Idx = B.buildCopy(S32, Idx)->getOperand(0).getReg();
1976 MRI.setRegBank(Idx, AMDGPU::VGPRRegBank);
1977 }
1978
1979 LLT EltTy = VecTy.getScalarType();
1980 SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
1981 unsigned NumLanes = DstRegs.size();
1982 if (!NumLanes)
1983 NumLanes = 1;
1984 else
1985 EltTy = MRI.getType(DstRegs[0]);
1986
1987 auto UnmergeToEltTy = B.buildUnmerge(EltTy, VecReg);
1988 SmallVector<Register, 2> Res(NumLanes);
1989 for (unsigned L = 0; L < NumLanes; ++L)
1990 Res[L] = UnmergeToEltTy.getReg(L);
1991
1992 for (unsigned I = 1; I < NumElem; ++I) {
1993 auto IC = B.buildConstant(S32, I);
1994 MRI.setRegBank(IC->getOperand(0).getReg(), AMDGPU::SGPRRegBank);
1995 auto Cmp = B.buildICmp(CmpInst::ICMP_EQ, CCTy, Idx, IC);
1996 MRI.setRegBank(Cmp->getOperand(0).getReg(), CCBank);
1997
1998 for (unsigned L = 0; L < NumLanes; ++L) {
1999 auto S = B.buildSelect(EltTy, Cmp,
2000 UnmergeToEltTy.getReg(I * NumLanes + L), Res[L]);
2001
2002 for (unsigned N : { 0, 2, 3 })
2003 MRI.setRegBank(S->getOperand(N).getReg(), DstBank);
2004
2005 Res[L] = S->getOperand(0).getReg();
2006 }
2007 }
2008
2009 for (unsigned L = 0; L < NumLanes; ++L) {
2010 Register DstReg = (NumLanes == 1) ? MI.getOperand(0).getReg() : DstRegs[L];
2011 B.buildCopy(DstReg, Res[L]);
2012 MRI.setRegBank(DstReg, DstBank);
2013 }
2014
2015 MRI.setRegBank(MI.getOperand(0).getReg(), DstBank);
2016 MI.eraseFromParent();
2017
2018 return true;
2019}
2020
2021// Insert a cross regbank copy for a register if it already has a bank that
2022// differs from the one we want to set.
2023 static Register constrainRegToBank(MachineRegisterInfo &MRI,
2024                                    MachineIRBuilder &B, Register &Reg,
2025                                    const RegisterBank &Bank) {
2026 const RegisterBank *CurrBank = MRI.getRegBankOrNull(Reg);
2027 if (CurrBank && *CurrBank != Bank) {
2028 Register Copy = B.buildCopy(MRI.getType(Reg), Reg).getReg(0);
2029 MRI.setRegBank(Copy, Bank);
2030 return Copy;
2031 }
2032
2033 MRI.setRegBank(Reg, Bank);
2034 return Reg;
2035}
2036
2037bool AMDGPURegisterBankInfo::foldInsertEltToCmpSelect(
2038     MachineIRBuilder &B, MachineInstr &MI,
2039     const OperandsMapper &OpdMapper) const {
2040
2041 MachineRegisterInfo &MRI = *B.getMRI();
2042 Register VecReg = MI.getOperand(1).getReg();
2043 Register Idx = MI.getOperand(3).getReg();
2044
2045 const RegisterBank &IdxBank =
2046 *OpdMapper.getInstrMapping().getOperandMapping(3).BreakDown[0].RegBank;
2047
2048 bool IsDivergentIdx = IdxBank != AMDGPU::SGPRRegBank;
2049
2050 LLT VecTy = MRI.getType(VecReg);
2051 unsigned EltSize = VecTy.getScalarSizeInBits();
2052 unsigned NumElem = VecTy.getNumElements();
2053
2054 if (!SITargetLowering::shouldExpandVectorDynExt(EltSize, NumElem,
2055 IsDivergentIdx, &Subtarget))
2056 return false;
2057
2058 LLT S32 = LLT::scalar(32);
2059
2060 const RegisterBank &DstBank =
2061 *OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2062 const RegisterBank &SrcBank =
2063 *OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2064 const RegisterBank &InsBank =
2065 *OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
2066
2067 const RegisterBank &CCBank =
2068 (DstBank == AMDGPU::SGPRRegBank &&
2069 SrcBank == AMDGPU::SGPRRegBank &&
2070 InsBank == AMDGPU::SGPRRegBank &&
2071 IdxBank == AMDGPU::SGPRRegBank) ? AMDGPU::SGPRRegBank
2072 : AMDGPU::VCCRegBank;
2073 LLT CCTy = (CCBank == AMDGPU::SGPRRegBank) ? S32 : LLT::scalar(1);
2074
2075 if (CCBank == AMDGPU::VCCRegBank && IdxBank == AMDGPU::SGPRRegBank) {
2076 Idx = B.buildCopy(S32, Idx)->getOperand(0).getReg();
2077 MRI.setRegBank(Idx, AMDGPU::VGPRRegBank);
2078 }
2079
2080 LLT EltTy = VecTy.getScalarType();
2081 SmallVector<Register, 2> InsRegs(OpdMapper.getVRegs(2));
2082 unsigned NumLanes = InsRegs.size();
2083 if (!NumLanes) {
2084 NumLanes = 1;
2085 InsRegs.push_back(MI.getOperand(2).getReg());
2086 } else {
2087 EltTy = MRI.getType(InsRegs[0]);
2088 }
2089
2090 auto UnmergeToEltTy = B.buildUnmerge(EltTy, VecReg);
2091 SmallVector<Register, 16> Ops(NumElem * NumLanes);
2092
2093 for (unsigned I = 0; I < NumElem; ++I) {
2094 auto IC = B.buildConstant(S32, I);
2095 MRI.setRegBank(IC->getOperand(0).getReg(), AMDGPU::SGPRRegBank);
2096 auto Cmp = B.buildICmp(CmpInst::ICMP_EQ, CCTy, Idx, IC);
2097 MRI.setRegBank(Cmp->getOperand(0).getReg(), CCBank);
2098
2099 for (unsigned L = 0; L < NumLanes; ++L) {
2100 Register Op0 = constrainRegToBank(MRI, B, InsRegs[L], DstBank);
2101 Register Op1 = UnmergeToEltTy.getReg(I * NumLanes + L);
2102 Op1 = constrainRegToBank(MRI, B, Op1, DstBank);
2103
2104 Register Select = B.buildSelect(EltTy, Cmp, Op0, Op1).getReg(0);
2105 MRI.setRegBank(Select, DstBank);
2106
2107 Ops[I * NumLanes + L] = Select;
2108 }
2109 }
2110
2111 LLT MergeTy = LLT::fixed_vector(Ops.size(), EltTy);
2112 if (MergeTy == MRI.getType(MI.getOperand(0).getReg())) {
2113 B.buildBuildVector(MI.getOperand(0), Ops);
2114 } else {
2115 auto Vec = B.buildBuildVector(MergeTy, Ops);
2116 MRI.setRegBank(Vec->getOperand(0).getReg(), DstBank);
2117 B.buildBitcast(MI.getOperand(0).getReg(), Vec);
2118 }
2119
2120 MRI.setRegBank(MI.getOperand(0).getReg(), DstBank);
2121 MI.eraseFromParent();
2122
2123 return true;
2124}
2125
2126// Break s_mul_u64 into 32-bit vector operations.
2127 void AMDGPURegisterBankInfo::applyMappingSMULU64(
2128     MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
2129 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2130 SmallVector<Register, 2> Src0Regs(OpdMapper.getVRegs(1));
2131 SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));
2132
2133 // All inputs are SGPRs, nothing special to do.
2134 if (DefRegs.empty()) {
2135 assert(Src0Regs.empty() && Src1Regs.empty());
2136 applyDefaultMapping(OpdMapper);
2137 return;
2138 }
2139
2140 assert(DefRegs.size() == 2);
2141 assert(Src0Regs.size() == Src1Regs.size() &&
2142 (Src0Regs.empty() || Src0Regs.size() == 2));
2143
2144 MachineRegisterInfo &MRI = OpdMapper.getMRI();
2145 MachineInstr &MI = OpdMapper.getMI();
2146 Register DstReg = MI.getOperand(0).getReg();
2147 LLT HalfTy = LLT::scalar(32);
2148
2149 // Depending on where the source registers came from, the generic code may
2150 // have decided to split the inputs already or not. If not, we still need to
2151 // extract the values.
2152
2153 if (Src0Regs.empty())
2154 split64BitValueForMapping(B, Src0Regs, HalfTy, MI.getOperand(1).getReg());
2155 else
2156 setRegsToType(MRI, Src0Regs, HalfTy);
2157
2158 if (Src1Regs.empty())
2159 split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
2160 else
2161 setRegsToType(MRI, Src1Regs, HalfTy);
2162
2163 setRegsToType(MRI, DefRegs, HalfTy);
2164
2165 // The multiplication is done as follows:
2166 //
2167 // Op1H Op1L
2168 // * Op0H Op0L
2169 // --------------------
2170 // Op1H*Op0L Op1L*Op0L
2171 // + Op1H*Op0H Op1L*Op0H
2172 // -----------------------------------------
2173 // (Op1H*Op0L + Op1L*Op0H + carry) Op1L*Op0L
2174 //
2175 // We drop Op1H*Op0H because the result of the multiplication is a 64-bit
2176 // value and that would overflow.
2177 // The low 32-bit value is Op1L*Op0L.
2178 // The high 32-bit value is Op1H*Op0L + Op1L*Op0H + carry (from
2179 // Op1L*Op0L).
2180
2181 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank);
2182
2183 Register Hi = B.buildUMulH(HalfTy, Src0Regs[0], Src1Regs[0]).getReg(0);
2184 Register MulLoHi = B.buildMul(HalfTy, Src0Regs[0], Src1Regs[1]).getReg(0);
2185 Register Add = B.buildAdd(HalfTy, Hi, MulLoHi).getReg(0);
2186 Register MulHiLo = B.buildMul(HalfTy, Src0Regs[1], Src1Regs[0]).getReg(0);
2187 B.buildAdd(DefRegs[1], Add, MulHiLo);
2188 B.buildMul(DefRegs[0], Src0Regs[0], Src1Regs[0]);
2189
2190 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2191 MI.eraseFromParent();
2192}
2193
2194 void AMDGPURegisterBankInfo::applyMappingImpl(
2195     MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
2196 MachineInstr &MI = OpdMapper.getMI();
2197 B.setInstrAndDebugLoc(MI);
2198 unsigned Opc = MI.getOpcode();
2199 MachineRegisterInfo &MRI = OpdMapper.getMRI();
2200 switch (Opc) {
2201 case AMDGPU::G_CONSTANT:
2202 case AMDGPU::G_IMPLICIT_DEF: {
2203 Register DstReg = MI.getOperand(0).getReg();
2204 LLT DstTy = MRI.getType(DstReg);
2205 if (DstTy != LLT::scalar(1))
2206 break;
2207
2208 const RegisterBank *DstBank =
2209 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2210 if (DstBank == &AMDGPU::VCCRegBank)
2211 break;
2212 SmallVector<Register, 1> DefRegs(OpdMapper.getVRegs(0));
2213 if (DefRegs.empty())
2214 DefRegs.push_back(DstReg);
2215
2216 B.setInsertPt(*MI.getParent(), ++MI.getIterator());
2217
2218 Register NewDstReg = MRI.createGenericVirtualRegister(LLT::scalar(32));
2219 LLVMContext &Ctx = B.getMF().getFunction().getContext();
2220
2221 MI.getOperand(0).setReg(NewDstReg);
2222 if (Opc != AMDGPU::G_IMPLICIT_DEF) {
2223 uint64_t ConstVal = MI.getOperand(1).getCImm()->getZExtValue();
2224 MI.getOperand(1).setCImm(
2225 ConstantInt::get(IntegerType::getInt32Ty(Ctx), ConstVal));
2226 }
2227
2228 MRI.setRegBank(NewDstReg, *DstBank);
2229 B.buildTrunc(DefRegs[0], NewDstReg);
2230 return;
2231 }
2232 case AMDGPU::G_PHI: {
2233 Register DstReg = MI.getOperand(0).getReg();
2234 LLT DstTy = MRI.getType(DstReg);
2235 if (DstTy != LLT::scalar(1))
2236 break;
2237
2238 const LLT S32 = LLT::scalar(32);
2239 const RegisterBank *DstBank =
2240 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2241 if (DstBank == &AMDGPU::VCCRegBank) {
2242 applyDefaultMapping(OpdMapper);
2243 // The standard handling only considers the result register bank for
2244 // phis. For VCC, blindly inserting a copy when the phi is lowered will
2245 // produce an invalid copy. We can only copy with some kind of compare to
2246 // get a vector boolean result. Insert a register bank copy that will be
2247 // correctly lowered to a compare.
2248 for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) {
2249 Register SrcReg = MI.getOperand(I).getReg();
2250 const RegisterBank *SrcBank = getRegBank(SrcReg, MRI, *TRI);
2251
2252 if (SrcBank != &AMDGPU::VCCRegBank) {
2253 MachineBasicBlock *SrcMBB = MI.getOperand(I + 1).getMBB();
2254 B.setInsertPt(*SrcMBB, SrcMBB->getFirstTerminator());
2255
2256 auto Copy = B.buildCopy(LLT::scalar(1), SrcReg);
2257 MRI.setRegBank(Copy.getReg(0), AMDGPU::VCCRegBank);
2258 MI.getOperand(I).setReg(Copy.getReg(0));
2259 }
2260 }
2261
2262 return;
2263 }
2264
2265 // Phi handling is strange and only considers the bank of the destination.
2266 substituteSimpleCopyRegs(OpdMapper, 0);
2267
2268 // Promote SGPR/VGPR booleans to s32
2269 ApplyRegBankMapping ApplyBank(B, *this, MRI, DstBank);
2270 B.setInsertPt(B.getMBB(), MI);
2271 LegalizerHelper Helper(B.getMF(), ApplyBank, B);
2272
2273 if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
2274 llvm_unreachable("widen scalar should have succeeded");
2275
2276 return;
2277 }
2278 case AMDGPU::G_FCMP:
2279 if (!Subtarget.hasSALUFloatInsts())
2280 break;
2281 [[fallthrough]];
2282 case AMDGPU::G_ICMP:
2283 case AMDGPU::G_UADDO:
2284 case AMDGPU::G_USUBO:
2285 case AMDGPU::G_UADDE:
2286 case AMDGPU::G_SADDE:
2287 case AMDGPU::G_USUBE:
2288 case AMDGPU::G_SSUBE: {
2289 unsigned BoolDstOp =
2290 (Opc == AMDGPU::G_ICMP || Opc == AMDGPU::G_FCMP) ? 0 : 1;
2291 Register DstReg = MI.getOperand(BoolDstOp).getReg();
2292
2293 const RegisterBank *DstBank =
2294 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2295 if (DstBank != &AMDGPU::SGPRRegBank)
2296 break;
2297
2298 const bool HasCarryIn = MI.getNumOperands() == 5;
2299
2300 // If this is a scalar compare, promote the result to s32, as the selection
2301 // will end up using a copy to a 32-bit vreg.
2302 const LLT S32 = LLT::scalar(32);
2303 Register NewDstReg = MRI.createGenericVirtualRegister(S32);
2304 MRI.setRegBank(NewDstReg, AMDGPU::SGPRRegBank);
2305 MI.getOperand(BoolDstOp).setReg(NewDstReg);
2306
2307 if (HasCarryIn) {
2308 Register NewSrcReg = MRI.createGenericVirtualRegister(S32);
2309 MRI.setRegBank(NewSrcReg, AMDGPU::SGPRRegBank);
2310 B.buildZExt(NewSrcReg, MI.getOperand(4).getReg());
2311 MI.getOperand(4).setReg(NewSrcReg);
2312 }
2313
2314 MachineBasicBlock *MBB = MI.getParent();
2315 B.setInsertPt(*MBB, std::next(MI.getIterator()));
2316
2317 // If we had a constrained VCC result register, a copy was inserted to VCC
2318 // from SGPR.
2319 SmallVector<Register, 1> DefRegs(OpdMapper.getVRegs(0));
2320 if (DefRegs.empty())
2321 DefRegs.push_back(DstReg);
2322 B.buildTrunc(DefRegs[0], NewDstReg);
2323 return;
2324 }
2325 case AMDGPU::G_SELECT: {
2326 Register DstReg = MI.getOperand(0).getReg();
2327 LLT DstTy = MRI.getType(DstReg);
2328
2329 SmallVector<Register, 1> CondRegs(OpdMapper.getVRegs(1));
2330 if (CondRegs.empty())
2331 CondRegs.push_back(MI.getOperand(1).getReg());
2332 else {
2333 assert(CondRegs.size() == 1);
2334 }
2335
2336 const RegisterBank *CondBank = getRegBank(CondRegs[0], MRI, *TRI);
2337 if (CondBank == &AMDGPU::SGPRRegBank) {
2338 const LLT S32 = LLT::scalar(32);
2339 Register NewCondReg = MRI.createGenericVirtualRegister(S32);
2340 MRI.setRegBank(NewCondReg, AMDGPU::SGPRRegBank);
2341
2342 MI.getOperand(1).setReg(NewCondReg);
2343 B.buildZExt(NewCondReg, CondRegs[0]);
2344 }
2345
2346 if (DstTy.getSizeInBits() != 64)
2347 break;
2348
2349 LLT HalfTy = getHalfSizedType(DstTy);
2350
2351 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2352 SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));
2353 SmallVector<Register, 2> Src2Regs(OpdMapper.getVRegs(3));
2354
2355 // All inputs are SGPRs, nothing special to do.
2356 if (DefRegs.empty()) {
2357 assert(Src1Regs.empty() && Src2Regs.empty());
2358 break;
2359 }
2360
2361 if (Src1Regs.empty())
2362 split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
2363 else {
2364 setRegsToType(MRI, Src1Regs, HalfTy);
2365 }
2366
2367 if (Src2Regs.empty())
2368 split64BitValueForMapping(B, Src2Regs, HalfTy, MI.getOperand(3).getReg());
2369 else
2370 setRegsToType(MRI, Src2Regs, HalfTy);
2371
2372 setRegsToType(MRI, DefRegs, HalfTy);
2373
2374 auto Flags = MI.getFlags();
2375 B.buildSelect(DefRegs[0], CondRegs[0], Src1Regs[0], Src2Regs[0], Flags);
2376 B.buildSelect(DefRegs[1], CondRegs[0], Src1Regs[1], Src2Regs[1], Flags);
2377
2378 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2379 MI.eraseFromParent();
2380 return;
2381 }
2382 case AMDGPU::G_BRCOND: {
2383 Register CondReg = MI.getOperand(0).getReg();
2384 // FIXME: Should use legalizer helper, but should change bool ext type.
2385 const RegisterBank *CondBank =
2386 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2387
2388 if (CondBank == &AMDGPU::SGPRRegBank) {
2389 const LLT S32 = LLT::scalar(32);
2390 Register NewCondReg = MRI.createGenericVirtualRegister(S32);
2391 MRI.setRegBank(NewCondReg, AMDGPU::SGPRRegBank);
2392
2393 MI.getOperand(0).setReg(NewCondReg);
2394 B.buildZExt(NewCondReg, CondReg);
2395 return;
2396 }
2397
2398 break;
2399 }
2400 case AMDGPU::G_AND:
2401 case AMDGPU::G_OR:
2402 case AMDGPU::G_XOR: {
2403 // 64-bit and is only available on the SALU, so split into 2 32-bit ops if
2404 // there is a VGPR input.
2405 Register DstReg = MI.getOperand(0).getReg();
2406 LLT DstTy = MRI.getType(DstReg);
2407
2408 const RegisterBank *DstBank =
2409 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2410
2411 if (DstTy.getSizeInBits() == 1) {
2412 if (DstBank == &AMDGPU::VCCRegBank)
2413 break;
2414
2415 MachineFunction *MF = MI.getParent()->getParent();
2416 ApplyRegBankMapping ApplyBank(B, *this, MRI, DstBank);
2417 LegalizerHelper Helper(*MF, ApplyBank, B);
2418
2419 if (Helper.widenScalar(MI, 0, LLT::scalar(32)) !=
2420              LegalizerHelper::Legalized)
2421        llvm_unreachable("widen scalar should have succeeded");
2422 return;
2423 }
2424
2425 if (DstTy.getSizeInBits() == 16 && DstBank == &AMDGPU::SGPRRegBank) {
2426 const LLT S32 = LLT::scalar(32);
2427 MachineBasicBlock *MBB = MI.getParent();
2428 MachineFunction *MF = MBB->getParent();
2429 ApplyRegBankMapping ApplySALU(B, *this, MRI, &AMDGPU::SGPRRegBank);
2430 LegalizerHelper Helper(*MF, ApplySALU, B);
2431 // Widen to S32, but handle `G_XOR x, -1` differently. Legalizer widening
2432 // will use a G_ANYEXT to extend the -1 which prevents matching G_XOR -1
2433 // as "not".
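      // Sign-extending the -1 keeps it as all ones in s32, so the widened xor
      // is still recognizable as a bitwise not.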
2434 if (MI.getOpcode() == AMDGPU::G_XOR &&
2435 mi_match(MI.getOperand(2).getReg(), MRI, m_SpecificICstOrSplat(-1))) {
2436 Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT);
2437 Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_SEXT);
2438 Helper.widenScalarDst(MI, S32);
2439 } else {
2440 if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
2441 llvm_unreachable("widen scalar should have succeeded");
2442 }
2443 return;
2444 }
2445
2446 if (DstTy.getSizeInBits() != 64)
2447 break;
2448
2449 LLT HalfTy = getHalfSizedType(DstTy);
2450 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2451 SmallVector<Register, 2> Src0Regs(OpdMapper.getVRegs(1));
2452 SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));
2453
2454 // All inputs are SGPRs, nothing special to do.
2455 if (DefRegs.empty()) {
2456 assert(Src0Regs.empty() && Src1Regs.empty());
2457 break;
2458 }
2459
2460 assert(DefRegs.size() == 2);
2461 assert(Src0Regs.size() == Src1Regs.size() &&
2462 (Src0Regs.empty() || Src0Regs.size() == 2));
2463
2464 // Depending on where the source registers came from, the generic code may
2465 // have decided to split the inputs already or not. If not, we still need to
2466 // extract the values.
2467
2468 if (Src0Regs.empty())
2469 split64BitValueForMapping(B, Src0Regs, HalfTy, MI.getOperand(1).getReg());
2470 else
2471 setRegsToType(MRI, Src0Regs, HalfTy);
2472
2473 if (Src1Regs.empty())
2474 split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
2475 else
2476 setRegsToType(MRI, Src1Regs, HalfTy);
2477
2478 setRegsToType(MRI, DefRegs, HalfTy);
2479
2480 auto Flags = MI.getFlags();
2481 B.buildInstr(Opc, {DefRegs[0]}, {Src0Regs[0], Src1Regs[0]}, Flags);
2482 B.buildInstr(Opc, {DefRegs[1]}, {Src0Regs[1], Src1Regs[1]}, Flags);
2483
2484 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2485 MI.eraseFromParent();
2486 return;
2487 }
2488 case AMDGPU::G_ABS: {
2489 Register SrcReg = MI.getOperand(1).getReg();
2490 const RegisterBank *SrcBank = MRI.getRegBankOrNull(SrcReg);
2491
2492 // There is no VALU abs instruction so we need to replace it with a sub and
2493 // max combination.
2494 if (SrcBank && SrcBank == &AMDGPU::VGPRRegBank) {
2495 MachineFunction *MF = MI.getParent()->getParent();
2496 ApplyRegBankMapping Apply(B, *this, MRI, &AMDGPU::VGPRRegBank);
2497 LegalizerHelper Helper(*MF, Apply, B);
2498
2499      if (Helper.lowerAbsToMaxNeg(MI) != LegalizerHelper::Legalized)
2500        llvm_unreachable("lowerAbsToMaxNeg should have succeeded");
2501 return;
2502 }
2503 [[fallthrough]];
2504 }
2505 case AMDGPU::G_ADD:
2506 case AMDGPU::G_SUB:
2507 case AMDGPU::G_MUL:
2508 case AMDGPU::G_SHL:
2509 case AMDGPU::G_LSHR:
2510 case AMDGPU::G_ASHR:
2511 case AMDGPU::G_SMIN:
2512 case AMDGPU::G_SMAX:
2513 case AMDGPU::G_UMIN:
2514 case AMDGPU::G_UMAX: {
2515 Register DstReg = MI.getOperand(0).getReg();
2516 LLT DstTy = MRI.getType(DstReg);
2517
2518    // Special case for s_mul_u64. There is no vector equivalent of
2519    // s_mul_u64, so we have to break it down into 32-bit vector
2520    // multiplications.
2521 if (!Subtarget.hasVectorMulU64() && Opc == AMDGPU::G_MUL &&
2522 DstTy.getSizeInBits() == 64) {
2523 applyMappingSMULU64(B, OpdMapper);
2524 return;
2525 }
2526
2527 // 16-bit operations are VALU only, but can be promoted to 32-bit SALU.
2528 // Packed 16-bit operations need to be scalarized and promoted.
2529 if (DstTy != LLT::scalar(16) && DstTy != LLT::fixed_vector(2, 16))
2530 break;
2531
2532 const RegisterBank *DstBank =
2533 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2534 if (DstBank == &AMDGPU::VGPRRegBank)
2535 break;
2536
2537 const LLT S32 = LLT::scalar(32);
2538 MachineBasicBlock *MBB = MI.getParent();
2539 MachineFunction *MF = MBB->getParent();
2540 ApplyRegBankMapping ApplySALU(B, *this, MRI, &AMDGPU::SGPRRegBank);
2541
2542 if (DstTy.isVector() && Opc == AMDGPU::G_ABS) {
2543 Register WideSrcLo, WideSrcHi;
2544
2545 std::tie(WideSrcLo, WideSrcHi) =
2546 unpackV2S16ToS32(B, MI.getOperand(1).getReg(), TargetOpcode::G_SEXT);
2547 auto Lo = B.buildInstr(AMDGPU::G_ABS, {S32}, {WideSrcLo});
2548 auto Hi = B.buildInstr(AMDGPU::G_ABS, {S32}, {WideSrcHi});
2549 B.buildBuildVectorTrunc(DstReg, {Lo.getReg(0), Hi.getReg(0)});
2550 MI.eraseFromParent();
2551 return;
2552 }
2553
2554 if (DstTy.isVector()) {
2555 Register WideSrc0Lo, WideSrc0Hi;
2556 Register WideSrc1Lo, WideSrc1Hi;
2557
2558 unsigned ExtendOp = getExtendOp(MI.getOpcode());
2559 std::tie(WideSrc0Lo, WideSrc0Hi)
2560 = unpackV2S16ToS32(B, MI.getOperand(1).getReg(), ExtendOp);
2561 std::tie(WideSrc1Lo, WideSrc1Hi)
2562 = unpackV2S16ToS32(B, MI.getOperand(2).getReg(), ExtendOp);
2563 auto Lo = B.buildInstr(MI.getOpcode(), {S32}, {WideSrc0Lo, WideSrc1Lo});
2564 auto Hi = B.buildInstr(MI.getOpcode(), {S32}, {WideSrc0Hi, WideSrc1Hi});
2565 B.buildBuildVectorTrunc(DstReg, {Lo.getReg(0), Hi.getReg(0)});
2566 MI.eraseFromParent();
2567 } else {
2568 LegalizerHelper Helper(*MF, ApplySALU, B);
2569
2570 if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
2571 llvm_unreachable("widen scalar should have succeeded");
2572
2573 // FIXME: s16 shift amounts should be legal.
2574 if (Opc == AMDGPU::G_SHL || Opc == AMDGPU::G_LSHR ||
2575 Opc == AMDGPU::G_ASHR) {
2576 B.setInsertPt(*MBB, MI.getIterator());
2577 if (Helper.widenScalar(MI, 1, S32) != LegalizerHelper::Legalized)
2578 llvm_unreachable("widen scalar should have succeeded");
2579 }
2580 }
2581
2582 return;
2583 }
2584 case AMDGPU::G_AMDGPU_S_MUL_I64_I32:
2585 case AMDGPU::G_AMDGPU_S_MUL_U64_U32: {
2586 // This is a special case for s_mul_u64. We use
2587 // G_AMDGPU_S_MUL_I64_I32 opcode to represent an s_mul_u64 operation
2588 // where the 33 higher bits are sign-extended and
2589 // G_AMDGPU_S_MUL_U64_U32 opcode to represent an s_mul_u64 operation
2590    // where the 32 higher bits are zero-extended. If scalar registers are
2591    // selected, both opcodes are lowered to s_mul_u64. If vector registers
2592    // are selected, then G_AMDGPU_S_MUL_I64_I32 and
2593 // G_AMDGPU_S_MUL_U64_U32 are lowered with a vector mad instruction.
2594
2595 // Insert basic copies.
2596 applyDefaultMapping(OpdMapper);
2597
2598 Register DstReg = MI.getOperand(0).getReg();
2599 Register SrcReg0 = MI.getOperand(1).getReg();
2600 Register SrcReg1 = MI.getOperand(2).getReg();
2601 const LLT S32 = LLT::scalar(32);
2602 const LLT S64 = LLT::scalar(64);
2603 assert(MRI.getType(DstReg) == S64 && "This is a special case for s_mul_u64 "
2604 "that handles only 64-bit operands.");
2605 const RegisterBank *DstBank =
2606 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2607
2608 // Replace G_AMDGPU_S_MUL_I64_I32 and G_AMDGPU_S_MUL_U64_U32
2609 // with s_mul_u64 operation.
2610 if (DstBank == &AMDGPU::SGPRRegBank) {
2611 MI.setDesc(TII->get(AMDGPU::S_MUL_U64));
2612 MRI.setRegClass(DstReg, &AMDGPU::SGPR_64RegClass);
2613 MRI.setRegClass(SrcReg0, &AMDGPU::SGPR_64RegClass);
2614 MRI.setRegClass(SrcReg1, &AMDGPU::SGPR_64RegClass);
2615 return;
2616 }
2617
2618 // Replace G_AMDGPU_S_MUL_I64_I32 and G_AMDGPU_S_MUL_U64_U32
2619 // with a vector mad.
2620 assert(MRI.getRegBankOrNull(DstReg) == &AMDGPU::VGPRRegBank &&
2621 "The destination operand should be in vector registers.");
2622
2623 // Extract the lower subregister from the first operand.
2624 Register Op0L = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
2625 MRI.setRegClass(Op0L, &AMDGPU::VGPR_32RegClass);
2626 MRI.setType(Op0L, S32);
2627 B.buildTrunc(Op0L, SrcReg0);
2628
2629 // Extract the lower subregister from the second operand.
2630 Register Op1L = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
2631 MRI.setRegClass(Op1L, &AMDGPU::VGPR_32RegClass);
2632 MRI.setType(Op1L, S32);
2633 B.buildTrunc(Op1L, SrcReg1);
2634
2635 unsigned NewOpc = Opc == AMDGPU::G_AMDGPU_S_MUL_U64_U32
2636 ? AMDGPU::G_AMDGPU_MAD_U64_U32
2637 : AMDGPU::G_AMDGPU_MAD_I64_I32;
2638
2640 Register Zero64 = B.buildConstant(S64, 0).getReg(0);
2641 MRI.setRegClass(Zero64, &AMDGPU::VReg_64RegClass);
2642 Register CarryOut = MRI.createVirtualRegister(&AMDGPU::VReg_64RegClass);
2643 MRI.setRegClass(CarryOut, &AMDGPU::VReg_64RegClass);
2644 B.buildInstr(NewOpc, {DstReg, CarryOut}, {Op0L, Op1L, Zero64});
2645 MI.eraseFromParent();
2646 return;
2647 }
2648 case AMDGPU::G_SEXT_INREG: {
2649 SmallVector<Register, 2> SrcRegs(OpdMapper.getVRegs(1));
2650 if (SrcRegs.empty())
2651 break; // Nothing to repair
2652
2653 const LLT S32 = LLT::scalar(32);
2654 ApplyRegBankMapping O(B, *this, MRI, &AMDGPU::VGPRRegBank);
2655
2656 // Don't use LegalizerHelper's narrowScalar. It produces unwanted G_SEXTs
2657 // we would need to further expand, and doesn't let us directly set the
2658 // result registers.
2659 SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
2660
2661 int Amt = MI.getOperand(2).getImm();
2662 if (Amt <= 32) {
2663 // Downstream users have expectations for the high bit behavior, so freeze
2664 // incoming undefined bits.
2665 if (Amt == 32) {
2666 // The low bits are unchanged.
2667 B.buildFreeze(DstRegs[0], SrcRegs[0]);
2668 } else {
2669 auto Freeze = B.buildFreeze(S32, SrcRegs[0]);
2670 // Extend in the low bits and propagate the sign bit to the high half.
2671 B.buildSExtInReg(DstRegs[0], Freeze, Amt);
2672 }
2673
2674 B.buildAShr(DstRegs[1], DstRegs[0], B.buildConstant(S32, 31));
2675 } else {
2676 // The low bits are unchanged, and extend in the high bits.
2677 // No freeze required
2678 B.buildCopy(DstRegs[0], SrcRegs[0]);
2679 B.buildSExtInReg(DstRegs[1], DstRegs[0], Amt - 32);
2680 }
2681
2682 Register DstReg = MI.getOperand(0).getReg();
2683 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2684 MI.eraseFromParent();
2685 return;
2686 }
2687 case AMDGPU::G_CTPOP:
2688 case AMDGPU::G_BITREVERSE: {
2689 const RegisterBank *DstBank =
2690 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2691 if (DstBank == &AMDGPU::SGPRRegBank)
2692 break;
2693
2694 Register SrcReg = MI.getOperand(1).getReg();
2695 const LLT S32 = LLT::scalar(32);
2696 LLT Ty = MRI.getType(SrcReg);
2697 if (Ty == S32)
2698 break;
2699
2700 ApplyRegBankMapping ApplyVALU(B, *this, MRI, &AMDGPU::VGPRRegBank);
2701
2702 MachineFunction &MF = B.getMF();
2703 LegalizerHelper Helper(MF, ApplyVALU, B);
2704
2705 if (Helper.narrowScalar(MI, 1, S32) != LegalizerHelper::Legalized)
2706 llvm_unreachable("narrowScalar should have succeeded");
2707 return;
2708 }
2709 case AMDGPU::G_AMDGPU_FFBH_U32:
2710 case AMDGPU::G_AMDGPU_FFBL_B32:
2711 case AMDGPU::G_CTLZ_ZERO_UNDEF:
2712 case AMDGPU::G_CTTZ_ZERO_UNDEF: {
2713 const RegisterBank *DstBank =
2714 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2715 if (DstBank == &AMDGPU::SGPRRegBank)
2716 break;
2717
2718 Register SrcReg = MI.getOperand(1).getReg();
2719 const LLT S32 = LLT::scalar(32);
2720 LLT Ty = MRI.getType(SrcReg);
2721 if (Ty == S32)
2722 break;
2723
2724 // We can narrow this more efficiently than Helper can by using ffbh/ffbl
2725 // which return -1 when the input is zero:
2726 // (ctlz_zero_undef hi:lo) -> (umin (ffbh hi), (add (ffbh lo), 32))
2727 // (cttz_zero_undef hi:lo) -> (umin (add (ffbl hi), 32), (ffbl lo))
2728 // (ffbh hi:lo) -> (umin (ffbh hi), (uaddsat (ffbh lo), 32))
2729    // (ffbl hi:lo) -> (umin (uaddsat (ffbl hi), 32), (ffbl lo))
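    // The saturating add keeps a -1 ("no bits set") result at -1 when one half
    // is zero; the _ZERO_UNDEF variants do not need that guarantee, so a plain
    // add suffices for them.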
2730 ApplyRegBankMapping ApplyVALU(B, *this, MRI, &AMDGPU::VGPRRegBank);
2731 SmallVector<Register, 2> SrcRegs(OpdMapper.getVRegs(1));
2732 unsigned NewOpc = Opc == AMDGPU::G_CTLZ_ZERO_UNDEF
2733 ? (unsigned)AMDGPU::G_AMDGPU_FFBH_U32
2734 : Opc == AMDGPU::G_CTTZ_ZERO_UNDEF
2735 ? (unsigned)AMDGPU::G_AMDGPU_FFBL_B32
2736 : Opc;
2737 unsigned Idx = NewOpc == AMDGPU::G_AMDGPU_FFBH_U32;
2738 auto X = B.buildInstr(NewOpc, {S32}, {SrcRegs[Idx]});
2739 auto Y = B.buildInstr(NewOpc, {S32}, {SrcRegs[Idx ^ 1]});
2740 unsigned AddOpc =
2741 Opc == AMDGPU::G_CTLZ_ZERO_UNDEF || Opc == AMDGPU::G_CTTZ_ZERO_UNDEF
2742 ? AMDGPU::G_ADD
2743 : AMDGPU::G_UADDSAT;
2744 Y = B.buildInstr(AddOpc, {S32}, {Y, B.buildConstant(S32, 32)});
2745 Register DstReg = MI.getOperand(0).getReg();
2746 B.buildUMin(DstReg, X, Y);
2747 MI.eraseFromParent();
2748 return;
2749 }
2750 case AMDGPU::G_SEXT:
2751 case AMDGPU::G_ZEXT:
2752 case AMDGPU::G_ANYEXT: {
2753 Register SrcReg = MI.getOperand(1).getReg();
2754 LLT SrcTy = MRI.getType(SrcReg);
2755 const bool Signed = Opc == AMDGPU::G_SEXT;
2756
2757 assert(OpdMapper.getVRegs(1).empty());
2758
2759 const RegisterBank *SrcBank =
2760 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2761
2762 Register DstReg = MI.getOperand(0).getReg();
2763 LLT DstTy = MRI.getType(DstReg);
2764 if (DstTy.isScalar() &&
2765 SrcBank != &AMDGPU::SGPRRegBank &&
2766 SrcBank != &AMDGPU::VCCRegBank &&
2767 // FIXME: Should handle any type that round to s64 when irregular
2768 // breakdowns supported.
2769 DstTy.getSizeInBits() == 64 &&
2770 SrcTy.getSizeInBits() <= 32) {
2771 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2772
2773 // Extend to 32-bit, and then extend the low half.
2774 if (Signed) {
2775 // TODO: Should really be buildSExtOrCopy
2776 B.buildSExtOrTrunc(DefRegs[0], SrcReg);
2777 } else if (Opc == AMDGPU::G_ZEXT) {
2778 B.buildZExtOrTrunc(DefRegs[0], SrcReg);
2779 } else {
2780 B.buildAnyExtOrTrunc(DefRegs[0], SrcReg);
2781 }
2782
2783 extendLow32IntoHigh32(B, DefRegs[1], DefRegs[0], Opc, *SrcBank);
2784 MRI.setRegBank(DstReg, *SrcBank);
2785 MI.eraseFromParent();
2786 return;
2787 }
2788
2789 if (SrcTy != LLT::scalar(1))
2790 return;
2791
2792 // It is not legal to have a legalization artifact with a VCC source. Rather
2793 // than introducing a copy, insert the select we would have to select the
2794 // copy to.
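    // That is, a sign-extend of a vcc value becomes select(vcc, -1, 0) and a
    // zero-extend becomes select(vcc, 1, 0).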
2795 if (SrcBank == &AMDGPU::VCCRegBank) {
2796 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2797
2798 const RegisterBank *DstBank = &AMDGPU::VGPRRegBank;
2799
2800 unsigned DstSize = DstTy.getSizeInBits();
2801 // 64-bit select is SGPR only
2802 const bool UseSel64 = DstSize > 32 &&
2803 SrcBank->getID() == AMDGPU::SGPRRegBankID;
2804
2805 // TODO: Should s16 select be legal?
2806 LLT SelType = UseSel64 ? LLT::scalar(64) : LLT::scalar(32);
2807 auto True = B.buildConstant(SelType, Signed ? -1 : 1);
2808 auto False = B.buildConstant(SelType, 0);
2809
2810 MRI.setRegBank(True.getReg(0), *DstBank);
2811 MRI.setRegBank(False.getReg(0), *DstBank);
2812 MRI.setRegBank(DstReg, *DstBank);
2813
2814 if (DstSize > 32) {
2815 B.buildSelect(DefRegs[0], SrcReg, True, False);
2816 extendLow32IntoHigh32(B, DefRegs[1], DefRegs[0], Opc, *SrcBank, true);
2817 } else if (DstSize < 32) {
2818 auto Sel = B.buildSelect(SelType, SrcReg, True, False);
2819 MRI.setRegBank(Sel.getReg(0), *DstBank);
2820 B.buildTrunc(DstReg, Sel);
2821 } else {
2822 B.buildSelect(DstReg, SrcReg, True, False);
2823 }
2824
2825 MI.eraseFromParent();
2826 return;
2827 }
2828
2829 break;
2830 }
2831 case AMDGPU::G_EXTRACT_VECTOR_ELT: {
2832 SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
2833
2834 assert(OpdMapper.getVRegs(1).empty() && OpdMapper.getVRegs(2).empty());
2835
2836 Register DstReg = MI.getOperand(0).getReg();
2837 Register SrcReg = MI.getOperand(1).getReg();
2838
2839 const LLT S32 = LLT::scalar(32);
2840 LLT DstTy = MRI.getType(DstReg);
2841 LLT SrcTy = MRI.getType(SrcReg);
2842
2843 if (foldExtractEltToCmpSelect(B, MI, OpdMapper))
2844 return;
2845
2846 const ValueMapping &DstMapping
2847 = OpdMapper.getInstrMapping().getOperandMapping(0);
2848 const RegisterBank *DstBank = DstMapping.BreakDown[0].RegBank;
2849 const RegisterBank *SrcBank =
2850 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2851 const RegisterBank *IdxBank =
2852 OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
2853
2854 Register BaseIdxReg;
2855 unsigned ConstOffset;
2856 std::tie(BaseIdxReg, ConstOffset) =
2857 AMDGPU::getBaseWithConstantOffset(MRI, MI.getOperand(2).getReg());
2858
2859 // See if the index is an add of a constant which will be foldable by moving
2860 // the base register of the index later if this is going to be executed in a
2861 // waterfall loop. This is essentially to reassociate the add of a constant
2862 // with the readfirstlane.
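    // In effect, instead of waterfalling over (Base + Constant), the loop runs
    // over Base and the constant is re-added inside the loop.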
2863 bool ShouldMoveIndexIntoLoop = IdxBank != &AMDGPU::SGPRRegBank &&
2864 ConstOffset > 0 &&
2865 ConstOffset < SrcTy.getNumElements();
2866
2867 // Move the base register. We'll re-insert the add later.
2868 if (ShouldMoveIndexIntoLoop)
2869 MI.getOperand(2).setReg(BaseIdxReg);
2870
2871 // If this is a VGPR result only because the index was a VGPR result, the
2872 // actual indexing will be done on the SGPR source vector, which will
2873 // produce a scalar result. We need to copy to the VGPR result inside the
2874 // waterfall loop.
2875 const bool NeedCopyToVGPR = DstBank == &AMDGPU::VGPRRegBank &&
2876 SrcBank == &AMDGPU::SGPRRegBank;
2877 if (DstRegs.empty()) {
2878 applyDefaultMapping(OpdMapper);
2879
2880      executeInWaterfallLoop(B, MI, {2});
2881
2882 if (NeedCopyToVGPR) {
2883 // We don't want a phi for this temporary reg.
2884 Register TmpReg = MRI.createGenericVirtualRegister(DstTy);
2885 MRI.setRegBank(TmpReg, AMDGPU::SGPRRegBank);
2886 MI.getOperand(0).setReg(TmpReg);
2887 B.setInsertPt(*MI.getParent(), ++MI.getIterator());
2888
2889 // Use a v_mov_b32 here to make the exec dependency explicit.
2890 buildVCopy(B, DstReg, TmpReg);
2891 }
2892
2893 // Re-insert the constant offset add inside the waterfall loop.
2894 if (ShouldMoveIndexIntoLoop)
2895 reinsertVectorIndexAdd(B, MI, 2, ConstOffset);
2896
2897 return;
2898 }
2899
2900 assert(DstTy.getSizeInBits() == 64);
2901
2902 LLT Vec32 = LLT::fixed_vector(2 * SrcTy.getNumElements(), 32);
2903
2904 auto CastSrc = B.buildBitcast(Vec32, SrcReg);
2905 auto One = B.buildConstant(S32, 1);
2906
2907 MachineBasicBlock::iterator MII = MI.getIterator();
2908
2909 // Split the vector index into 32-bit pieces. Prepare to move all of the
2910 // new instructions into a waterfall loop if necessary.
2911 //
2912 // Don't put the bitcast or constant in the loop.
2913 MachineInstrSpan Span(MII, &B.getMBB());
2914
2915 // Compute 32-bit element indices, (2 * OrigIdx, 2 * OrigIdx + 1).
2916 auto IdxLo = B.buildShl(S32, BaseIdxReg, One);
2917 auto IdxHi = B.buildAdd(S32, IdxLo, One);
2918
2919 auto Extract0 = B.buildExtractVectorElement(DstRegs[0], CastSrc, IdxLo);
2920 auto Extract1 = B.buildExtractVectorElement(DstRegs[1], CastSrc, IdxHi);
2921
2922 MRI.setRegBank(DstReg, *DstBank);
2923 MRI.setRegBank(CastSrc.getReg(0), *SrcBank);
2924 MRI.setRegBank(One.getReg(0), AMDGPU::SGPRRegBank);
2925 MRI.setRegBank(IdxLo.getReg(0), AMDGPU::SGPRRegBank);
2926 MRI.setRegBank(IdxHi.getReg(0), AMDGPU::SGPRRegBank);
2927
2928 SmallSet<Register, 4> OpsToWaterfall;
2929 if (!collectWaterfallOperands(OpsToWaterfall, MI, MRI, { 2 })) {
2930 MI.eraseFromParent();
2931 return;
2932 }
2933
2934 // Remove the original instruction to avoid potentially confusing the
2935 // waterfall loop logic.
2936 B.setInstr(*Span.begin());
2937 MI.eraseFromParent();
2938 executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
2939 OpsToWaterfall);
2940
2941 if (NeedCopyToVGPR) {
2942 MachineBasicBlock *LoopBB = Extract1->getParent();
2943 Register TmpReg0 = MRI.createGenericVirtualRegister(S32);
2944 Register TmpReg1 = MRI.createGenericVirtualRegister(S32);
2945 MRI.setRegBank(TmpReg0, AMDGPU::SGPRRegBank);
2946 MRI.setRegBank(TmpReg1, AMDGPU::SGPRRegBank);
2947
2948 Extract0->getOperand(0).setReg(TmpReg0);
2949 Extract1->getOperand(0).setReg(TmpReg1);
2950
2951 B.setInsertPt(*LoopBB, ++Extract1->getIterator());
2952
2953 buildVCopy(B, DstRegs[0], TmpReg0);
2954 buildVCopy(B, DstRegs[1], TmpReg1);
2955 }
2956
2957 if (ShouldMoveIndexIntoLoop)
2958 reinsertVectorIndexAdd(B, *IdxLo, 1, ConstOffset);
2959
2960 return;
2961 }
2962 case AMDGPU::G_INSERT_VECTOR_ELT: {
2963 SmallVector<Register, 2> InsRegs(OpdMapper.getVRegs(2));
2964
2965 Register DstReg = MI.getOperand(0).getReg();
2966 LLT VecTy = MRI.getType(DstReg);
2967
2968 assert(OpdMapper.getVRegs(0).empty());
2969 assert(OpdMapper.getVRegs(3).empty());
2970
2971 if (substituteSimpleCopyRegs(OpdMapper, 1))
2972 MRI.setType(MI.getOperand(1).getReg(), VecTy);
2973
2974 if (foldInsertEltToCmpSelect(B, MI, OpdMapper))
2975 return;
2976
2977 const RegisterBank *IdxBank =
2978 OpdMapper.getInstrMapping().getOperandMapping(3).BreakDown[0].RegBank;
2979
2980 Register SrcReg = MI.getOperand(1).getReg();
2981 Register InsReg = MI.getOperand(2).getReg();
2982 LLT InsTy = MRI.getType(InsReg);
2983 (void)InsTy;
2984
2985 Register BaseIdxReg;
2986 unsigned ConstOffset;
2987 std::tie(BaseIdxReg, ConstOffset) =
2988 AMDGPU::getBaseWithConstantOffset(MRI, MI.getOperand(3).getReg());
2989
2990 // See if the index is an add of a constant which will be foldable by moving
2991 // the base register of the index later if this is going to be executed in a
2992 // waterfall loop. This is essentially to reassociate the add of a constant
2993 // with the readfirstlane.
2994 bool ShouldMoveIndexIntoLoop = IdxBank != &AMDGPU::SGPRRegBank &&
2995 ConstOffset > 0 &&
2996 ConstOffset < VecTy.getNumElements();
2997
2998 // Move the base register. We'll re-insert the add later.
2999 if (ShouldMoveIndexIntoLoop)
3000 MI.getOperand(3).setReg(BaseIdxReg);
3001
3002
3003 if (InsRegs.empty()) {
3004      executeInWaterfallLoop(B, MI, {3});
3005
3006 // Re-insert the constant offset add inside the waterfall loop.
3007 if (ShouldMoveIndexIntoLoop) {
3008 reinsertVectorIndexAdd(B, MI, 3, ConstOffset);
3009 }
3010
3011 return;
3012 }
3013
3014 assert(InsTy.getSizeInBits() == 64);
3015
3016 const LLT S32 = LLT::scalar(32);
3017 LLT Vec32 = LLT::fixed_vector(2 * VecTy.getNumElements(), 32);
3018
3019 auto CastSrc = B.buildBitcast(Vec32, SrcReg);
3020 auto One = B.buildConstant(S32, 1);
3021
3022 // Split the vector index into 32-bit pieces. Prepare to move all of the
3023 // new instructions into a waterfall loop if necessary.
3024 //
3025 // Don't put the bitcast or constant in the loop.
3026    MachineInstrSpan Span(MI.getIterator(), &B.getMBB());
3027
3028 // Compute 32-bit element indices, (2 * OrigIdx, 2 * OrigIdx + 1).
3029 auto IdxLo = B.buildShl(S32, BaseIdxReg, One);
3030 auto IdxHi = B.buildAdd(S32, IdxLo, One);
3031
3032 auto InsLo = B.buildInsertVectorElement(Vec32, CastSrc, InsRegs[0], IdxLo);
3033 auto InsHi = B.buildInsertVectorElement(Vec32, InsLo, InsRegs[1], IdxHi);
3034
3035 const RegisterBank *DstBank =
3036 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
3037 const RegisterBank *SrcBank =
3038 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
3039 const RegisterBank *InsSrcBank =
3040 OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
3041
3042 MRI.setRegBank(InsReg, *InsSrcBank);
3043 MRI.setRegBank(CastSrc.getReg(0), *SrcBank);
3044 MRI.setRegBank(InsLo.getReg(0), *DstBank);
3045 MRI.setRegBank(InsHi.getReg(0), *DstBank);
3046 MRI.setRegBank(One.getReg(0), AMDGPU::SGPRRegBank);
3047 MRI.setRegBank(IdxLo.getReg(0), AMDGPU::SGPRRegBank);
3048 MRI.setRegBank(IdxHi.getReg(0), AMDGPU::SGPRRegBank);
3049
3050
3051 SmallSet<Register, 4> OpsToWaterfall;
3052 if (!collectWaterfallOperands(OpsToWaterfall, MI, MRI, { 3 })) {
3053 B.setInsertPt(B.getMBB(), MI);
3054 B.buildBitcast(DstReg, InsHi);
3055 MI.eraseFromParent();
3056 return;
3057 }
3058
3059 B.setInstr(*Span.begin());
3060 MI.eraseFromParent();
3061
3062 // Figure out the point after the waterfall loop before mangling the control
3063 // flow.
3064 executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
3065 OpsToWaterfall);
3066
3067 // The insertion point is now right after the original instruction.
3068 //
3069 // Keep the bitcast to the original vector type out of the loop. Doing this
3070    // saves an extra phi we don't need inside the loop.
3071 B.buildBitcast(DstReg, InsHi);
3072
3073 // Re-insert the constant offset add inside the waterfall loop.
3074 if (ShouldMoveIndexIntoLoop)
3075 reinsertVectorIndexAdd(B, *IdxLo, 1, ConstOffset);
3076
3077 return;
3078 }
3079 case AMDGPU::G_AMDGPU_BUFFER_LOAD:
3080 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT:
3081 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT:
3082 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE:
3083 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE:
3084 case AMDGPU::G_AMDGPU_BUFFER_LOAD_TFE:
3085 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT_TFE:
3086 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT_TFE:
3087 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE_TFE:
3088 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE_TFE:
3089 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT:
3090 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_TFE:
3091 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_D16:
3092 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT:
3093 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT_D16:
3094 case AMDGPU::G_AMDGPU_BUFFER_STORE:
3095 case AMDGPU::G_AMDGPU_BUFFER_STORE_BYTE:
3096 case AMDGPU::G_AMDGPU_BUFFER_STORE_SHORT:
3097 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT:
3098 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT_D16:
3099 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT:
3100 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT_D16: {
3101 applyDefaultMapping(OpdMapper);
3102 executeInWaterfallLoop(B, MI, {1, 4});
3103 return;
3104 }
3105 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SWAP:
3106 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_ADD:
3107 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB:
3108 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMIN:
3109 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMIN:
3110 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMAX:
3111 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMAX:
3112 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_AND:
3113 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_OR:
3114 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_XOR:
3115 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_INC:
3116 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_DEC:
3117 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FADD:
3118 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMIN:
3119 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMAX: {
3120 applyDefaultMapping(OpdMapper);
3121 executeInWaterfallLoop(B, MI, {2, 5});
3122 return;
3123 }
3124 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_CMPSWAP: {
3125 applyDefaultMapping(OpdMapper);
3126 executeInWaterfallLoop(B, MI, {3, 6});
3127 return;
3128 }
3129 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD:
3130 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_UBYTE:
3131 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SBYTE:
3132 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_USHORT:
3133 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SSHORT: {
3134 applyMappingSBufferLoad(B, OpdMapper);
3135 return;
3136 }
3137 case AMDGPU::G_AMDGPU_S_BUFFER_PREFETCH:
3140 return;
3141 case AMDGPU::G_INTRINSIC:
3142 case AMDGPU::G_INTRINSIC_CONVERGENT: {
3143 switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
3144 case Intrinsic::amdgcn_readlane: {
3145 substituteSimpleCopyRegs(OpdMapper, 2);
3146
3147 assert(OpdMapper.getVRegs(0).empty());
3148 assert(OpdMapper.getVRegs(3).empty());
3149
3150 // Make sure the index is an SGPR. It doesn't make sense to run this in a
3151 // waterfall loop, so assume it's a uniform value.
3152 constrainOpWithReadfirstlane(B, MI, 3); // Index
3153 return;
3154 }
3155 case Intrinsic::amdgcn_writelane: {
3156 assert(OpdMapper.getVRegs(0).empty());
3157 assert(OpdMapper.getVRegs(2).empty());
3158 assert(OpdMapper.getVRegs(3).empty());
3159
3160 substituteSimpleCopyRegs(OpdMapper, 4); // VGPR input val
3161 constrainOpWithReadfirstlane(B, MI, 2); // Source value
3162 constrainOpWithReadfirstlane(B, MI, 3); // Index
3163 return;
3164 }
3165 case Intrinsic::amdgcn_interp_p1:
3166 case Intrinsic::amdgcn_interp_p2:
3167 case Intrinsic::amdgcn_interp_mov:
3168 case Intrinsic::amdgcn_interp_p1_f16:
3169 case Intrinsic::amdgcn_interp_p2_f16:
3170 case Intrinsic::amdgcn_lds_param_load: {
3171 applyDefaultMapping(OpdMapper);
3172
3173 // Readlane for m0 value, which is always the last operand.
3174 // FIXME: Should this be a waterfall loop instead?
3175 constrainOpWithReadfirstlane(B, MI, MI.getNumOperands() - 1); // Index
3176 return;
3177 }
3178 case Intrinsic::amdgcn_interp_inreg_p10:
3179 case Intrinsic::amdgcn_interp_inreg_p2:
3180 case Intrinsic::amdgcn_interp_inreg_p10_f16:
3181 case Intrinsic::amdgcn_interp_inreg_p2_f16:
3182 case Intrinsic::amdgcn_interp_p10_rtz_f16:
3183 case Intrinsic::amdgcn_interp_p2_rtz_f16:
3184 case Intrinsic::amdgcn_permlane16_swap:
3185 case Intrinsic::amdgcn_permlane32_swap:
3186 applyDefaultMapping(OpdMapper);
3187 return;
3188 case Intrinsic::amdgcn_permlane16:
3189 case Intrinsic::amdgcn_permlanex16: {
3190 // Doing a waterfall loop over these wouldn't make any sense.
3191 substituteSimpleCopyRegs(OpdMapper, 2);
3192 substituteSimpleCopyRegs(OpdMapper, 3);
3195 return;
3196 }
3197 case Intrinsic::amdgcn_permlane_bcast:
3198 case Intrinsic::amdgcn_permlane_up:
3199 case Intrinsic::amdgcn_permlane_down:
3200 case Intrinsic::amdgcn_permlane_xor:
3201 // Doing a waterfall loop over these wouldn't make any sense.
3204 return;
3205 case Intrinsic::amdgcn_permlane_idx_gen: {
3207 return;
3208 }
3209 case Intrinsic::amdgcn_sbfe:
3210 applyMappingBFE(B, OpdMapper, true);
3211 return;
3212 case Intrinsic::amdgcn_ubfe:
3213 applyMappingBFE(B, OpdMapper, false);
3214 return;
3215 case Intrinsic::amdgcn_inverse_ballot:
3216 case Intrinsic::amdgcn_s_bitreplicate:
3217 case Intrinsic::amdgcn_s_quadmask:
3218 case Intrinsic::amdgcn_s_wqm:
3219 applyDefaultMapping(OpdMapper);
3220 constrainOpWithReadfirstlane(B, MI, 2); // Mask
3221 return;
3222 case Intrinsic::amdgcn_ballot:
3223 // Use default handling and insert copy to vcc source.
3224 break;
3225 }
3226 break;
3227 }
3228 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD:
3229 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_D16:
3230 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_NORET:
3231 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE:
3232 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE_D16: {
3233 const AMDGPU::RsrcIntrinsic *RSrcIntrin =
3234 AMDGPU::lookupRsrcIntrinsic(AMDGPU::getIntrinsicID(MI));
3235 assert(RSrcIntrin && RSrcIntrin->IsImage);
3236 // Non-images can have complications from operands that allow both SGPR
3237 // and VGPR. For now it's too complicated to figure out the final opcode
3238 // to derive the register bank from the MCInstrDesc.
3239 applyMappingImage(B, MI, OpdMapper, RSrcIntrin->RsrcArg);
3240 return;
3241 }
3242 case AMDGPU::G_AMDGPU_BVH_INTERSECT_RAY:
3243 case AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY:
3244 case AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY: {
3245 bool IsDualOrBVH8 =
3246 MI.getOpcode() == AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY ||
3247 MI.getOpcode() == AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY;
3248 unsigned NumMods = IsDualOrBVH8 ? 0 : 1; // Has A16 modifier
3249 unsigned LastRegOpIdx = MI.getNumExplicitOperands() - 1 - NumMods;
3250 applyDefaultMapping(OpdMapper);
3251 executeInWaterfallLoop(B, MI, {LastRegOpIdx});
3252 return;
3253 }
3254 case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS:
3255 case AMDGPU::G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS: {
3256 auto IntrID = cast<GIntrinsic>(MI).getIntrinsicID();
3257 switch (IntrID) {
3258 case Intrinsic::amdgcn_ds_ordered_add:
3259 case Intrinsic::amdgcn_ds_ordered_swap: {
3260 // This is only allowed to execute with 1 lane, so readfirstlane is safe.
3261 assert(OpdMapper.getVRegs(0).empty());
3262 substituteSimpleCopyRegs(OpdMapper, 3);
3263 constrainOpWithReadfirstlane(B, MI, 2); // M0
3264 return;
3265 }
3266 case Intrinsic::amdgcn_ds_gws_init:
3267 case Intrinsic::amdgcn_ds_gws_barrier:
3268 case Intrinsic::amdgcn_ds_gws_sema_br: {
3269 // Only the first lane executes, so readfirstlane is safe.
3270 substituteSimpleCopyRegs(OpdMapper, 1);
3271 constrainOpWithReadfirstlane(B, MI, 2); // M0
3272 return;
3273 }
3274 case Intrinsic::amdgcn_ds_gws_sema_v:
3275 case Intrinsic::amdgcn_ds_gws_sema_p:
3276 case Intrinsic::amdgcn_ds_gws_sema_release_all: {
3277 // Only the first lane executes, so readfirstlane is safe.
3278 constrainOpWithReadfirstlane(B, MI, 1); // M0
3279 return;
3280 }
3281 case Intrinsic::amdgcn_ds_append:
3282 case Intrinsic::amdgcn_ds_consume: {
3283 constrainOpWithReadfirstlane(B, MI, 2); // M0
3284 return;
3285 }
3286 case Intrinsic::amdgcn_s_sendmsg:
3287 case Intrinsic::amdgcn_s_sendmsghalt: {
3288 // FIXME: Should this use a waterfall loop?
3289 constrainOpWithReadfirstlane(B, MI, 2); // M0
3290 return;
3291 }
3292 case Intrinsic::amdgcn_s_setreg: {
3293 constrainOpWithReadfirstlane(B, MI, 2);
3294 return;
3295 }
3296 case Intrinsic::amdgcn_s_ttracedata:
3297 constrainOpWithReadfirstlane(B, MI, 1);
3298 return;
3299 case Intrinsic::amdgcn_raw_buffer_load_lds:
3300 case Intrinsic::amdgcn_raw_ptr_buffer_load_lds: {
3301 applyDefaultMapping(OpdMapper);
3302 constrainOpWithReadfirstlane(B, MI, 1); // rsrc
3303 constrainOpWithReadfirstlane(B, MI, 2); // M0
3304 constrainOpWithReadfirstlane(B, MI, 5); // soffset
3305 return;
3306 }
3307 case Intrinsic::amdgcn_struct_buffer_load_lds:
3308 case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
3309 applyDefaultMapping(OpdMapper);
3310 constrainOpWithReadfirstlane(B, MI, 1); // rsrc
3311 constrainOpWithReadfirstlane(B, MI, 2); // M0
3312 constrainOpWithReadfirstlane(B, MI, 6); // soffset
3313 return;
3314 }
3315 case Intrinsic::amdgcn_cluster_load_async_to_lds_b8:
3316 case Intrinsic::amdgcn_cluster_load_async_to_lds_b32:
3317 case Intrinsic::amdgcn_cluster_load_async_to_lds_b64:
3318 case Intrinsic::amdgcn_cluster_load_async_to_lds_b128: {
3319 applyDefaultMapping(OpdMapper);
3321 return;
3322 }
3323 case Intrinsic::amdgcn_load_to_lds:
3324 case Intrinsic::amdgcn_global_load_lds: {
3325 applyDefaultMapping(OpdMapper);
3326 constrainOpWithReadfirstlane(B, MI, 2);
3327 return;
3328 }
3329 case Intrinsic::amdgcn_lds_direct_load: {
3330 applyDefaultMapping(OpdMapper);
3331 // Readlane for m0 value, which is always the last operand.
3332 constrainOpWithReadfirstlane(B, MI, MI.getNumOperands() - 1); // Index
3333 return;
3334 }
3335 case Intrinsic::amdgcn_exp_row:
3336 applyDefaultMapping(OpdMapper);
3337 constrainOpWithReadfirstlane(B, MI, 8); // M0
3338 return;
3339 case Intrinsic::amdgcn_cluster_load_b32:
3340 case Intrinsic::amdgcn_cluster_load_b64:
3341 case Intrinsic::amdgcn_cluster_load_b128: {
3342 applyDefaultMapping(OpdMapper);
3344 return;
3345 }
3346 case Intrinsic::amdgcn_s_sleep_var:
3347 assert(OpdMapper.getVRegs(1).empty());
3348 constrainOpWithReadfirstlane(B, MI, 1);
3349 return;
3350 case Intrinsic::amdgcn_s_barrier_join:
3351 constrainOpWithReadfirstlane(B, MI, 1);
3352 return;
3353 case Intrinsic::amdgcn_s_barrier_init:
3354 case Intrinsic::amdgcn_s_barrier_signal_var:
3355 constrainOpWithReadfirstlane(B, MI, 1);
3356 constrainOpWithReadfirstlane(B, MI, 2);
3357 return;
3358 case Intrinsic::amdgcn_s_get_barrier_state:
3359 case Intrinsic::amdgcn_s_get_named_barrier_state: {
3360 constrainOpWithReadfirstlane(B, MI, 2);
3361 return;
3362 }
3363 case Intrinsic::amdgcn_s_prefetch_data: {
3364 Register PtrReg = MI.getOperand(1).getReg();
3365 unsigned AS = MRI.getType(PtrReg).getAddressSpace();
3366 if (AMDGPU::isFlatGlobalAddrSpace(AS)) {
3367 applyDefaultMapping(OpdMapper);
3368 constrainOpWithReadfirstlane(B, MI, 1); // Ptr
3369 } else
3370 MI.eraseFromParent();
3371 return;
3372 }
3373 case Intrinsic::amdgcn_tensor_load_to_lds:
3374 case Intrinsic::amdgcn_tensor_store_from_lds: {
3375 constrainOpWithReadfirstlane(B, MI, 1);
3376 constrainOpWithReadfirstlane(B, MI, 2);
3377 constrainOpWithReadfirstlane(B, MI, 3);
3378 constrainOpWithReadfirstlane(B, MI, 4);
3379 return;
3380 }
3381 case Intrinsic::amdgcn_tensor_load_to_lds_d2:
3382 case Intrinsic::amdgcn_tensor_store_from_lds_d2: {
3383 constrainOpWithReadfirstlane(B, MI, 1);
3384 constrainOpWithReadfirstlane(B, MI, 2);
3385 return;
3386 }
3387 default: {
3388 if (const AMDGPU::RsrcIntrinsic *RSrcIntrin =
3389 AMDGPU::lookupRsrcIntrinsic(IntrID)) {
3390 // Non-images can have complications from operands that allow both SGPR
3391 // and VGPR. For now it's too complicated to figure out the final opcode
3392 // to derive the register bank from the MCInstrDesc.
3393 if (RSrcIntrin->IsImage) {
3394 applyMappingImage(B, MI, OpdMapper, RSrcIntrin->RsrcArg);
3395 return;
3396 }
3397 }
3398
3399 break;
3400 }
3401 }
3402 break;
3403 }
3404 case AMDGPU::G_SI_CALL: {
3405 // Use a set to avoid extra readfirstlanes in the case where multiple
3406 // operands are the same register.
3407 SmallSet<Register, 4> SGPROperandRegs;
3408
3409 if (!collectWaterfallOperands(SGPROperandRegs, MI, MRI, {1}))
3410 break;
3411
3412 // Move all copies to physical SGPRs that are used by the call instruction
3413 // into the loop block. Search backwards from the call for these copies until the
3414 // ADJCALLSTACKUP.
3415 unsigned FrameSetupOpcode = AMDGPU::ADJCALLSTACKUP;
3416 unsigned FrameDestroyOpcode = AMDGPU::ADJCALLSTACKDOWN;
3417
3418 // Move all non-copies before the copies, so that a complete range can be
3419 // moved into the waterfall loop.
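// Rough illustration of the intended ordering (hypothetical MIR shape):
//   ADJCALLSTACKUP ...
//   <non-copy setup instrs>          ; hoisted above the copies below
//   $sgpr4 = COPY %arg0              ; copies kept contiguous with the call
//   $sgpr30_sgpr31 = G_SI_CALL ...
//   ADJCALLSTACKDOWN ...
// so that one contiguous [Start, End) range holding the argument copies and
// the call itself can be spliced into the waterfall loop as a unit.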
3420 SmallVector<MachineInstr *, 4> NonCopyInstrs;
3421 // Count of NonCopyInstrs found until the current LastCopy.
3422 unsigned NonCopyInstrsLen = 0;
3423 MachineBasicBlock::iterator Start(&MI);
3424 MachineBasicBlock::iterator LastCopy = Start;
3425 MachineBasicBlock *MBB = MI.getParent();
3426 const SIMachineFunctionInfo *Info =
3427 MBB->getParent()->getInfo<SIMachineFunctionInfo>();
3428 while (Start->getOpcode() != FrameSetupOpcode) {
3429 --Start;
3430 bool IsCopy = false;
3431 if (Start->getOpcode() == AMDGPU::COPY) {
3432 auto &Dst = Start->getOperand(0);
3433 if (Dst.isReg()) {
3434 Register Reg = Dst.getReg();
3435 if (Reg.isPhysical() && MI.readsRegister(Reg, TRI)) {
3436 IsCopy = true;
3437 } else {
3438 // Also move the copy from the scratch rsrc descriptor into the loop
3439 // to allow it to be optimized away.
3440 auto &Src = Start->getOperand(1);
3441 if (Src.isReg()) {
3442 Reg = Src.getReg();
3443 IsCopy = Info->getScratchRSrcReg() == Reg;
3444 }
3445 }
3446 }
3447 }
3448
3449 if (IsCopy) {
3450 LastCopy = Start;
3451 NonCopyInstrsLen = NonCopyInstrs.size();
3452 } else {
3453 NonCopyInstrs.push_back(&*Start);
3454 }
3455 }
3456 NonCopyInstrs.resize(NonCopyInstrsLen);
3457
3458 for (auto *NonCopy : reverse(NonCopyInstrs)) {
3459 MBB->splice(LastCopy, MBB, NonCopy->getIterator());
3460 }
3461 Start = LastCopy;
3462
3463 // Do the same for copies after the loop
3464 NonCopyInstrs.clear();
3465 NonCopyInstrsLen = 0;
3466 MachineBasicBlock::iterator End(&MI);
3467 LastCopy = End;
3468 while (End->getOpcode() != FrameDestroyOpcode) {
3469 ++End;
3470 bool IsCopy = false;
3471 if (End->getOpcode() == AMDGPU::COPY) {
3472 auto &Src = End->getOperand(1);
3473 if (Src.isReg()) {
3474 Register Reg = Src.getReg();
3475 IsCopy = Reg.isPhysical() && MI.modifiesRegister(Reg, TRI);
3476 }
3477 }
3478
3479 if (IsCopy) {
3480 LastCopy = End;
3481 NonCopyInstrsLen = NonCopyInstrs.size();
3482 } else {
3483 NonCopyInstrs.push_back(&*End);
3484 }
3485 }
3486 NonCopyInstrs.resize(NonCopyInstrsLen);
3487
3488 End = LastCopy;
3489 ++LastCopy;
3490 for (auto *NonCopy : reverse(NonCopyInstrs)) {
3491 MBB->splice(LastCopy, MBB, NonCopy->getIterator());
3492 }
3493
3494 ++End;
3495 B.setInsertPt(B.getMBB(), Start);
3496 executeInWaterfallLoop(B, make_range(Start, End), SGPROperandRegs);
3497 break;
3498 }
3499 case AMDGPU::G_LOAD:
3500 case AMDGPU::G_ZEXTLOAD:
3501 case AMDGPU::G_SEXTLOAD: {
3502 if (applyMappingLoad(B, OpdMapper, MI))
3503 return;
3504 break;
3505 }
3506 case AMDGPU::G_DYN_STACKALLOC:
3507 applyMappingDynStackAlloc(B, OpdMapper, MI);
3508 return;
3509 case AMDGPU::G_STACKRESTORE: {
3510 applyDefaultMapping(OpdMapper);
3512 return;
3513 }
3514 case AMDGPU::G_SBFX:
3515 applyMappingBFE(B, OpdMapper, /*Signed*/ true);
3516 return;
3517 case AMDGPU::G_UBFX:
3518 applyMappingBFE(B, OpdMapper, /*Signed*/ false);
3519 return;
3520 case AMDGPU::G_AMDGPU_MAD_U64_U32:
3521 case AMDGPU::G_AMDGPU_MAD_I64_I32:
3522 applyMappingMAD_64_32(B, OpdMapper);
3523 return;
3524 case AMDGPU::G_PREFETCH: {
3525 if (!Subtarget.hasSafeSmemPrefetch() && !Subtarget.hasVmemPrefInsts()) {
3526 MI.eraseFromParent();
3527 return;
3528 }
3529 Register PtrReg = MI.getOperand(0).getReg();
3530 unsigned PtrBank = getRegBankID(PtrReg, MRI, AMDGPU::SGPRRegBankID);
3531 if (PtrBank == AMDGPU::VGPRRegBankID &&
3532 (!Subtarget.hasVmemPrefInsts() || !MI.getOperand(3).getImm())) {
3533 // Cannot do I$ prefetch with divergent pointer.
3534 MI.eraseFromParent();
3535 return;
3536 }
3537 unsigned AS = MRI.getType(PtrReg).getAddressSpace();
3538 if ((!AMDGPU::isFlatGlobalAddrSpace(AS) &&
3539 AS != AMDGPUAS::CONSTANT_ADDRESS_32BIT) ||
3540 (!Subtarget.hasSafeSmemPrefetch() &&
3541 (AS == AMDGPUAS::CONSTANT_ADDRESS_32BIT ||
3542 !MI.getOperand(3).getImm() /* I$ prefetch */))) {
3543 MI.eraseFromParent();
3544 return;
3545 }
3546 applyDefaultMapping(OpdMapper);
3547 return;
3548 }
3549 default:
3550 break;
3551 }
3552
3553 return applyDefaultMapping(OpdMapper);
3554}
3555
3556// vgpr, sgpr -> vgpr
3557// vgpr, agpr -> vgpr
3558// agpr, agpr -> agpr
3559// agpr, sgpr -> vgpr
3560static unsigned regBankUnion(unsigned RB0, unsigned RB1) {
3561 if (RB0 == AMDGPU::InvalidRegBankID)
3562 return RB1;
3563 if (RB1 == AMDGPU::InvalidRegBankID)
3564 return RB0;
3565
3566 if (RB0 == AMDGPU::SGPRRegBankID && RB1 == AMDGPU::SGPRRegBankID)
3567 return AMDGPU::SGPRRegBankID;
3568
3569 if (RB0 == AMDGPU::AGPRRegBankID && RB1 == AMDGPU::AGPRRegBankID)
3570 return AMDGPU::AGPRRegBankID;
3571
3572 return AMDGPU::VGPRRegBankID;
3573}
3574
3575static unsigned regBankBoolUnion(unsigned RB0, unsigned RB1) {
3576 if (RB0 == AMDGPU::InvalidRegBankID)
3577 return RB1;
3578 if (RB1 == AMDGPU::InvalidRegBankID)
3579 return RB0;
3580
3581 // vcc, vcc -> vcc
3582 // vcc, sgpr -> vcc
3583 // vcc, vgpr -> vcc
3584 if (RB0 == AMDGPU::VCCRegBankID || RB1 == AMDGPU::VCCRegBankID)
3585 return AMDGPU::VCCRegBankID;
3586
3587 // sgpr, vgpr -> vgpr
3588 return regBankUnion(RB0, RB1);
3589}
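// Usage note for regBankBoolUnion above (illustrative, not exhaustive): this
// is what the G_PHI handling below uses to merge incoming boolean banks, e.g.
//   regBankBoolUnion(SGPRRegBankID, VCCRegBankID)  -> VCCRegBankID
//   regBankBoolUnion(SGPRRegBankID, SGPRRegBankID) -> SGPRRegBankID
// so a single vcc input forces the whole lane-mask phi into the VCC bank.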
3590
3591 unsigned AMDGPURegisterBankInfo::getMappingType(const MachineRegisterInfo &MRI,
3592 const MachineInstr &MI) const {
3593 unsigned RegBank = AMDGPU::InvalidRegBankID;
3594
3595 for (const MachineOperand &MO : MI.operands()) {
3596 if (!MO.isReg())
3597 continue;
3598 Register Reg = MO.getReg();
3599 if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
3600 RegBank = regBankUnion(RegBank, Bank->getID());
3601 if (RegBank == AMDGPU::VGPRRegBankID)
3602 break;
3603 }
3604 }
3605
3606 return RegBank;
3607}
3608
3609 bool AMDGPURegisterBankInfo::isSALUMapping(const MachineInstr &MI) const {
3610 const MachineFunction &MF = *MI.getParent()->getParent();
3611 const MachineRegisterInfo &MRI = MF.getRegInfo();
3612 for (const MachineOperand &MO : MI.operands()) {
3613 if (!MO.isReg())
3614 continue;
3615 Register Reg = MO.getReg();
3616 if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
3617 if (Bank->getID() != AMDGPU::SGPRRegBankID)
3618 return false;
3619 }
3620 }
3621 return true;
3622}
3623
3624 const RegisterBankInfo::InstructionMapping &
3625 AMDGPURegisterBankInfo::getDefaultMappingSOP(const MachineInstr &MI) const {
3626 const MachineFunction &MF = *MI.getParent()->getParent();
3627 const MachineRegisterInfo &MRI = MF.getRegInfo();
3628 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3629
3630 for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
3631 const MachineOperand &SrcOp = MI.getOperand(i);
3632 if (!SrcOp.isReg())
3633 continue;
3634
3635 unsigned Size = getSizeInBits(SrcOp.getReg(), MRI, *TRI);
3636 OpdsMapping[i] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
3637 }
3638 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3639 MI.getNumOperands());
3640}
3641
3642 const RegisterBankInfo::InstructionMapping &
3643 AMDGPURegisterBankInfo::getDefaultMappingVOP(const MachineInstr &MI) const {
3644 const MachineFunction &MF = *MI.getParent()->getParent();
3645 const MachineRegisterInfo &MRI = MF.getRegInfo();
3646 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3647
3648 // Even though we technically could use SGPRs, this would require knowledge of
3649 // the constant bus restriction. Force all sources to VGPR (except for VCC).
3650 //
3651 // TODO: Unary ops are trivially OK, so accept SGPRs?
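// Illustrative consequence (hedged): a G_FADD whose sources were both assigned
// SGPRs still gets VGPR operand mappings here; RegBankSelect then materializes
// SGPR->VGPR copies, and instruction selection may later fold at most one SGPR
// or literal back into the VALU op without breaking the constant bus limit.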
3652 for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
3653 const MachineOperand &Src = MI.getOperand(i);
3654 if (!Src.isReg())
3655 continue;
3656
3657 unsigned Size = getSizeInBits(Src.getReg(), MRI, *TRI);
3658 unsigned BankID = Size == 1 ? AMDGPU::VCCRegBankID : AMDGPU::VGPRRegBankID;
3659 OpdsMapping[i] = AMDGPU::getValueMapping(BankID, Size);
3660 }
3661
3662 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3663 MI.getNumOperands());
3664}
3665
3666 const RegisterBankInfo::InstructionMapping &
3667 AMDGPURegisterBankInfo::getDefaultMappingAllVGPR(const MachineInstr &MI) const {
3668 const MachineFunction &MF = *MI.getParent()->getParent();
3669 const MachineRegisterInfo &MRI = MF.getRegInfo();
3670 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3671
3672 for (unsigned I = 0, E = MI.getNumOperands(); I != E; ++I) {
3673 const MachineOperand &Op = MI.getOperand(I);
3674 if (!Op.isReg())
3675 continue;
3676
3677 unsigned Size = getSizeInBits(Op.getReg(), MRI, *TRI);
3678 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3679 }
3680
3681 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3682 MI.getNumOperands());
3683}
3684
3685 const RegisterBankInfo::InstructionMapping &
3686 AMDGPURegisterBankInfo::getImageMapping(const MachineRegisterInfo &MRI,
3687 const MachineInstr &MI,
3688 int RsrcIdx) const {
3689 // The reported argument index is relative to the IR intrinsic call arguments,
3690 // so we need to shift by the number of defs and the intrinsic ID.
3691 RsrcIdx += MI.getNumExplicitDefs() + 1;
3692
3693 const int NumOps = MI.getNumOperands();
3694 SmallVector<const ValueMapping *, 8> OpdsMapping(NumOps);
3695
3696 // TODO: Should packed/unpacked D16 difference be reported here as part of
3697 // the value mapping?
3698 for (int I = 0; I != NumOps; ++I) {
3699 if (!MI.getOperand(I).isReg())
3700 continue;
3701
3702 Register OpReg = MI.getOperand(I).getReg();
3703 // We replace some dead address operands with $noreg
3704 if (!OpReg)
3705 continue;
3706
3707 unsigned Size = getSizeInBits(OpReg, MRI, *TRI);
3708
3709 // FIXME: Probably need a new intrinsic register bank searchable table to
3710 // handle arbitrary intrinsics easily.
3711 //
3712 // If this has a sampler, it immediately follows rsrc.
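// For example (hedged, not exhaustive): in an image sample intrinsic the
// 8-dword resource descriptor sits at RsrcIdx with the 4-dword sampler at
// RsrcIdx + 1; both are reported with whatever bank they currently have (and
// waterfalled later if divergent), while coordinate and data operands are
// always mapped to VGPRs.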
3713 const bool MustBeSGPR = I == RsrcIdx || I == RsrcIdx + 1;
3714
3715 if (MustBeSGPR) {
3716 // If this must be an SGPR, we must report whatever it is as legal.
3717 unsigned NewBank = getRegBankID(OpReg, MRI, AMDGPU::SGPRRegBankID);
3718 OpdsMapping[I] = AMDGPU::getValueMapping(NewBank, Size);
3719 } else {
3720 // Some operands must be VGPR, and these are easy to copy to.
3721 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3722 }
3723 }
3724
3725 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping), NumOps);
3726}
3727
3728/// Return the mapping for a pointer argument.
3729 const RegisterBankInfo::ValueMapping *
3730 AMDGPURegisterBankInfo::getValueMappingForPtr(const MachineRegisterInfo &MRI,
3731 Register PtrReg) const {
3732 LLT PtrTy = MRI.getType(PtrReg);
3733 unsigned Size = PtrTy.getSizeInBits();
3734 if (Subtarget.useFlatForGlobal() ||
3735 !AMDGPU::isFlatGlobalAddrSpace(PtrTy.getAddressSpace()))
3736 return AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3737
3738 // If we're using MUBUF instructions for global memory, an SGPR base register
3739 // is possible. Otherwise this needs to be a VGPR.
3740 const RegisterBank *PtrBank = getRegBank(PtrReg, MRI, *TRI);
3741 return AMDGPU::getValueMapping(PtrBank->getID(), Size);
3742}
3743
3744 const RegisterBankInfo::InstructionMapping &
3745 AMDGPURegisterBankInfo::getInstrMappingForLoad(const MachineInstr &MI) const {
3746
3747 const MachineFunction &MF = *MI.getParent()->getParent();
3748 const MachineRegisterInfo &MRI = MF.getRegInfo();
3749 SmallVector<const ValueMapping*, 2> OpdsMapping(2);
3750 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
3751 Register PtrReg = MI.getOperand(1).getReg();
3752 LLT PtrTy = MRI.getType(PtrReg);
3753 unsigned AS = PtrTy.getAddressSpace();
3754 unsigned PtrSize = PtrTy.getSizeInBits();
3755
3756 const ValueMapping *ValMapping;
3757 const ValueMapping *PtrMapping;
3758
3759 const RegisterBank *PtrBank = getRegBank(PtrReg, MRI, *TRI);
3760
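// Worked example (rough sketch): a 32-bit G_LOAD from a global pointer that
// lives in SGPRs and passes isScalarLoadLegal gets SGPR mappings for both the
// value and the pointer and can later select to an SMEM (s_load) instruction;
// a divergent pointer instead forces VGPR mappings and a FLAT/GLOBAL VMEM load.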
3761 if (PtrBank == &AMDGPU::SGPRRegBank && AMDGPU::isFlatGlobalAddrSpace(AS)) {
3762 if (isScalarLoadLegal(MI)) {
3763 // We have a uniform instruction so we want to use an SMRD load
3764 ValMapping = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
3765 PtrMapping = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, PtrSize);
3766 } else {
3767 ValMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3768
3769 // If we're using MUBUF instructions for global memory, an SGPR base
3770 // register is possible. Otherwise this needs to be a VGPR.
3771 unsigned PtrBankID = Subtarget.useFlatForGlobal() ?
3772 AMDGPU::VGPRRegBankID : AMDGPU::SGPRRegBankID;
3773
3774 PtrMapping = AMDGPU::getValueMapping(PtrBankID, PtrSize);
3775 }
3776 } else {
3777 ValMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3778 PtrMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, PtrSize);
3779 }
3780
3781 OpdsMapping[0] = ValMapping;
3782 OpdsMapping[1] = PtrMapping;
3783 const RegisterBankInfo::InstructionMapping &Mapping = getInstructionMapping(
3784 1, 1, getOperandsMapping(OpdsMapping), MI.getNumOperands());
3785 return Mapping;
3786
3787 // FIXME: Do we want to add a mapping for FLAT load, or should we just
3788 // handle that during instruction selection?
3789}
3790
3791unsigned
3792 AMDGPURegisterBankInfo::getRegBankID(Register Reg,
3793 const MachineRegisterInfo &MRI,
3794 unsigned Default) const {
3795 const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
3796 return Bank ? Bank->getID() : Default;
3797}
3798
3799 const RegisterBankInfo::ValueMapping *
3800 AMDGPURegisterBankInfo::getSGPROpMapping(Register Reg,
3801 const MachineRegisterInfo &MRI,
3802 const TargetRegisterInfo &TRI) const {
3803 // Lie and claim anything is legal, even though this needs to be an SGPR
3804 // applyMapping will have to deal with it as a waterfall loop.
3805 unsigned Bank = getRegBankID(Reg, MRI, AMDGPU::SGPRRegBankID);
3806 unsigned Size = getSizeInBits(Reg, MRI, TRI);
3807 return AMDGPU::getValueMapping(Bank, Size);
3808}
3809
3810 const RegisterBankInfo::ValueMapping *
3811 AMDGPURegisterBankInfo::getVGPROpMapping(Register Reg,
3812 const MachineRegisterInfo &MRI,
3813 const TargetRegisterInfo &TRI) const {
3814 unsigned Size = getSizeInBits(Reg, MRI, TRI);
3815 return AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3816}
3817
3818 const RegisterBankInfo::ValueMapping *
3819 AMDGPURegisterBankInfo::getAGPROpMapping(Register Reg,
3820 const MachineRegisterInfo &MRI,
3821 const TargetRegisterInfo &TRI) const {
3822 unsigned Size = getSizeInBits(Reg, MRI, TRI);
3823 return AMDGPU::getValueMapping(AMDGPU::AGPRRegBankID, Size);
3824}
3825
3826///
3827/// This function must return a legal mapping, because
3828/// AMDGPURegisterBankInfo::getInstrAlternativeMappings() is not called
3829/// in RegBankSelect::Mode::Fast. Any mapping that would cause a
3830 /// VGPR to SGPR copy to be generated is illegal.
3831///
3832// Operands that must be SGPRs must accept potentially divergent VGPRs as
3833// legal. These will be dealt with in applyMappingImpl.
3834//
3835 const RegisterBankInfo::InstructionMapping &
3836 AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
3837 const MachineFunction &MF = *MI.getParent()->getParent();
3838 const MachineRegisterInfo &MRI = MF.getRegInfo();
3839
3840 if (MI.isCopy() || MI.getOpcode() == AMDGPU::G_FREEZE) {
3841 Register DstReg = MI.getOperand(0).getReg();
3842 Register SrcReg = MI.getOperand(1).getReg();
3843
3844 // The default logic bothers to analyze impossible alternative mappings. We
3845 // want the most straightforward mapping, so just directly handle this.
3846 const RegisterBank *DstBank = getRegBank(DstReg, MRI, *TRI);
3847 const RegisterBank *SrcBank = getRegBank(SrcReg, MRI, *TRI);
3848
3849 // For COPY between a physical reg and an s1, there is no type associated so
3850 // we need to take the virtual register's type as a hint on how to interpret
3851 // s1 values.
3852 unsigned Size;
3853 if (!SrcReg.isVirtual() && !DstBank &&
3854 MRI.getType(DstReg) == LLT::scalar(1)) {
3855 DstBank = &AMDGPU::VCCRegBank;
3856 Size = 1;
3857 } else if (!DstReg.isVirtual() && MRI.getType(SrcReg) == LLT::scalar(1)) {
3858 DstBank = &AMDGPU::VCCRegBank;
3859 Size = 1;
3860 } else {
3861 Size = getSizeInBits(DstReg, MRI, *TRI);
3862 }
3863
3864 if (!DstBank)
3865 DstBank = SrcBank;
3866 else if (!SrcBank)
3867 SrcBank = DstBank;
3868
3869 if (MI.getOpcode() != AMDGPU::G_FREEZE &&
3870 cannotCopy(*DstBank, *SrcBank, TypeSize::getFixed(Size)))
3871 return getInvalidInstructionMapping();
3872
3873 const ValueMapping &ValMap = getValueMapping(0, Size, *DstBank);
3874 unsigned OpdsMappingSize = MI.isCopy() ? 1 : 2;
3875 SmallVector<const ValueMapping *, 1> OpdsMapping(OpdsMappingSize);
3876 OpdsMapping[0] = &ValMap;
3877 if (MI.getOpcode() == AMDGPU::G_FREEZE)
3878 OpdsMapping[1] = &ValMap;
3879
3880 return getInstructionMapping(
3881 1, /*Cost*/ 1,
3882 /*OperandsMapping*/ getOperandsMapping(OpdsMapping), OpdsMappingSize);
3883 }
3884
3885 if (MI.isRegSequence()) {
3886 // If any input is a VGPR, the result must be a VGPR. The default handling
3887 // assumes any copy between banks is legal.
3888 unsigned BankID = AMDGPU::SGPRRegBankID;
3889
3890 for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) {
3891 auto OpBank = getRegBankID(MI.getOperand(I).getReg(), MRI);
3892 // It doesn't make sense to use vcc or scc banks here, so just ignore
3893 // them.
3894 if (OpBank != AMDGPU::SGPRRegBankID) {
3895 BankID = AMDGPU::VGPRRegBankID;
3896 break;
3897 }
3898 }
3899 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
3900
3901 const ValueMapping &ValMap = getValueMapping(0, Size, getRegBank(BankID));
3902 return getInstructionMapping(
3903 1, /*Cost*/ 1,
3904 /*OperandsMapping*/ getOperandsMapping({&ValMap}), 1);
3905 }
3906
3907 // The default handling is broken and doesn't handle illegal SGPR->VGPR copies
3908 // properly.
3909 //
3910 // TODO: There are additional exec masking dependencies to analyze.
3911 if (auto *PHI = dyn_cast<GPhi>(&MI)) {
3912 unsigned ResultBank = AMDGPU::InvalidRegBankID;
3913 Register DstReg = PHI->getReg(0);
3914
3915 // Sometimes the result may have already been assigned a bank.
3916 if (const RegisterBank *DstBank = getRegBank(DstReg, MRI, *TRI))
3917 ResultBank = DstBank->getID();
3918
3919 for (unsigned I = 0; I < PHI->getNumIncomingValues(); ++I) {
3920 Register Reg = PHI->getIncomingValue(I);
3921 const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
3922
3923 // FIXME: Assuming VGPR for any undetermined inputs.
3924 if (!Bank || Bank->getID() == AMDGPU::VGPRRegBankID) {
3925 ResultBank = AMDGPU::VGPRRegBankID;
3926 break;
3927 }
3928
3929 // FIXME: Need to promote SGPR case to s32
3930 unsigned OpBank = Bank->getID();
3931 ResultBank = regBankBoolUnion(ResultBank, OpBank);
3932 }
3933
3934 assert(ResultBank != AMDGPU::InvalidRegBankID);
3935
3936 unsigned Size = MRI.getType(DstReg).getSizeInBits();
3937
3938 const ValueMapping &ValMap =
3939 getValueMapping(0, Size, getRegBank(ResultBank));
3940 return getInstructionMapping(
3941 1, /*Cost*/ 1,
3942 /*OperandsMapping*/ getOperandsMapping({&ValMap}), 1);
3943 }
3944
3945 const RegisterBankInfo::InstructionMapping &Mapping = getInstrMappingImpl(MI);
3946 if (Mapping.isValid())
3947 return Mapping;
3948
3949 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3950
3951 switch (MI.getOpcode()) {
3952 default:
3953 return getInvalidInstructionMapping();
3954
3955 case AMDGPU::G_AND:
3956 case AMDGPU::G_OR:
3957 case AMDGPU::G_XOR:
3958 case AMDGPU::G_MUL: {
3959 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
3960 if (Size == 1) {
3961 const RegisterBank *DstBank
3962 = getRegBank(MI.getOperand(0).getReg(), MRI, *TRI);
3963
3964 unsigned TargetBankID = AMDGPU::InvalidRegBankID;
3965 unsigned BankLHS = AMDGPU::InvalidRegBankID;
3966 unsigned BankRHS = AMDGPU::InvalidRegBankID;
3967 if (DstBank) {
3968 TargetBankID = DstBank->getID();
3969 if (DstBank == &AMDGPU::VCCRegBank) {
3970 TargetBankID = AMDGPU::VCCRegBankID;
3971 BankLHS = AMDGPU::VCCRegBankID;
3972 BankRHS = AMDGPU::VCCRegBankID;
3973 } else {
3974 BankLHS = getRegBankID(MI.getOperand(1).getReg(), MRI,
3975 AMDGPU::SGPRRegBankID);
3976 BankRHS = getRegBankID(MI.getOperand(2).getReg(), MRI,
3977 AMDGPU::SGPRRegBankID);
3978 }
3979 } else {
3980 BankLHS = getRegBankID(MI.getOperand(1).getReg(), MRI,
3981 AMDGPU::VCCRegBankID);
3982 BankRHS = getRegBankID(MI.getOperand(2).getReg(), MRI,
3983 AMDGPU::VCCRegBankID);
3984
3985 // Both inputs should be true booleans to produce a boolean result.
3986 if (BankLHS == AMDGPU::VGPRRegBankID || BankRHS == AMDGPU::VGPRRegBankID) {
3987 TargetBankID = AMDGPU::VGPRRegBankID;
3988 } else if (BankLHS == AMDGPU::VCCRegBankID || BankRHS == AMDGPU::VCCRegBankID) {
3989 TargetBankID = AMDGPU::VCCRegBankID;
3990 BankLHS = AMDGPU::VCCRegBankID;
3991 BankRHS = AMDGPU::VCCRegBankID;
3992 } else if (BankLHS == AMDGPU::SGPRRegBankID && BankRHS == AMDGPU::SGPRRegBankID) {
3993 TargetBankID = AMDGPU::SGPRRegBankID;
3994 }
3995 }
3996
3997 OpdsMapping[0] = AMDGPU::getValueMapping(TargetBankID, Size);
3998 OpdsMapping[1] = AMDGPU::getValueMapping(BankLHS, Size);
3999 OpdsMapping[2] = AMDGPU::getValueMapping(BankRHS, Size);
4000 break;
4001 }
4002
4003 if (Size == 64) {
4004
4005 if (isSALUMapping(MI)) {
4006 OpdsMapping[0] = getValueMappingSGPR64Only(AMDGPU::SGPRRegBankID, Size);
4007 OpdsMapping[1] = OpdsMapping[2] = OpdsMapping[0];
4008 } else {
4009 if (MI.getOpcode() == AMDGPU::G_MUL && Subtarget.hasVectorMulU64())
4010 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4011 else
4012 OpdsMapping[0] =
4013 getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size);
4014 unsigned Bank1 = getRegBankID(MI.getOperand(1).getReg(), MRI /*, DefaultBankID*/);
4015 OpdsMapping[1] = AMDGPU::getValueMapping(Bank1, Size);
4016
4017 unsigned Bank2 = getRegBankID(MI.getOperand(2).getReg(), MRI /*, DefaultBankID*/);
4018 OpdsMapping[2] = AMDGPU::getValueMapping(Bank2, Size);
4019 }
4020
4021 break;
4022 }
4023
4024 [[fallthrough]];
4025 }
4026 case AMDGPU::G_PTR_ADD:
4027 case AMDGPU::G_PTRMASK:
4028 case AMDGPU::G_ADD:
4029 case AMDGPU::G_SUB:
4030 case AMDGPU::G_SHL:
4031 case AMDGPU::G_LSHR:
4032 case AMDGPU::G_ASHR:
4033 case AMDGPU::G_UADDO:
4034 case AMDGPU::G_USUBO:
4035 case AMDGPU::G_UADDE:
4036 case AMDGPU::G_SADDE:
4037 case AMDGPU::G_USUBE:
4038 case AMDGPU::G_SSUBE:
4039 case AMDGPU::G_ABS:
4040 case AMDGPU::G_SHUFFLE_VECTOR:
4041 case AMDGPU::G_SBFX:
4042 case AMDGPU::G_UBFX:
4043 case AMDGPU::G_AMDGPU_S_MUL_I64_I32:
4044 case AMDGPU::G_AMDGPU_S_MUL_U64_U32:
4045 if (isSALUMapping(MI))
4046 return getDefaultMappingSOP(MI);
4047 return getDefaultMappingVOP(MI);
4048 case AMDGPU::G_SMIN:
4049 case AMDGPU::G_SMAX:
4050 case AMDGPU::G_UMIN:
4051 case AMDGPU::G_UMAX:
4052 if (isSALUMapping(MI)) {
4053 // There are no scalar 64-bit min and max, use vector instruction instead.
4054 if (MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 64 &&
4055 Subtarget.hasIntMinMax64())
4056 return getDefaultMappingVOP(MI);
4057 return getDefaultMappingSOP(MI);
4058 }
4059 return getDefaultMappingVOP(MI);
4060 case AMDGPU::G_FADD:
4061 case AMDGPU::G_FSUB:
4062 case AMDGPU::G_FMUL:
4063 case AMDGPU::G_FMA:
4064 case AMDGPU::G_FFLOOR:
4065 case AMDGPU::G_FCEIL:
4066 case AMDGPU::G_INTRINSIC_ROUNDEVEN:
4067 case AMDGPU::G_FMINNUM:
4068 case AMDGPU::G_FMAXNUM:
4069 case AMDGPU::G_FMINIMUM:
4070 case AMDGPU::G_FMAXIMUM:
4071 case AMDGPU::G_FMINIMUMNUM:
4072 case AMDGPU::G_FMAXIMUMNUM:
4073 case AMDGPU::G_INTRINSIC_TRUNC:
4074 case AMDGPU::G_STRICT_FADD:
4075 case AMDGPU::G_STRICT_FSUB:
4076 case AMDGPU::G_STRICT_FMUL:
4077 case AMDGPU::G_STRICT_FMA: {
4078 LLT Ty = MRI.getType(MI.getOperand(0).getReg());
4079 unsigned Size = Ty.getSizeInBits();
4080 if (Subtarget.hasSALUFloatInsts() && Ty.isScalar() &&
4081 (Size == 32 || Size == 16) && isSALUMapping(MI))
4082 return getDefaultMappingSOP(MI);
4083 return getDefaultMappingVOP(MI);
4084 }
4085 case AMDGPU::G_FPTOSI:
4086 case AMDGPU::G_FPTOUI:
4087 case AMDGPU::G_SITOFP:
4088 case AMDGPU::G_UITOFP: {
4089 unsigned SizeDst = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4090 unsigned SizeSrc = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4091 if (Subtarget.hasSALUFloatInsts() && SizeDst == 32 && SizeSrc == 32 &&
4092 isSALUMapping(MI))
4093 return getDefaultMappingSOP(MI);
4094 return getDefaultMappingVOP(MI);
4095 }
4096 case AMDGPU::G_FPTRUNC:
4097 case AMDGPU::G_FPEXT: {
4098 unsigned SizeDst = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4099 unsigned SizeSrc = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4100 if (Subtarget.hasSALUFloatInsts() && SizeDst != 64 && SizeSrc != 64 &&
4101 isSALUMapping(MI))
4102 return getDefaultMappingSOP(MI);
4103 return getDefaultMappingVOP(MI);
4104 }
4105 case AMDGPU::G_FSQRT:
4106 case AMDGPU::G_FEXP2:
4107 case AMDGPU::G_FLOG2: {
4108 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4109 if (Subtarget.hasPseudoScalarTrans() && (Size == 16 || Size == 32) &&
4110 isSALUMapping(MI))
4111 return getDefaultMappingSOP(MI);
4112 return getDefaultMappingVOP(MI);
4113 }
4114 case AMDGPU::G_SADDSAT: // FIXME: Could lower sat ops for SALU
4115 case AMDGPU::G_SSUBSAT:
4116 case AMDGPU::G_UADDSAT:
4117 case AMDGPU::G_USUBSAT:
4118 case AMDGPU::G_FMAD:
4119 case AMDGPU::G_FLDEXP:
4120 case AMDGPU::G_FMINNUM_IEEE:
4121 case AMDGPU::G_FMAXNUM_IEEE:
4122 case AMDGPU::G_FCANONICALIZE:
4123 case AMDGPU::G_STRICT_FLDEXP:
4124 case AMDGPU::G_BSWAP: // TODO: Somehow expand for scalar?
4125 case AMDGPU::G_FSHR: // TODO: Expand for scalar
4126 case AMDGPU::G_AMDGPU_FMIN_LEGACY:
4127 case AMDGPU::G_AMDGPU_FMAX_LEGACY:
4128 case AMDGPU::G_AMDGPU_RCP_IFLAG:
4129 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE0:
4130 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE1:
4131 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE2:
4132 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE3:
4133 case AMDGPU::G_AMDGPU_CVT_PK_I16_I32:
4134 case AMDGPU::G_AMDGPU_SMED3:
4135 case AMDGPU::G_AMDGPU_FMED3:
4136 return getDefaultMappingVOP(MI);
4137 case AMDGPU::G_UMULH:
4138 case AMDGPU::G_SMULH: {
4139 if (Subtarget.hasScalarMulHiInsts() && isSALUMapping(MI))
4140 return getDefaultMappingSOP(MI);
4141 return getDefaultMappingVOP(MI);
4142 }
4143 case AMDGPU::G_AMDGPU_MAD_U64_U32:
4144 case AMDGPU::G_AMDGPU_MAD_I64_I32: {
4145 // Three possible mappings:
4146 //
4147 // - Default SOP
4148 // - Default VOP
4149 // - Scalar multiply: src0 and src1 are SGPRs, the rest is VOP.
4150 //
4151 // This allows instruction selection to keep the multiplication part of the
4152 // instruction on the SALU.
4153 bool AllSalu = true;
4154 bool MulSalu = true;
4155 for (unsigned i = 0; i < 5; ++i) {
4156 Register Reg = MI.getOperand(i).getReg();
4157 if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
4158 if (Bank->getID() != AMDGPU::SGPRRegBankID) {
4159 AllSalu = false;
4160 if (i == 2 || i == 3) {
4161 MulSalu = false;
4162 break;
4163 }
4164 }
4165 }
4166 }
4167
4168 if (AllSalu)
4169 return getDefaultMappingSOP(MI);
4170
4171 // If the multiply-add is full-rate in VALU, use that even if the
4172 // multiplication part is scalar. Accumulating separately on the VALU would
4173 // take two instructions.
4174 if (!MulSalu || Subtarget.hasFullRate64Ops())
4175 return getDefaultMappingVOP(MI);
4176
4177 // Keep the multiplication on the SALU, then accumulate on the VALU.
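// E.g. (hedged): with src0/src1 uniform but src2 divergent, the mapping below
// keeps the 32x32 multiply inputs on SGPRs so selection can use a scalar
// multiply, and only the 64-bit accumulate (plus the carry-out) lands on the
// VALU/VCC side.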
4178 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 64);
4179 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4180 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4181 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4182 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 64);
4183 break;
4184 }
4185 case AMDGPU::G_IMPLICIT_DEF: {
4186 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4187 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4188 break;
4189 }
4190 case AMDGPU::G_FCONSTANT:
4191 case AMDGPU::G_CONSTANT:
4192 case AMDGPU::G_GLOBAL_VALUE:
4193 case AMDGPU::G_FRAME_INDEX:
4194 case AMDGPU::G_BLOCK_ADDR:
4195 case AMDGPU::G_READSTEADYCOUNTER:
4196 case AMDGPU::G_READCYCLECOUNTER: {
4197 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4198 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4199 break;
4200 }
4201 case AMDGPU::G_DYN_STACKALLOC: {
4202 // Result is always uniform, and a wave reduction is needed for the source.
4203 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4204 unsigned SrcBankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4205 OpdsMapping[1] = AMDGPU::getValueMapping(SrcBankID, 32);
4206 break;
4207 }
4208 case AMDGPU::G_AMDGPU_WAVE_ADDRESS: {
4209 // This case is weird because we expect a physical register in the source,
4210 // but need to set a bank anyway.
4211 //
4212 // TODO: We could select the result to SGPR or VGPR
4213 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4214 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4215 break;
4216 }
4217 case AMDGPU::G_INSERT: {
4218 unsigned BankID = getMappingType(MRI, MI);
4219 unsigned DstSize = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4220 unsigned SrcSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
4221 unsigned EltSize = getSizeInBits(MI.getOperand(2).getReg(), MRI, *TRI);
4222 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, DstSize);
4223 OpdsMapping[1] = AMDGPU::getValueMapping(BankID, SrcSize);
4224 OpdsMapping[2] = AMDGPU::getValueMapping(BankID, EltSize);
4225 OpdsMapping[3] = nullptr;
4226 break;
4227 }
4228 case AMDGPU::G_EXTRACT: {
4229 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4230 unsigned DstSize = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4231 unsigned SrcSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
4232 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, DstSize);
4233 OpdsMapping[1] = AMDGPU::getValueMapping(BankID, SrcSize);
4234 OpdsMapping[2] = nullptr;
4235 break;
4236 }
4237 case AMDGPU::G_BUILD_VECTOR:
4238 case AMDGPU::G_BUILD_VECTOR_TRUNC: {
4239 LLT DstTy = MRI.getType(MI.getOperand(0).getReg());
4240 if (DstTy == LLT::fixed_vector(2, 16)) {
4241 unsigned DstSize = DstTy.getSizeInBits();
4242 unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4243 unsigned Src0BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4244 unsigned Src1BankID = getRegBankID(MI.getOperand(2).getReg(), MRI);
4245 unsigned DstBankID = regBankUnion(Src0BankID, Src1BankID);
4246
4247 OpdsMapping[0] = AMDGPU::getValueMapping(DstBankID, DstSize);
4248 OpdsMapping[1] = AMDGPU::getValueMapping(Src0BankID, SrcSize);
4249 OpdsMapping[2] = AMDGPU::getValueMapping(Src1BankID, SrcSize);
4250 break;
4251 }
4252
4253 [[fallthrough]];
4254 }
4255 case AMDGPU::G_MERGE_VALUES:
4256 case AMDGPU::G_CONCAT_VECTORS: {
4257 unsigned Bank = getMappingType(MRI, MI);
4258 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4259 unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4260
4261 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, DstSize);
4262 // Op1 and Dst should use the same register bank.
4263 for (unsigned i = 1, e = MI.getNumOperands(); i != e; ++i)
4264 OpdsMapping[i] = AMDGPU::getValueMapping(Bank, SrcSize);
4265 break;
4266 }
4267 case AMDGPU::G_BITREVERSE:
4268 case AMDGPU::G_BITCAST:
4269 case AMDGPU::G_INTTOPTR:
4270 case AMDGPU::G_PTRTOINT:
4271 case AMDGPU::G_FABS:
4272 case AMDGPU::G_FNEG: {
4273 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4274 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4275 OpdsMapping[0] = OpdsMapping[1] = AMDGPU::getValueMapping(BankID, Size);
4276 break;
4277 }
4278 case AMDGPU::G_AMDGPU_FFBH_U32:
4279 case AMDGPU::G_AMDGPU_FFBL_B32:
4280 case AMDGPU::G_CTLZ_ZERO_UNDEF:
4281 case AMDGPU::G_CTTZ_ZERO_UNDEF: {
4282 unsigned Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4283 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4284 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, 32);
4285 OpdsMapping[1] = AMDGPU::getValueMappingSGPR64Only(BankID, Size);
4286 break;
4287 }
4288 case AMDGPU::G_CTPOP: {
4289 unsigned Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4290 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4291 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, 32);
4292
4293 // This should really be getValueMappingSGPR64Only, but allowing the generic
4294 // code to handle the register split just makes using LegalizerHelper more
4295 // difficult.
4296 OpdsMapping[1] = AMDGPU::getValueMapping(BankID, Size);
4297 break;
4298 }
4299 case AMDGPU::G_TRUNC: {
4300 Register Dst = MI.getOperand(0).getReg();
4301 Register Src = MI.getOperand(1).getReg();
4302 unsigned Bank = getRegBankID(Src, MRI);
4303 unsigned DstSize = getSizeInBits(Dst, MRI, *TRI);
4304 unsigned SrcSize = getSizeInBits(Src, MRI, *TRI);
4305 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, DstSize);
4306 OpdsMapping[1] = AMDGPU::getValueMapping(Bank, SrcSize);
4307 break;
4308 }
4309 case AMDGPU::G_ZEXT:
4310 case AMDGPU::G_SEXT:
4311 case AMDGPU::G_ANYEXT:
4312 case AMDGPU::G_SEXT_INREG: {
4313 Register Dst = MI.getOperand(0).getReg();
4314 Register Src = MI.getOperand(1).getReg();
4315 unsigned DstSize = getSizeInBits(Dst, MRI, *TRI);
4316 unsigned SrcSize = getSizeInBits(Src, MRI, *TRI);
4317
4318 unsigned DstBank;
4319 const RegisterBank *SrcBank = getRegBank(Src, MRI, *TRI);
4320 assert(SrcBank);
4321 switch (SrcBank->getID()) {
4322 case AMDGPU::SGPRRegBankID:
4323 DstBank = AMDGPU::SGPRRegBankID;
4324 break;
4325 default:
4326 DstBank = AMDGPU::VGPRRegBankID;
4327 break;
4328 }
4329
4330 // Scalar extend can use 64-bit BFE, but VGPRs require extending to
4331 // 32-bits, and then to 64.
4332 OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(DstBank, DstSize);
4333 OpdsMapping[1] = AMDGPU::getValueMappingSGPR64Only(SrcBank->getID(),
4334 SrcSize);
4335 break;
4336 }
4337 case AMDGPU::G_IS_FPCLASS: {
4338 Register SrcReg = MI.getOperand(1).getReg();
4339 unsigned SrcSize = MRI.getType(SrcReg).getSizeInBits();
4340 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4341 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, DstSize);
4342 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4343 break;
4344 }
4345 case AMDGPU::G_STORE: {
4346 assert(MI.getOperand(0).isReg());
4347 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4348
4349 // FIXME: We need to specify a different reg bank once scalar stores are
4350 // supported.
4351 const ValueMapping *ValMapping =
4352 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4353 OpdsMapping[0] = ValMapping;
4354 OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
4355 break;
4356 }
4357 case AMDGPU::G_ICMP:
4358 case AMDGPU::G_FCMP: {
4359 unsigned Size = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4360
4361 // See if the result register has already been constrained to vcc, which may
4362 // happen due to control flow intrinsic lowering.
4363 unsigned DstBank = getRegBankID(MI.getOperand(0).getReg(), MRI,
4364 AMDGPU::SGPRRegBankID);
4365 unsigned Op2Bank = getRegBankID(MI.getOperand(2).getReg(), MRI);
4366 unsigned Op3Bank = getRegBankID(MI.getOperand(3).getReg(), MRI);
4367
4368 auto canUseSCCICMP = [&]() {
4369 auto Pred =
4370 static_cast<CmpInst::Predicate>(MI.getOperand(1).getPredicate());
4371 return Size == 32 ||
4372 (Size == 64 &&
4373 (Pred == CmpInst::ICMP_EQ || Pred == CmpInst::ICMP_NE) &&
4374 Subtarget.hasScalarCompareEq64());
4375 };
4376 auto canUseSCCFCMP = [&]() {
4377 return Subtarget.hasSALUFloatInsts() && (Size == 32 || Size == 16);
4378 };
4379
4380 bool isICMP = MI.getOpcode() == AMDGPU::G_ICMP;
4381 bool CanUseSCC = DstBank == AMDGPU::SGPRRegBankID &&
4382 Op2Bank == AMDGPU::SGPRRegBankID &&
4383 Op3Bank == AMDGPU::SGPRRegBankID &&
4384 (isICMP ? canUseSCCICMP() : canUseSCCFCMP());
4385
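// E.g. (illustrative): an s32 G_ICMP eq with both sources in SGPRs and an
// unconstrained result takes the SCC path below (SGPR dst bank, s1 result that
// is widened to 32 bits later); any VGPR source, or a result already assigned
// to vcc, takes the V_CMP path with a VCC lane-mask destination.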
4386 DstBank = CanUseSCC ? AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
4387 unsigned SrcBank = CanUseSCC ? AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
4388
4389 // TODO: Use 32-bit for scalar output size.
4390 // SCC results will need to be copied to a 32-bit SGPR virtual register.
4391 const unsigned ResultSize = 1;
4392
4393 OpdsMapping[0] = AMDGPU::getValueMapping(DstBank, ResultSize);
4394 OpdsMapping[1] = nullptr; // Predicate Operand.
4395 OpdsMapping[2] = AMDGPU::getValueMapping(SrcBank, Size);
4396 OpdsMapping[3] = AMDGPU::getValueMapping(SrcBank, Size);
4397 break;
4398 }
4399 case AMDGPU::G_EXTRACT_VECTOR_ELT: {
4400 // VGPR index can be used for waterfall when indexing a SGPR vector.
4401 unsigned SrcBankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4402 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4403 unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4404 unsigned IdxSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4405 unsigned IdxBank = getRegBankID(MI.getOperand(2).getReg(), MRI);
4406 unsigned OutputBankID = regBankUnion(SrcBankID, IdxBank);
4407
4408 OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(OutputBankID, DstSize);
4409 OpdsMapping[1] = AMDGPU::getValueMapping(SrcBankID, SrcSize);
4410
4411 // The index can be in either bank if the source vector is VGPR.
4412 OpdsMapping[2] = AMDGPU::getValueMapping(IdxBank, IdxSize);
4413 break;
4414 }
4415 case AMDGPU::G_INSERT_VECTOR_ELT: {
4416 unsigned OutputBankID = isSALUMapping(MI) ?
4417 AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
4418
4419 unsigned VecSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4420 unsigned InsertSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4421 unsigned IdxSize = MRI.getType(MI.getOperand(3).getReg()).getSizeInBits();
4422 unsigned InsertEltBankID = getRegBankID(MI.getOperand(2).getReg(), MRI);
4423 unsigned IdxBankID = getRegBankID(MI.getOperand(3).getReg(), MRI);
4424
4425 OpdsMapping[0] = AMDGPU::getValueMapping(OutputBankID, VecSize);
4426 OpdsMapping[1] = AMDGPU::getValueMapping(OutputBankID, VecSize);
4427
4428 // This is a weird case, because we need to break down the mapping based on
4429 // the register bank of a different operand.
4430 if (InsertSize == 64 && OutputBankID == AMDGPU::VGPRRegBankID) {
4431 OpdsMapping[2] = AMDGPU::getValueMappingSplit64(InsertEltBankID,
4432 InsertSize);
4433 } else {
4434 assert(InsertSize == 32 || InsertSize == 64);
4435 OpdsMapping[2] = AMDGPU::getValueMapping(InsertEltBankID, InsertSize);
4436 }
4437
4438 // The index can be in either bank if the source vector is VGPR.
4439 OpdsMapping[3] = AMDGPU::getValueMapping(IdxBankID, IdxSize);
4440 break;
4441 }
4442 case AMDGPU::G_UNMERGE_VALUES: {
4443 unsigned Bank = getMappingType(MRI, MI);
4444
4445 // Op1 and Dst should use the same register bank.
4446 // FIXME: Shouldn't this be the default? Why do we need to handle this?
4447 for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
4448 unsigned Size = getSizeInBits(MI.getOperand(i).getReg(), MRI, *TRI);
4449 OpdsMapping[i] = AMDGPU::getValueMapping(Bank, Size);
4450 }
4451 break;
4452 }
4453 case AMDGPU::G_AMDGPU_BUFFER_LOAD:
4454 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE:
4455 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE:
4456 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT:
4457 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT:
4458 case AMDGPU::G_AMDGPU_BUFFER_LOAD_TFE:
4459 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE_TFE:
4460 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE_TFE:
4461 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT_TFE:
4462 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT_TFE:
4463 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT:
4464 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_TFE:
4465 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_D16:
4466 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT:
4467 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT_D16:
4468 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT:
4469 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT_D16:
4470 case AMDGPU::G_AMDGPU_BUFFER_STORE:
4471 case AMDGPU::G_AMDGPU_BUFFER_STORE_BYTE:
4472 case AMDGPU::G_AMDGPU_BUFFER_STORE_SHORT:
4473 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT:
4474 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT_D16: {
4475 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4476
4477 // rsrc
4478 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4479
4480 // vindex
4481 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4482
4483 // voffset
4484 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4485
4486 // soffset
4487 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4488
4489 // Any remaining operands are immediates and were correctly null
4490 // initialized.
4491 break;
4492 }
4493 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SWAP:
4494 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_ADD:
4495 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB:
4496 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMIN:
4497 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMIN:
4498 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMAX:
4499 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMAX:
4500 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_AND:
4501 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_OR:
4502 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_XOR:
4503 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_INC:
4504 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_DEC:
4505 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FADD:
4506 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMIN:
4507 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMAX: {
4508 // vdata_out
4509 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4510
4511 // vdata_in
4512 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4513
4514 // rsrc
4515 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4516
4517 // vindex
4518 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4519
4520 // voffset
4521 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4522
4523 // soffset
4524 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
4525
4526 // Any remaining operands are immediates and were correctly null
4527 // initialized.
4528 break;
4529 }
4530 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_CMPSWAP: {
4531 // vdata_out
4532 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4533
4534 // vdata_in
4535 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4536
4537 // cmp
4538 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4539
4540 // rsrc
4541 OpdsMapping[3] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4542
4543 // vindex
4544 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4545
4546 // voffset
4547 OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
4548
4549 // soffset
4550 OpdsMapping[6] = getSGPROpMapping(MI.getOperand(6).getReg(), MRI, *TRI);
4551
4552 // Any remaining operands are immediates and were correctly null
4553 // initialized.
4554 break;
4555 }
4556 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD:
4557 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_UBYTE:
4558 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SBYTE:
4559 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_USHORT:
4560 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SSHORT: {
4561 // Lie and claim everything is legal, even though some need to be
4562 // SGPRs. applyMapping will have to deal with it as a waterfall loop.
4563 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4564 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4565
4566 // We need to convert this to a MUBUF if either the resource or offset is
4567 // VGPR.
4568 unsigned RSrcBank = OpdsMapping[1]->BreakDown[0].RegBank->getID();
4569 unsigned OffsetBank = OpdsMapping[2]->BreakDown[0].RegBank->getID();
4570 unsigned ResultBank = regBankUnion(RSrcBank, OffsetBank);
4571
4572 unsigned Size0 = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4573 OpdsMapping[0] = AMDGPU::getValueMapping(ResultBank, Size0);
4574 break;
4575 }
4576 case AMDGPU::G_AMDGPU_S_BUFFER_PREFETCH:
4577 OpdsMapping[0] = getSGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4578 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4579 break;
4580 case AMDGPU::G_INTRINSIC:
4581 case AMDGPU::G_INTRINSIC_CONVERGENT: {
4582 switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
4583 default:
4584 return getInvalidInstructionMapping();
4585 case Intrinsic::amdgcn_div_fmas:
4586 case Intrinsic::amdgcn_div_fixup:
4587 case Intrinsic::amdgcn_trig_preop:
4588 case Intrinsic::amdgcn_sin:
4589 case Intrinsic::amdgcn_cos:
4590 case Intrinsic::amdgcn_log_clamp:
4591 case Intrinsic::amdgcn_rcp_legacy:
4592 case Intrinsic::amdgcn_rsq_legacy:
4593 case Intrinsic::amdgcn_rsq_clamp:
4594 case Intrinsic::amdgcn_tanh:
4595 case Intrinsic::amdgcn_fmul_legacy:
4596 case Intrinsic::amdgcn_fma_legacy:
4597 case Intrinsic::amdgcn_frexp_mant:
4598 case Intrinsic::amdgcn_frexp_exp:
4599 case Intrinsic::amdgcn_fract:
4600 case Intrinsic::amdgcn_cvt_pknorm_i16:
4601 case Intrinsic::amdgcn_cvt_pknorm_u16:
4602 case Intrinsic::amdgcn_cvt_pk_i16:
4603 case Intrinsic::amdgcn_cvt_pk_u16:
4604 case Intrinsic::amdgcn_cvt_sr_pk_f16_f32:
4605 case Intrinsic::amdgcn_cvt_sr_pk_bf16_f32:
4606 case Intrinsic::amdgcn_cvt_pk_f16_fp8:
4607 case Intrinsic::amdgcn_cvt_pk_f16_bf8:
4608 case Intrinsic::amdgcn_cvt_pk_fp8_f16:
4609 case Intrinsic::amdgcn_cvt_pk_bf8_f16:
4610 case Intrinsic::amdgcn_cvt_sr_fp8_f16:
4611 case Intrinsic::amdgcn_cvt_sr_bf8_f16:
4612 case Intrinsic::amdgcn_cvt_scale_pk8_f16_fp8:
4613 case Intrinsic::amdgcn_cvt_scale_pk8_bf16_fp8:
4614 case Intrinsic::amdgcn_cvt_scale_pk8_f16_bf8:
4615 case Intrinsic::amdgcn_cvt_scale_pk8_bf16_bf8:
4616 case Intrinsic::amdgcn_cvt_scale_pk8_f16_fp4:
4617 case Intrinsic::amdgcn_cvt_scale_pk8_bf16_fp4:
4618 case Intrinsic::amdgcn_cvt_scale_pk8_f32_fp8:
4619 case Intrinsic::amdgcn_cvt_scale_pk8_f32_bf8:
4620 case Intrinsic::amdgcn_cvt_scale_pk8_f32_fp4:
4621 case Intrinsic::amdgcn_cvt_scale_pk16_f16_fp6:
4622 case Intrinsic::amdgcn_cvt_scale_pk16_bf16_fp6:
4623 case Intrinsic::amdgcn_cvt_scale_pk16_f16_bf6:
4624 case Intrinsic::amdgcn_cvt_scale_pk16_bf16_bf6:
4625 case Intrinsic::amdgcn_cvt_scale_pk16_f32_fp6:
4626 case Intrinsic::amdgcn_cvt_scale_pk16_f32_bf6:
4627 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp8_bf16:
4628 case Intrinsic::amdgcn_cvt_scalef32_pk8_bf8_bf16:
4629 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp8_f16:
4630 case Intrinsic::amdgcn_cvt_scalef32_pk8_bf8_f16:
4631 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp8_f32:
4632 case Intrinsic::amdgcn_cvt_scalef32_pk8_bf8_f32:
4633 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp4_f32:
4634 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp4_f16:
4635 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp4_bf16:
4636 case Intrinsic::amdgcn_cvt_scalef32_pk16_fp6_f32:
4637 case Intrinsic::amdgcn_cvt_scalef32_pk16_bf6_f32:
4638 case Intrinsic::amdgcn_cvt_scalef32_pk16_fp6_f16:
4639 case Intrinsic::amdgcn_cvt_scalef32_pk16_bf6_f16:
4640 case Intrinsic::amdgcn_cvt_scalef32_pk16_fp6_bf16:
4641 case Intrinsic::amdgcn_cvt_scalef32_pk16_bf6_bf16:
4642 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp8_bf16:
4643 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_bf8_bf16:
4644 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp8_f16:
4645 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_bf8_f16:
4646 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp8_f32:
4647 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_bf8_f32:
4648 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp4_f32:
4649 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp4_f16:
4650 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp4_bf16:
4651 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_fp6_f32:
4652 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_bf6_f32:
4653 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_fp6_f16:
4654 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_bf6_f16:
4655 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_fp6_bf16:
4656 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_bf6_bf16:
4657 case Intrinsic::amdgcn_sat_pk4_i4_i8:
4658 case Intrinsic::amdgcn_sat_pk4_u4_u8:
4659 case Intrinsic::amdgcn_fmed3:
4660 case Intrinsic::amdgcn_cubeid:
4661 case Intrinsic::amdgcn_cubema:
4662 case Intrinsic::amdgcn_cubesc:
4663 case Intrinsic::amdgcn_cubetc:
4664 case Intrinsic::amdgcn_sffbh:
4665 case Intrinsic::amdgcn_fmad_ftz:
4666 case Intrinsic::amdgcn_mbcnt_lo:
4667 case Intrinsic::amdgcn_mbcnt_hi:
4668 case Intrinsic::amdgcn_mul_u24:
4669 case Intrinsic::amdgcn_mul_i24:
4670 case Intrinsic::amdgcn_mulhi_u24:
4671 case Intrinsic::amdgcn_mulhi_i24:
4672 case Intrinsic::amdgcn_lerp:
4673 case Intrinsic::amdgcn_sad_u8:
4674 case Intrinsic::amdgcn_msad_u8:
4675 case Intrinsic::amdgcn_sad_hi_u8:
4676 case Intrinsic::amdgcn_sad_u16:
4677 case Intrinsic::amdgcn_qsad_pk_u16_u8:
4678 case Intrinsic::amdgcn_mqsad_pk_u16_u8:
4679 case Intrinsic::amdgcn_mqsad_u32_u8:
4680 case Intrinsic::amdgcn_cvt_pk_u8_f32:
4681 case Intrinsic::amdgcn_alignbyte:
4682 case Intrinsic::amdgcn_perm:
4683 case Intrinsic::amdgcn_prng_b32:
4684 case Intrinsic::amdgcn_fdot2:
4685 case Intrinsic::amdgcn_sdot2:
4686 case Intrinsic::amdgcn_udot2:
4687 case Intrinsic::amdgcn_sdot4:
4688 case Intrinsic::amdgcn_udot4:
4689 case Intrinsic::amdgcn_sdot8:
4690 case Intrinsic::amdgcn_udot8:
4691 case Intrinsic::amdgcn_fdot2_bf16_bf16:
4692 case Intrinsic::amdgcn_fdot2_f16_f16:
4693 case Intrinsic::amdgcn_fdot2_f32_bf16:
4694 case Intrinsic::amdgcn_fdot2c_f32_bf16:
4695 case Intrinsic::amdgcn_sudot4:
4696 case Intrinsic::amdgcn_sudot8:
4697 case Intrinsic::amdgcn_dot4_f32_fp8_bf8:
4698 case Intrinsic::amdgcn_dot4_f32_bf8_fp8:
4699 case Intrinsic::amdgcn_dot4_f32_fp8_fp8:
4700 case Intrinsic::amdgcn_dot4_f32_bf8_bf8:
4701 case Intrinsic::amdgcn_cvt_f32_fp8:
4702 case Intrinsic::amdgcn_cvt_f32_fp8_e5m3:
4703 case Intrinsic::amdgcn_cvt_f32_bf8:
4704 case Intrinsic::amdgcn_cvt_off_f32_i4:
4705 case Intrinsic::amdgcn_cvt_pk_f32_fp8:
4706 case Intrinsic::amdgcn_cvt_pk_f32_bf8:
4707 case Intrinsic::amdgcn_cvt_pk_fp8_f32:
4708 case Intrinsic::amdgcn_cvt_pk_fp8_f32_e5m3:
4709 case Intrinsic::amdgcn_cvt_pk_bf8_f32:
4710 case Intrinsic::amdgcn_cvt_sr_fp8_f32:
4711 case Intrinsic::amdgcn_cvt_sr_fp8_f32_e5m3:
4712 case Intrinsic::amdgcn_cvt_sr_bf8_f32:
4713 case Intrinsic::amdgcn_cvt_sr_bf16_f32:
4714 case Intrinsic::amdgcn_cvt_sr_f16_f32:
4715 case Intrinsic::amdgcn_cvt_f16_fp8:
4716 case Intrinsic::amdgcn_cvt_f16_bf8:
4717 case Intrinsic::amdgcn_cvt_scalef32_pk32_fp6_f16:
4718 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf6_f16:
4719 case Intrinsic::amdgcn_cvt_scalef32_pk32_fp6_bf16:
4720 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf6_bf16:
4721 case Intrinsic::amdgcn_cvt_scalef32_f16_fp8:
4722 case Intrinsic::amdgcn_cvt_scalef32_f16_bf8:
4723 case Intrinsic::amdgcn_cvt_scalef32_f32_fp8:
4724 case Intrinsic::amdgcn_cvt_scalef32_f32_bf8:
4725 case Intrinsic::amdgcn_cvt_scalef32_pk_fp8_f32:
4726 case Intrinsic::amdgcn_cvt_scalef32_pk_bf8_f32:
4727 case Intrinsic::amdgcn_cvt_scalef32_pk_f32_fp8:
4728 case Intrinsic::amdgcn_cvt_scalef32_pk_f32_bf8:
4729 case Intrinsic::amdgcn_cvt_scalef32_pk_fp8_f16:
4730 case Intrinsic::amdgcn_cvt_scalef32_pk_fp8_bf16:
4731 case Intrinsic::amdgcn_cvt_scalef32_pk_bf8_f16:
4732 case Intrinsic::amdgcn_cvt_scalef32_pk_bf8_bf16:
4733 case Intrinsic::amdgcn_cvt_scalef32_pk_f32_fp4:
4734 case Intrinsic::amdgcn_cvt_scalef32_pk_fp4_f32:
4735 case Intrinsic::amdgcn_cvt_scalef32_pk_f16_fp4:
4736 case Intrinsic::amdgcn_cvt_scalef32_pk_bf16_fp4:
4737 case Intrinsic::amdgcn_cvt_scalef32_pk32_f32_fp6:
4738 case Intrinsic::amdgcn_cvt_scalef32_pk32_f32_bf6:
4739 case Intrinsic::amdgcn_cvt_scalef32_pk32_f16_bf6:
4740 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf16_bf6:
4741 case Intrinsic::amdgcn_cvt_scalef32_pk32_f16_fp6:
4742 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf16_fp6:
4743 case Intrinsic::amdgcn_cvt_scalef32_pk_f16_bf8:
4744 case Intrinsic::amdgcn_cvt_scalef32_pk_bf16_bf8:
4745 case Intrinsic::amdgcn_cvt_scalef32_pk_f16_fp8:
4746 case Intrinsic::amdgcn_cvt_scalef32_pk_bf16_fp8:
4747 case Intrinsic::amdgcn_cvt_scalef32_pk_fp4_f16:
4748 case Intrinsic::amdgcn_cvt_scalef32_pk_fp4_bf16:
4749 case Intrinsic::amdgcn_cvt_scalef32_sr_pk_fp4_f16:
4750 case Intrinsic::amdgcn_cvt_scalef32_sr_pk_fp4_bf16:
4751 case Intrinsic::amdgcn_cvt_scalef32_sr_pk_fp4_f32:
4752 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_bf6_bf16:
4753 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_bf6_f16:
4754 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_bf6_f32:
4755 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_fp6_bf16:
4756 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_fp6_f16:
4757 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_fp6_f32:
4758 case Intrinsic::amdgcn_cvt_scalef32_sr_bf8_bf16:
4759 case Intrinsic::amdgcn_cvt_scalef32_sr_bf8_f16:
4760 case Intrinsic::amdgcn_cvt_scalef32_sr_bf8_f32:
4761 case Intrinsic::amdgcn_cvt_scalef32_sr_fp8_bf16:
4762 case Intrinsic::amdgcn_cvt_scalef32_sr_fp8_f16:
4763 case Intrinsic::amdgcn_cvt_scalef32_sr_fp8_f32:
4764 case Intrinsic::amdgcn_ashr_pk_i8_i32:
4765 case Intrinsic::amdgcn_ashr_pk_u8_i32:
4766 case Intrinsic::amdgcn_cvt_scalef32_2xpk16_fp6_f32:
4767 case Intrinsic::amdgcn_cvt_scalef32_2xpk16_bf6_f32:
4768 case Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16:
4769 case Intrinsic::amdgcn_wmma_f16_16x16x16_f16:
4770 case Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16_tied:
4771 case Intrinsic::amdgcn_wmma_f16_16x16x16_f16_tied:
4772 case Intrinsic::amdgcn_wmma_f32_16x16x16_bf16:
4773 case Intrinsic::amdgcn_wmma_f32_16x16x16_f16:
4774 case Intrinsic::amdgcn_wmma_i32_16x16x16_iu4:
4775 case Intrinsic::amdgcn_wmma_i32_16x16x16_iu8:
4776 case Intrinsic::amdgcn_wmma_f32_16x16x16_fp8_fp8:
4777 case Intrinsic::amdgcn_wmma_f32_16x16x16_fp8_bf8:
4778 case Intrinsic::amdgcn_wmma_f32_16x16x16_bf8_fp8:
4779 case Intrinsic::amdgcn_wmma_f32_16x16x16_bf8_bf8:
4780 case Intrinsic::amdgcn_wmma_i32_16x16x32_iu4:
4781 case Intrinsic::amdgcn_swmmac_f32_16x16x32_f16:
4782 case Intrinsic::amdgcn_swmmac_f32_16x16x32_bf16:
4783 case Intrinsic::amdgcn_swmmac_f16_16x16x32_f16:
4784 case Intrinsic::amdgcn_swmmac_bf16_16x16x32_bf16:
4785 case Intrinsic::amdgcn_swmmac_i32_16x16x32_iu8:
4786 case Intrinsic::amdgcn_swmmac_i32_16x16x32_iu4:
4787 case Intrinsic::amdgcn_swmmac_i32_16x16x64_iu4:
4788 case Intrinsic::amdgcn_swmmac_f32_16x16x32_fp8_fp8:
4789 case Intrinsic::amdgcn_swmmac_f32_16x16x32_fp8_bf8:
4790 case Intrinsic::amdgcn_swmmac_f32_16x16x32_bf8_fp8:
4791 case Intrinsic::amdgcn_swmmac_f32_16x16x32_bf8_bf8:
4792 case Intrinsic::amdgcn_wmma_f32_16x16x4_f32:
4793 case Intrinsic::amdgcn_wmma_f32_16x16x32_bf16:
4794 case Intrinsic::amdgcn_wmma_f32_16x16x32_f16:
4795 case Intrinsic::amdgcn_wmma_f16_16x16x32_f16:
4796 case Intrinsic::amdgcn_wmma_bf16_16x16x32_bf16:
4797 case Intrinsic::amdgcn_wmma_bf16f32_16x16x32_bf16:
4798 case Intrinsic::amdgcn_wmma_f32_16x16x64_fp8_fp8:
4799 case Intrinsic::amdgcn_wmma_f32_16x16x64_fp8_bf8:
4800 case Intrinsic::amdgcn_wmma_f32_16x16x64_bf8_fp8:
4801 case Intrinsic::amdgcn_wmma_f32_16x16x64_bf8_bf8:
4802 case Intrinsic::amdgcn_wmma_f16_16x16x64_fp8_fp8:
4803 case Intrinsic::amdgcn_wmma_f16_16x16x64_fp8_bf8:
4804 case Intrinsic::amdgcn_wmma_f16_16x16x64_bf8_fp8:
4805 case Intrinsic::amdgcn_wmma_f16_16x16x64_bf8_bf8:
4806 case Intrinsic::amdgcn_wmma_f16_16x16x128_fp8_fp8:
4807 case Intrinsic::amdgcn_wmma_f16_16x16x128_fp8_bf8:
4808 case Intrinsic::amdgcn_wmma_f16_16x16x128_bf8_fp8:
4809 case Intrinsic::amdgcn_wmma_f16_16x16x128_bf8_bf8:
4810 case Intrinsic::amdgcn_wmma_f32_16x16x128_fp8_fp8:
4811 case Intrinsic::amdgcn_wmma_f32_16x16x128_fp8_bf8:
4812 case Intrinsic::amdgcn_wmma_f32_16x16x128_bf8_fp8:
4813 case Intrinsic::amdgcn_wmma_f32_16x16x128_bf8_bf8:
4814 case Intrinsic::amdgcn_wmma_i32_16x16x64_iu8:
4815 case Intrinsic::amdgcn_wmma_f32_16x16x128_f8f6f4:
4816 case Intrinsic::amdgcn_wmma_scale_f32_16x16x128_f8f6f4:
4817 case Intrinsic::amdgcn_wmma_scale16_f32_16x16x128_f8f6f4:
4818 case Intrinsic::amdgcn_wmma_f32_32x16x128_f4:
4819 case Intrinsic::amdgcn_wmma_scale_f32_32x16x128_f4:
4820 case Intrinsic::amdgcn_wmma_scale16_f32_32x16x128_f4:
4821 case Intrinsic::amdgcn_swmmac_f16_16x16x64_f16:
4822 case Intrinsic::amdgcn_swmmac_bf16_16x16x64_bf16:
4823 case Intrinsic::amdgcn_swmmac_f32_16x16x64_bf16:
4824 case Intrinsic::amdgcn_swmmac_bf16f32_16x16x64_bf16:
4825 case Intrinsic::amdgcn_swmmac_f32_16x16x64_f16:
4826 case Intrinsic::amdgcn_swmmac_f32_16x16x128_fp8_fp8:
4827 case Intrinsic::amdgcn_swmmac_f32_16x16x128_fp8_bf8:
4828 case Intrinsic::amdgcn_swmmac_f32_16x16x128_bf8_fp8:
4829 case Intrinsic::amdgcn_swmmac_f32_16x16x128_bf8_bf8:
4830 case Intrinsic::amdgcn_swmmac_f16_16x16x128_fp8_fp8:
4831 case Intrinsic::amdgcn_swmmac_f16_16x16x128_fp8_bf8:
4832 case Intrinsic::amdgcn_swmmac_f16_16x16x128_bf8_fp8:
4833 case Intrinsic::amdgcn_swmmac_f16_16x16x128_bf8_bf8:
4834 case Intrinsic::amdgcn_swmmac_i32_16x16x128_iu8:
4835 case Intrinsic::amdgcn_perm_pk16_b4_u4:
4836 case Intrinsic::amdgcn_perm_pk16_b6_u4:
4837 case Intrinsic::amdgcn_perm_pk16_b8_u4:
4838 return getDefaultMappingVOP(MI);
4839 case Intrinsic::amdgcn_log:
4840 case Intrinsic::amdgcn_exp2:
4841 case Intrinsic::amdgcn_rcp:
4842 case Intrinsic::amdgcn_rsq:
4843 case Intrinsic::amdgcn_sqrt: {
4844 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4845 if (Subtarget.hasPseudoScalarTrans() && (Size == 16 || Size == 32) &&
4846     isSALUMapping(MI))
4847   return getDefaultMappingSOP(MI);
4848 return getDefaultMappingVOP(MI);
4849 }
4850 case Intrinsic::amdgcn_sbfe:
4851 case Intrinsic::amdgcn_ubfe:
4852 if (isSALUMapping(MI))
4853 return getDefaultMappingSOP(MI);
4854 return getDefaultMappingVOP(MI);
4855 case Intrinsic::amdgcn_ds_swizzle:
4856 case Intrinsic::amdgcn_ds_permute:
4857 case Intrinsic::amdgcn_ds_bpermute:
4858 case Intrinsic::amdgcn_update_dpp:
4859 case Intrinsic::amdgcn_mov_dpp8:
4860 case Intrinsic::amdgcn_mov_dpp:
4861 case Intrinsic::amdgcn_strict_wwm:
4862 case Intrinsic::amdgcn_wwm:
4863 case Intrinsic::amdgcn_strict_wqm:
4864 case Intrinsic::amdgcn_wqm:
4865 case Intrinsic::amdgcn_softwqm:
4866 case Intrinsic::amdgcn_set_inactive:
4867 case Intrinsic::amdgcn_set_inactive_chain_arg:
4868 case Intrinsic::amdgcn_permlane64:
4869 case Intrinsic::amdgcn_ds_bpermute_fi_b32:
4870   return getDefaultMappingAllVGPR(MI);
4871 case Intrinsic::amdgcn_cvt_pkrtz:
4872 if (Subtarget.hasSALUFloatInsts() && isSALUMapping(MI))
4873 return getDefaultMappingSOP(MI);
4874 return getDefaultMappingVOP(MI);
4875 case Intrinsic::amdgcn_kernarg_segment_ptr:
4876 case Intrinsic::amdgcn_s_getpc:
4877 case Intrinsic::amdgcn_groupstaticsize:
4878 case Intrinsic::amdgcn_reloc_constant:
4879 case Intrinsic::returnaddress: {
4880 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4881 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4882 break;
4883 }
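// amdgcn.wqm.vote operates on lane masks, so both the result and the source
// operand are mapped to the VCC bank.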
4884 case Intrinsic::amdgcn_wqm_vote: {
4885 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4886 OpdsMapping[0] = OpdsMapping[2]
4887 = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size);
4888 break;
4889 }
4890 case Intrinsic::amdgcn_ps_live: {
4891 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4892 break;
4893 }
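// amdgcn.div.scale has two results: the scaled value in a VGPR and a
// lane-mask flag in the VCC bank; the numerator and denominator sources are
// VALU operands.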
4894 case Intrinsic::amdgcn_div_scale: {
4895 unsigned Dst0Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4896 unsigned Dst1Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4897 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Dst0Size);
4898 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Dst1Size);
4899
4900 unsigned SrcSize = MRI.getType(MI.getOperand(3).getReg()).getSizeInBits();
4901 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4902 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4903 break;
4904 }
4905 case Intrinsic::amdgcn_class: {
4906 Register Src0Reg = MI.getOperand(2).getReg();
4907 Register Src1Reg = MI.getOperand(3).getReg();
4908 unsigned Src0Size = MRI.getType(Src0Reg).getSizeInBits();
4909 unsigned Src1Size = MRI.getType(Src1Reg).getSizeInBits();
4910 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4911 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, DstSize);
4912 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Src0Size);
4913 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Src1Size);
4914 break;
4915 }
4916 case Intrinsic::amdgcn_icmp:
4917 case Intrinsic::amdgcn_fcmp: {
4918 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4919 // This is not VCCRegBank because this is not used in boolean contexts.
4920 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
4921 unsigned OpSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4922 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
4923 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
4924 break;
4925 }
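// readlane/readfirstlane produce a uniform result: the destination maps to
// the SGPR bank while the source value stays in a VGPR.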
4926 case Intrinsic::amdgcn_readlane: {
4927 // This must be an SGPR, but accept a VGPR.
4928 Register IdxReg = MI.getOperand(3).getReg();
4929 unsigned IdxSize = MRI.getType(IdxReg).getSizeInBits();
4930 unsigned IdxBank = getRegBankID(IdxReg, MRI, AMDGPU::SGPRRegBankID);
4931 OpdsMapping[3] = AMDGPU::getValueMapping(IdxBank, IdxSize);
4932 [[fallthrough]];
4933 }
4934 case Intrinsic::amdgcn_readfirstlane: {
4935 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4936 unsigned SrcSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4937 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
4938 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4939 break;
4940 }
4941 case Intrinsic::amdgcn_writelane: {
4942 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4943 Register SrcReg = MI.getOperand(2).getReg();
4944 unsigned SrcSize = MRI.getType(SrcReg).getSizeInBits();
4945 unsigned SrcBank = getRegBankID(SrcReg, MRI, AMDGPU::SGPRRegBankID);
4946 Register IdxReg = MI.getOperand(3).getReg();
4947 unsigned IdxSize = MRI.getType(IdxReg).getSizeInBits();
4948 unsigned IdxBank = getRegBankID(IdxReg, MRI, AMDGPU::SGPRRegBankID);
4949 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
4950
4951 // These 2 must be SGPRs, but accept VGPRs. Readfirstlane will be inserted
4952 // to legalize.
4953 OpdsMapping[2] = AMDGPU::getValueMapping(SrcBank, SrcSize);
4954 OpdsMapping[3] = AMDGPU::getValueMapping(IdxBank, IdxSize);
4955 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4956 break;
4957 }
4958 case Intrinsic::amdgcn_if_break: {
4959 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4960 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4961 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4962 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4963 break;
4964 }
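// permlane16/permlanex16 keep the data operands in VGPRs; the two select
// operands must be uniform and therefore use SGPR mappings.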
4965 case Intrinsic::amdgcn_permlane16:
4966 case Intrinsic::amdgcn_permlanex16: {
4967 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4968 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4969 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4970 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4971 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4972 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4973 break;
4974 }
4975 case Intrinsic::amdgcn_permlane_bcast:
4976 case Intrinsic::amdgcn_permlane_up:
4977 case Intrinsic::amdgcn_permlane_down:
4978 case Intrinsic::amdgcn_permlane_xor: {
4979 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4980 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4981 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4982 OpdsMapping[3] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4983 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4984 break;
4985 }
4986 case Intrinsic::amdgcn_permlane_idx_gen: {
4987 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4988 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4989 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4990 OpdsMapping[3] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4991 break;
4992 }
4993 case Intrinsic::amdgcn_permlane16_var:
4994 case Intrinsic::amdgcn_permlanex16_var: {
4995 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4996 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4997 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4998 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4999 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5000 break;
5001 }
5002 case Intrinsic::amdgcn_mfma_f32_4x4x1f32:
5003 case Intrinsic::amdgcn_mfma_f32_4x4x4f16:
5004 case Intrinsic::amdgcn_mfma_i32_4x4x4i8:
5005 case Intrinsic::amdgcn_mfma_f32_4x4x2bf16:
5006 case Intrinsic::amdgcn_mfma_f32_16x16x1f32:
5007 case Intrinsic::amdgcn_mfma_f32_16x16x4f32:
5008 case Intrinsic::amdgcn_mfma_f32_16x16x4f16:
5009 case Intrinsic::amdgcn_mfma_f32_16x16x16f16:
5010 case Intrinsic::amdgcn_mfma_i32_16x16x4i8:
5011 case Intrinsic::amdgcn_mfma_i32_16x16x16i8:
5012 case Intrinsic::amdgcn_mfma_f32_16x16x2bf16:
5013 case Intrinsic::amdgcn_mfma_f32_16x16x8bf16:
5014 case Intrinsic::amdgcn_mfma_f32_32x32x1f32:
5015 case Intrinsic::amdgcn_mfma_f32_32x32x2f32:
5016 case Intrinsic::amdgcn_mfma_f32_32x32x4f16:
5017 case Intrinsic::amdgcn_mfma_f32_32x32x8f16:
5018 case Intrinsic::amdgcn_mfma_i32_32x32x4i8:
5019 case Intrinsic::amdgcn_mfma_i32_32x32x8i8:
5020 case Intrinsic::amdgcn_mfma_f32_32x32x2bf16:
5021 case Intrinsic::amdgcn_mfma_f32_32x32x4bf16:
5022 case Intrinsic::amdgcn_mfma_f32_32x32x4bf16_1k:
5023 case Intrinsic::amdgcn_mfma_f32_16x16x4bf16_1k:
5024 case Intrinsic::amdgcn_mfma_f32_4x4x4bf16_1k:
5025 case Intrinsic::amdgcn_mfma_f32_32x32x8bf16_1k:
5026 case Intrinsic::amdgcn_mfma_f32_16x16x16bf16_1k:
5027 case Intrinsic::amdgcn_mfma_f64_16x16x4f64:
5028 case Intrinsic::amdgcn_mfma_f64_4x4x4f64:
5029 case Intrinsic::amdgcn_mfma_i32_16x16x32_i8:
5030 case Intrinsic::amdgcn_mfma_i32_32x32x16_i8:
5031 case Intrinsic::amdgcn_mfma_f32_16x16x8_xf32:
5032 case Intrinsic::amdgcn_mfma_f32_32x32x4_xf32:
5033 case Intrinsic::amdgcn_mfma_f32_16x16x32_bf8_bf8:
5034 case Intrinsic::amdgcn_mfma_f32_16x16x32_bf8_fp8:
5035 case Intrinsic::amdgcn_mfma_f32_16x16x32_fp8_bf8:
5036 case Intrinsic::amdgcn_mfma_f32_16x16x32_fp8_fp8:
5037 case Intrinsic::amdgcn_mfma_f32_32x32x16_bf8_bf8:
5038 case Intrinsic::amdgcn_mfma_f32_32x32x16_bf8_fp8:
5039 case Intrinsic::amdgcn_mfma_f32_32x32x16_fp8_bf8:
5040 case Intrinsic::amdgcn_mfma_f32_32x32x16_fp8_fp8:
5041 case Intrinsic::amdgcn_mfma_f32_16x16x32_f16:
5042 case Intrinsic::amdgcn_mfma_f32_32x32x16_f16:
5043 case Intrinsic::amdgcn_mfma_i32_16x16x64_i8:
5044 case Intrinsic::amdgcn_mfma_i32_32x32x32_i8:
5045 case Intrinsic::amdgcn_mfma_f32_16x16x32_bf16: {
5046 // Default for MAI intrinsics.
5047 // srcC can also be an immediate which can be folded later.
5048 // FIXME: Should we eventually add an alternative mapping with AGPR src
5049 // for srcA/srcB?
5050 //
5051 // vdst, srcA, srcB, srcC
5052 const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
5053 OpdsMapping[0] =
5054 Info->mayNeedAGPRs()
5055 ? getAGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI)
5056 : getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5057 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5058 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5059 OpdsMapping[4] =
5060 Info->mayNeedAGPRs()
5061 ? getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI)
5062 : getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5063 break;
5064 }
5065 case Intrinsic::amdgcn_mfma_scale_f32_16x16x128_f8f6f4:
5066 case Intrinsic::amdgcn_mfma_scale_f32_32x32x64_f8f6f4: {
5067   const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
5068   OpdsMapping[0] =
5069 Info->mayNeedAGPRs()
5070 ? getAGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI)
5071 : getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5072
5073 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5074 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5075 OpdsMapping[4] =
5076 Info->mayNeedAGPRs()
5077 ? getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI)
5078 : getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5079
5080 OpdsMapping[8] = getVGPROpMapping(MI.getOperand(8).getReg(), MRI, *TRI);
5081 OpdsMapping[10] = getVGPROpMapping(MI.getOperand(10).getReg(), MRI, *TRI);
5082 break;
5083 }
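// Sparse MFMA (smfmac) variants: vdst and the srcC accumulator use the AGPR
// bank, while srcA, srcB and the sparsity index are VGPRs.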
5084 case Intrinsic::amdgcn_smfmac_f32_16x16x32_f16:
5085 case Intrinsic::amdgcn_smfmac_f32_32x32x16_f16:
5086 case Intrinsic::amdgcn_smfmac_f32_16x16x32_bf16:
5087 case Intrinsic::amdgcn_smfmac_f32_32x32x16_bf16:
5088 case Intrinsic::amdgcn_smfmac_i32_16x16x64_i8:
5089 case Intrinsic::amdgcn_smfmac_i32_32x32x32_i8:
5090 case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf8_bf8:
5091 case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf8_fp8:
5092 case Intrinsic::amdgcn_smfmac_f32_16x16x64_fp8_bf8:
5093 case Intrinsic::amdgcn_smfmac_f32_16x16x64_fp8_fp8:
5094 case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf8_bf8:
5095 case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf8_fp8:
5096 case Intrinsic::amdgcn_smfmac_f32_32x32x32_fp8_bf8:
5097 case Intrinsic::amdgcn_smfmac_f32_32x32x32_fp8_fp8:
5098 case Intrinsic::amdgcn_smfmac_f32_16x16x64_f16:
5099 case Intrinsic::amdgcn_smfmac_f32_32x32x32_f16:
5100 case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf16:
5101 case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf16:
5102 case Intrinsic::amdgcn_smfmac_i32_16x16x128_i8:
5103 case Intrinsic::amdgcn_smfmac_i32_32x32x64_i8:
5104 case Intrinsic::amdgcn_smfmac_f32_16x16x128_bf8_bf8:
5105 case Intrinsic::amdgcn_smfmac_f32_16x16x128_bf8_fp8:
5106 case Intrinsic::amdgcn_smfmac_f32_16x16x128_fp8_bf8:
5107 case Intrinsic::amdgcn_smfmac_f32_16x16x128_fp8_fp8:
5108 case Intrinsic::amdgcn_smfmac_f32_32x32x64_bf8_bf8:
5109 case Intrinsic::amdgcn_smfmac_f32_32x32x64_bf8_fp8:
5110 case Intrinsic::amdgcn_smfmac_f32_32x32x64_fp8_bf8:
5111 case Intrinsic::amdgcn_smfmac_f32_32x32x64_fp8_fp8: {
5112 // vdst, srcA, srcB, srcC, idx
5113 OpdsMapping[0] = getAGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5114 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5115 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5116 OpdsMapping[4] = getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5117 OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5118 break;
5119 }
5120 case Intrinsic::amdgcn_interp_p1:
5121 case Intrinsic::amdgcn_interp_p2:
5122 case Intrinsic::amdgcn_interp_mov:
5123 case Intrinsic::amdgcn_interp_p1_f16:
5124 case Intrinsic::amdgcn_interp_p2_f16:
5125 case Intrinsic::amdgcn_lds_param_load: {
5126 const int M0Idx = MI.getNumOperands() - 1;
5127 Register M0Reg = MI.getOperand(M0Idx).getReg();
5128 unsigned M0Bank = getRegBankID(M0Reg, MRI, AMDGPU::SGPRRegBankID);
5129 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5130
5131 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5132 for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)
5133 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5134
5135 // Must be SGPR, but we must take whatever the original bank is and fix it
5136 // later.
5137 OpdsMapping[M0Idx] = AMDGPU::getValueMapping(M0Bank, 32);
5138 break;
5139 }
5140 case Intrinsic::amdgcn_interp_inreg_p10:
5141 case Intrinsic::amdgcn_interp_inreg_p2:
5142 case Intrinsic::amdgcn_interp_inreg_p10_f16:
5143 case Intrinsic::amdgcn_interp_inreg_p2_f16:
5144 case Intrinsic::amdgcn_interp_p10_rtz_f16:
5145 case Intrinsic::amdgcn_interp_p2_rtz_f16: {
5146 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5147 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5148 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5149 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5150 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5151 break;
5152 }
5153 case Intrinsic::amdgcn_permlane16_swap:
5154 case Intrinsic::amdgcn_permlane32_swap: {
5155 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5156 OpdsMapping[0] = OpdsMapping[1] = OpdsMapping[3] = OpdsMapping[4] =
5157 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5158 break;
5159 }
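// amdgcn.ballot reads a VCC-bank boolean and produces a uniform mask, so the
// destination maps to the SGPR bank.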
5160 case Intrinsic::amdgcn_ballot: {
5161 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5162 unsigned SrcSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
5163 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
5164 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, SrcSize);
5165 break;
5166 }
5167 case Intrinsic::amdgcn_inverse_ballot: {
5168 // This must be an SGPR, but accept a VGPR.
5169 Register MaskReg = MI.getOperand(2).getReg();
5170 unsigned MaskSize = MRI.getType(MaskReg).getSizeInBits();
5171 unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
5172 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5173 OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, MaskSize);
5174 break;
5175 }
5176 case Intrinsic::amdgcn_bitop3: {
5177 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
5178 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5179 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5180 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5181 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5182 break;
5183 }
5184 case Intrinsic::amdgcn_s_quadmask:
5185 case Intrinsic::amdgcn_s_wqm: {
5186 Register MaskReg = MI.getOperand(2).getReg();
5187 unsigned MaskSize = MRI.getType(MaskReg).getSizeInBits();
5188 unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
5189 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, MaskSize);
5190 OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, MaskSize);
5191 break;
5192 }
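// Wave reductions always define a uniform SGPR result; the source stays on
// the SGPR bank only when the instruction already has a full SALU mapping.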
5193 case Intrinsic::amdgcn_wave_reduce_add:
5194 case Intrinsic::amdgcn_wave_reduce_sub:
5195 case Intrinsic::amdgcn_wave_reduce_min:
5196 case Intrinsic::amdgcn_wave_reduce_umin:
5197 case Intrinsic::amdgcn_wave_reduce_max:
5198 case Intrinsic::amdgcn_wave_reduce_umax:
5199 case Intrinsic::amdgcn_wave_reduce_and:
5200 case Intrinsic::amdgcn_wave_reduce_or:
5201 case Intrinsic::amdgcn_wave_reduce_xor: {
5202 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5203 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
5204 unsigned OpSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
5205 auto regBankID =
5206 isSALUMapping(MI) ? AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
5207 OpdsMapping[2] = AMDGPU::getValueMapping(regBankID, OpSize);
5208 break;
5209 }
5210 case Intrinsic::amdgcn_s_bitreplicate:
5211 Register MaskReg = MI.getOperand(2).getReg();
5212 unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
5213 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 64);
5214 OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, 32);
5215 }
5216 break;
5217 }
5218 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD:
5219 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_D16:
5220 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_NORET:
5221 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE:
5222 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE_D16: {
5223 auto IntrID = AMDGPU::getIntrinsicID(MI);
5224 const AMDGPU::RsrcIntrinsic *RSrcIntrin = AMDGPU::lookupRsrcIntrinsic(IntrID);
5225 assert(RSrcIntrin && "missing RsrcIntrinsic for image intrinsic");
5226 // Non-images can have complications from operands that allow both SGPR
5227 // and VGPR. For now it's too complicated to figure out the final opcode
5228 // to derive the register bank from the MCInstrDesc.
5229 assert(RSrcIntrin->IsImage);
5230 return getImageMapping(MRI, MI, RSrcIntrin->RsrcArg);
5231 }
5232 case AMDGPU::G_AMDGPU_BVH_INTERSECT_RAY:
5233 case AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY:
5234 case AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY: {
5235 bool IsDualOrBVH8 =
5236 MI.getOpcode() == AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY ||
5237 MI.getOpcode() == AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY;
5238 unsigned NumMods = IsDualOrBVH8 ? 0 : 1; // Has A16 modifier
5239 unsigned LastRegOpIdx = MI.getNumExplicitOperands() - 1 - NumMods;
5240 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5241 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5242 if (IsDualOrBVH8) {
5243 OpdsMapping[1] = AMDGPU::getValueMapping(
5244 AMDGPU::VGPRRegBankID,
5245 MRI.getType(MI.getOperand(1).getReg()).getSizeInBits());
5246 OpdsMapping[2] = AMDGPU::getValueMapping(
5247 AMDGPU::VGPRRegBankID,
5248 MRI.getType(MI.getOperand(2).getReg()).getSizeInBits());
5249 }
5250 OpdsMapping[LastRegOpIdx] =
5251 getSGPROpMapping(MI.getOperand(LastRegOpIdx).getReg(), MRI, *TRI);
5252 if (LastRegOpIdx == 3) {
5253 // Sequential form: all operands combined into VGPR256/VGPR512
5254 unsigned Size = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
5255 if (Size > 256)
5256 Size = 512;
5257 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5258 } else {
5259 // NSA form
5260 unsigned FirstSrcOpIdx = IsDualOrBVH8 ? 4 : 2;
5261 for (unsigned I = FirstSrcOpIdx; I < LastRegOpIdx; ++I) {
5262 unsigned Size = MRI.getType(MI.getOperand(I).getReg()).getSizeInBits();
5263 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5264 }
5265 }
5266 break;
5267 }
5268 case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS:
5269 case AMDGPU::G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS: {
5270 auto IntrID = cast<GIntrinsic>(MI).getIntrinsicID();
5271 switch (IntrID) {
5272 case Intrinsic::amdgcn_s_getreg:
5273 case Intrinsic::amdgcn_s_memtime:
5274 case Intrinsic::amdgcn_s_memrealtime:
5275 case Intrinsic::amdgcn_s_get_waveid_in_workgroup:
5276 case Intrinsic::amdgcn_s_sendmsg_rtn: {
5277 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5278 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
5279 break;
5280 }
5281 case Intrinsic::amdgcn_global_atomic_csub:
5282 case Intrinsic::amdgcn_global_atomic_fmin_num:
5283 case Intrinsic::amdgcn_global_atomic_fmax_num:
5284 case Intrinsic::amdgcn_flat_atomic_fmin_num:
5285 case Intrinsic::amdgcn_flat_atomic_fmax_num:
5286 case Intrinsic::amdgcn_atomic_cond_sub_u32:
5287 case Intrinsic::amdgcn_global_atomic_ordered_add_b64:
5288 case Intrinsic::amdgcn_global_load_tr_b64:
5289 case Intrinsic::amdgcn_global_load_tr_b128:
5290 case Intrinsic::amdgcn_global_load_tr4_b64:
5291 case Intrinsic::amdgcn_global_load_tr6_b96:
5292 case Intrinsic::amdgcn_ds_load_tr8_b64:
5293 case Intrinsic::amdgcn_ds_load_tr16_b128:
5294 case Intrinsic::amdgcn_ds_load_tr4_b64:
5295 case Intrinsic::amdgcn_ds_load_tr6_b96:
5296 case Intrinsic::amdgcn_flat_load_monitor_b32:
5297 case Intrinsic::amdgcn_flat_load_monitor_b64:
5298 case Intrinsic::amdgcn_flat_load_monitor_b128:
5299 case Intrinsic::amdgcn_global_load_monitor_b32:
5300 case Intrinsic::amdgcn_global_load_monitor_b64:
5301 case Intrinsic::amdgcn_global_load_monitor_b128:
5302 case Intrinsic::amdgcn_ds_read_tr4_b64:
5303 case Intrinsic::amdgcn_ds_read_tr6_b96:
5304 case Intrinsic::amdgcn_ds_read_tr8_b64:
5305 case Intrinsic::amdgcn_ds_read_tr16_b64:
5306 case Intrinsic::amdgcn_ds_atomic_async_barrier_arrive_b64:
5307 case Intrinsic::amdgcn_ds_atomic_barrier_arrive_rtn_b64:
5308   return getDefaultMappingAllVGPR(MI);
5309 case Intrinsic::amdgcn_ds_ordered_add:
5310 case Intrinsic::amdgcn_ds_ordered_swap: {
5311 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5312 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5313 unsigned M0Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5314 AMDGPU::SGPRRegBankID);
5315 OpdsMapping[2] = AMDGPU::getValueMapping(M0Bank, 32);
5316 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5317 break;
5318 }
5319 case Intrinsic::amdgcn_ds_append:
5320 case Intrinsic::amdgcn_ds_consume: {
5321 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5322 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5323 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5324 break;
5325 }
5326 case Intrinsic::amdgcn_exp_compr:
5327 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5328 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5329 break;
5330 case Intrinsic::amdgcn_exp:
5331 // FIXME: Could we support packed types here?
5332 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5333 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5334 OpdsMapping[5] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5335 OpdsMapping[6] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5336 break;
5337 case Intrinsic::amdgcn_exp_row:
5338 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5339 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5340 OpdsMapping[5] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5341 OpdsMapping[6] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5342 OpdsMapping[8] = getSGPROpMapping(MI.getOperand(8).getReg(), MRI, *TRI);
5343 break;
5344 case Intrinsic::amdgcn_s_sendmsg:
5345 case Intrinsic::amdgcn_s_sendmsghalt: {
5346 // This must be an SGPR, but accept a VGPR.
5347 unsigned Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5348 AMDGPU::SGPRRegBankID);
5349 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
5350 break;
5351 }
5352 case Intrinsic::amdgcn_s_setreg: {
5353 // This must be an SGPR, but accept a VGPR.
5354 unsigned Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5355 AMDGPU::SGPRRegBankID);
5356 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
5357 break;
5358 }
5359 case Intrinsic::amdgcn_s_ttracedata: {
5360 // This must be an SGPR, but accept a VGPR.
5361 unsigned Bank =
5362 getRegBankID(MI.getOperand(1).getReg(), MRI, AMDGPU::SGPRRegBankID);
5363 OpdsMapping[1] = AMDGPU::getValueMapping(Bank, 32);
5364 break;
5365 }
5366 case Intrinsic::amdgcn_end_cf: {
5367 unsigned Size = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
5368 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
5369 break;
5370 }
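// amdgcn.else defines a lane-mask (VCC) result plus a wave-sized SGPR
// continuation mask, and consumes a wave-sized SGPR mask.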
5371 case Intrinsic::amdgcn_else: {
5372 unsigned WaveSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
5373 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5374 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, WaveSize);
5375 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, WaveSize);
5376 break;
5377 }
5378 case Intrinsic::amdgcn_init_whole_wave:
5379 case Intrinsic::amdgcn_live_mask: {
5380 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5381 break;
5382 }
5383 case Intrinsic::amdgcn_wqm_demote:
5384 case Intrinsic::amdgcn_kill: {
5385 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5386 break;
5387 }
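// Raw buffer loads: the result and voffset are VGPRs, while the resource
// descriptor and soffset prefer SGPRs; a divergent descriptor is legalized
// later (e.g. with a waterfall loop).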
5388 case Intrinsic::amdgcn_raw_buffer_load:
5389 case Intrinsic::amdgcn_raw_ptr_buffer_load:
5390 case Intrinsic::amdgcn_raw_atomic_buffer_load:
5391 case Intrinsic::amdgcn_raw_ptr_atomic_buffer_load:
5392 case Intrinsic::amdgcn_raw_tbuffer_load:
5393 case Intrinsic::amdgcn_raw_ptr_tbuffer_load: {
5394 // FIXME: Should make intrinsic ID the last operand of the instruction,
5395 // then this would be the same as store
5396 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5397 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5398 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5399 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5400 break;
5401 }
5402 case Intrinsic::amdgcn_raw_buffer_load_lds:
5403 case Intrinsic::amdgcn_raw_ptr_buffer_load_lds: {
5404 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5405 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5406 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5407 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5408 break;
5409 }
5410 case Intrinsic::amdgcn_raw_buffer_store:
5411 case Intrinsic::amdgcn_raw_ptr_buffer_store:
5412 case Intrinsic::amdgcn_raw_buffer_store_format:
5413 case Intrinsic::amdgcn_raw_ptr_buffer_store_format:
5414 case Intrinsic::amdgcn_raw_tbuffer_store:
5415 case Intrinsic::amdgcn_raw_ptr_tbuffer_store: {
5416 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5417 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5418 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5419 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5420 break;
5421 }
5422 case Intrinsic::amdgcn_struct_buffer_load:
5423 case Intrinsic::amdgcn_struct_ptr_buffer_load:
5424 case Intrinsic::amdgcn_struct_tbuffer_load:
5425 case Intrinsic::amdgcn_struct_ptr_tbuffer_load:
5426 case Intrinsic::amdgcn_struct_atomic_buffer_load:
5427 case Intrinsic::amdgcn_struct_ptr_atomic_buffer_load: {
5428 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5429 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5430 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5431 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5432 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5433 break;
5434 }
5435 case Intrinsic::amdgcn_struct_buffer_load_lds:
5436 case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
5437 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5438 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5439 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5440 OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5441 OpdsMapping[6] = getSGPROpMapping(MI.getOperand(6).getReg(), MRI, *TRI);
5442 break;
5443 }
5444 case Intrinsic::amdgcn_struct_buffer_store:
5445 case Intrinsic::amdgcn_struct_ptr_buffer_store:
5446 case Intrinsic::amdgcn_struct_tbuffer_store:
5447 case Intrinsic::amdgcn_struct_ptr_tbuffer_store: {
5448 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5449 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5450 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5451 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5452 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5453 break;
5454 }
5455 case Intrinsic::amdgcn_init_exec_from_input: {
5456 unsigned Size = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
5457 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
5458 break;
5459 }
5460 case Intrinsic::amdgcn_ds_gws_init:
5461 case Intrinsic::amdgcn_ds_gws_barrier:
5462 case Intrinsic::amdgcn_ds_gws_sema_br: {
5463 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5464
5465 // This must be an SGPR, but accept a VGPR.
5466 unsigned Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5467 AMDGPU::SGPRRegBankID);
5468 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
5469 break;
5470 }
5471 case Intrinsic::amdgcn_ds_gws_sema_v:
5472 case Intrinsic::amdgcn_ds_gws_sema_p:
5473 case Intrinsic::amdgcn_ds_gws_sema_release_all: {
5474 // This must be an SGPR, but accept a VGPR.
5475 unsigned Bank = getRegBankID(MI.getOperand(1).getReg(), MRI,
5476 AMDGPU::SGPRRegBankID);
5477 OpdsMapping[1] = AMDGPU::getValueMapping(Bank, 32);
5478 break;
5479 }
5480 case Intrinsic::amdgcn_cluster_load_b32:
5481 case Intrinsic::amdgcn_cluster_load_b64:
5482 case Intrinsic::amdgcn_cluster_load_b128: {
5483 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5484 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5485 unsigned M0Bank =
5486 getRegBankID(MI.getOperand(4).getReg(), MRI, AMDGPU::SGPRRegBankID);
5487 OpdsMapping[4] = AMDGPU::getValueMapping(M0Bank, 32);
5488 break;
5489 }
5490 case Intrinsic::amdgcn_cluster_load_async_to_lds_b8:
5491 case Intrinsic::amdgcn_cluster_load_async_to_lds_b32:
5492 case Intrinsic::amdgcn_cluster_load_async_to_lds_b64:
5493 case Intrinsic::amdgcn_cluster_load_async_to_lds_b128: {
5494 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5495 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5496 unsigned M0Bank =
5497 getRegBankID(MI.getOperand(5).getReg(), MRI, AMDGPU::SGPRRegBankID);
5498 OpdsMapping[5] = AMDGPU::getValueMapping(M0Bank, 32);
5499 break;
5500 }
5501 case Intrinsic::amdgcn_global_store_async_from_lds_b8:
5502 case Intrinsic::amdgcn_global_store_async_from_lds_b32:
5503 case Intrinsic::amdgcn_global_store_async_from_lds_b64:
5504 case Intrinsic::amdgcn_global_store_async_from_lds_b128:
5505 case Intrinsic::amdgcn_global_load_async_to_lds_b8:
5506 case Intrinsic::amdgcn_global_load_async_to_lds_b32:
5507 case Intrinsic::amdgcn_global_load_async_to_lds_b64:
5508 case Intrinsic::amdgcn_global_load_async_to_lds_b128:
5509 case Intrinsic::amdgcn_load_to_lds:
5510 case Intrinsic::amdgcn_global_load_lds: {
5511 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5512 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5513 break;
5514 }
5515 case Intrinsic::amdgcn_lds_direct_load: {
5516 const int M0Idx = MI.getNumOperands() - 1;
5517 Register M0Reg = MI.getOperand(M0Idx).getReg();
5518 unsigned M0Bank = getRegBankID(M0Reg, MRI, AMDGPU::SGPRRegBankID);
5519 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5520
5521 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5522 for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)
5523 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5524
5525 // Must be SGPR, but we must take whatever the original bank is and fix it
5526 // later.
5527 OpdsMapping[M0Idx] = AMDGPU::getValueMapping(M0Bank, 32);
5528 break;
5529 }
5530 case Intrinsic::amdgcn_ds_add_gs_reg_rtn:
5531 case Intrinsic::amdgcn_ds_sub_gs_reg_rtn:
5532 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5533 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5534 break;
5535 case Intrinsic::amdgcn_ds_bvh_stack_rtn:
5536 case Intrinsic::amdgcn_ds_bvh_stack_push4_pop1_rtn:
5537 case Intrinsic::amdgcn_ds_bvh_stack_push8_pop1_rtn:
5538 case Intrinsic::amdgcn_ds_bvh_stack_push8_pop2_rtn: {
5539 OpdsMapping[0] =
5540 getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI); // %vdst
5541 OpdsMapping[1] =
5542 getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI); // %addr
5543 OpdsMapping[3] =
5544 getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI); // %addr
5545 OpdsMapping[4] =
5546 getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI); // %data0
5547 OpdsMapping[5] =
5548 getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI); // %data1
5549 break;
5550 }
5551 case Intrinsic::amdgcn_s_sleep_var:
5552 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5553 break;
5554 case Intrinsic::amdgcn_s_barrier_join:
5555 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5556 break;
5557 case Intrinsic::amdgcn_s_barrier_init:
5558 case Intrinsic::amdgcn_s_barrier_signal_var:
5559 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5560 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5561 break;
5562 case Intrinsic::amdgcn_s_barrier_signal_isfirst: {
5563 const unsigned ResultSize = 1;
5564 OpdsMapping[0] =
5565 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, ResultSize);
5566 break;
5567 }
5568 case Intrinsic::amdgcn_s_get_barrier_state:
5569 case Intrinsic::amdgcn_s_get_named_barrier_state: {
5570 OpdsMapping[0] = getSGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5571 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5572 break;
5573 }
5574 case Intrinsic::amdgcn_pops_exiting_wave_id:
5575 return getDefaultMappingSOP(MI);
5576 case Intrinsic::amdgcn_tensor_load_to_lds_d2:
5577 case Intrinsic::amdgcn_tensor_store_from_lds_d2:
5578 case Intrinsic::amdgcn_tensor_load_to_lds:
5579 case Intrinsic::amdgcn_tensor_store_from_lds: {
5580 // Lie and claim everything is legal, even though all operands need to be
5581 // SGPRs. applyMapping will have to deal with it with readfirstlane.
5582 for (unsigned I = 1; I < MI.getNumOperands(); ++I) {
5583 if (MI.getOperand(I).isReg()) {
5584 Register Reg = MI.getOperand(I).getReg();
5585 auto OpBank = getRegBankID(Reg, MRI);
5586 unsigned Size = getSizeInBits(Reg, MRI, *TRI);
5587 OpdsMapping[I] = AMDGPU::getValueMapping(OpBank, Size);
5588 }
5589 }
5590 break;
5591 }
5592 case Intrinsic::amdgcn_s_prefetch_data: {
5593 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5594 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5595 break;
5596 }
5597 case Intrinsic::amdgcn_flat_prefetch:
5598 case Intrinsic::amdgcn_global_prefetch:
5599 return getDefaultMappingVOP(MI);
5600 default:
5601   return getInvalidInstructionMapping();
5602 }
5603 break;
5604 }
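// G_SELECT: if both data operands are uniform the select can stay on the
// SGPR bank with a scalar condition; otherwise the data operands become
// VGPRs and the condition is treated as a VCC lane mask.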
5605 case AMDGPU::G_SELECT: {
5606 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5607 unsigned Op2Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5608 AMDGPU::SGPRRegBankID);
5609 unsigned Op3Bank = getRegBankID(MI.getOperand(3).getReg(), MRI,
5610 AMDGPU::SGPRRegBankID);
5611 bool SGPRSrcs = Op2Bank == AMDGPU::SGPRRegBankID &&
5612 Op3Bank == AMDGPU::SGPRRegBankID;
5613
5614 unsigned CondBankDefault = SGPRSrcs ?
5615 AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
5616 unsigned CondBank = getRegBankID(MI.getOperand(1).getReg(), MRI,
5617 CondBankDefault);
5618 if (CondBank == AMDGPU::SGPRRegBankID)
5619 CondBank = SGPRSrcs ? AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
5620 else if (CondBank == AMDGPU::VGPRRegBankID)
5621 CondBank = AMDGPU::VCCRegBankID;
5622
5623 unsigned Bank = SGPRSrcs && CondBank == AMDGPU::SGPRRegBankID ?
5624 AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
5625
5626 assert(CondBank == AMDGPU::VCCRegBankID || CondBank == AMDGPU::SGPRRegBankID);
5627
5628 // TODO: Should report 32-bit for scalar condition type.
5629 if (Size == 64) {
5630 OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
5631 OpdsMapping[1] = AMDGPU::getValueMapping(CondBank, 1);
5632 OpdsMapping[2] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
5633 OpdsMapping[3] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
5634 } else {
5635 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, Size);
5636 OpdsMapping[1] = AMDGPU::getValueMapping(CondBank, 1);
5637 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, Size);
5638 OpdsMapping[3] = AMDGPU::getValueMapping(Bank, Size);
5639 }
5640
5641 break;
5642 }
5643
5644 case AMDGPU::G_SI_CALL: {
5645 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 64);
5646 // Lie and claim everything is legal, even though some need to be
5647 // SGPRs. applyMapping will have to deal with it as a waterfall loop.
5648 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5649
5650 // Allow anything for implicit arguments
5651 for (unsigned I = 4; I < MI.getNumOperands(); ++I) {
5652 if (MI.getOperand(I).isReg()) {
5653 Register Reg = MI.getOperand(I).getReg();
5654 auto OpBank = getRegBankID(Reg, MRI);
5655 unsigned Size = getSizeInBits(Reg, MRI, *TRI);
5656 OpdsMapping[I] = AMDGPU::getValueMapping(OpBank, Size);
5657 }
5658 }
5659 break;
5660 }
5661 case AMDGPU::G_LOAD:
5662 case AMDGPU::G_ZEXTLOAD:
5663 case AMDGPU::G_SEXTLOAD:
5664 return getInstrMappingForLoad(MI);
5665
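// Atomic RMW and cmpxchg: the result and data operands are always VGPRs; the
// pointer operand keeps the bank it already has (see getValueMappingForPtr).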
5666 case AMDGPU::G_ATOMICRMW_XCHG:
5667 case AMDGPU::G_ATOMICRMW_ADD:
5668 case AMDGPU::G_ATOMICRMW_SUB:
5669 case AMDGPU::G_ATOMICRMW_AND:
5670 case AMDGPU::G_ATOMICRMW_OR:
5671 case AMDGPU::G_ATOMICRMW_XOR:
5672 case AMDGPU::G_ATOMICRMW_MAX:
5673 case AMDGPU::G_ATOMICRMW_MIN:
5674 case AMDGPU::G_ATOMICRMW_UMAX:
5675 case AMDGPU::G_ATOMICRMW_UMIN:
5676 case AMDGPU::G_ATOMICRMW_FADD:
5677 case AMDGPU::G_ATOMICRMW_FMIN:
5678 case AMDGPU::G_ATOMICRMW_FMAX:
5679 case AMDGPU::G_ATOMICRMW_UINC_WRAP:
5680 case AMDGPU::G_ATOMICRMW_UDEC_WRAP:
5681 case AMDGPU::G_AMDGPU_ATOMIC_CMPXCHG: {
5682 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5683 OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
5684 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5685 break;
5686 }
5687 case AMDGPU::G_ATOMIC_CMPXCHG: {
5688 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5689 OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
5690 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5691 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5692 break;
5693 }
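// G_BRCOND: a condition already known to be on the SGPR bank stays scalar;
// any other bank is treated as a VCC lane mask.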
5694 case AMDGPU::G_BRCOND: {
5695 unsigned Bank = getRegBankID(MI.getOperand(0).getReg(), MRI,
5696 AMDGPU::SGPRRegBankID);
5697 assert(MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 1);
5698 if (Bank != AMDGPU::SGPRRegBankID)
5699 Bank = AMDGPU::VCCRegBankID;
5700
5701 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, 1);
5702 break;
5703 }
5704 case AMDGPU::G_INTRINSIC_FPTRUNC_ROUND:
5705 return getDefaultMappingVOP(MI);
5706 case AMDGPU::G_PREFETCH:
5707 OpdsMapping[0] = getSGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5708 break;
5709 case AMDGPU::G_AMDGPU_WHOLE_WAVE_FUNC_SETUP:
5710 case AMDGPU::G_AMDGPU_WHOLE_WAVE_FUNC_RETURN:
5711 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5712 break;
5713 }
5714
5715 return getInstructionMapping(/*ID*/1, /*Cost*/1,
5716 getOperandsMapping(OpdsMapping),
5717 MI.getNumOperands());
5718}
unsigned const MachineRegisterInfo * MRI
static unsigned getIntrinsicID(const SDNode *N)
assert(UImm &&(UImm !=~static_cast< T >(0)) &&"Invalid immediate!")
Contains the definition of a TargetInstrInfo class that is common to all AMD GPUs.
constexpr LLT S16
constexpr LLT S1
constexpr LLT S32
constexpr LLT S64
AMDGPU Register Bank Select
static bool substituteSimpleCopyRegs(const AMDGPURegisterBankInfo::OperandsMapper &OpdMapper, unsigned OpIdx)
static unsigned regBankBoolUnion(unsigned RB0, unsigned RB1)
static std::pair< Register, unsigned > getBaseWithConstantOffset(MachineRegisterInfo &MRI, Register Reg)
static Register constrainRegToBank(MachineRegisterInfo &MRI, MachineIRBuilder &B, Register &Reg, const RegisterBank &Bank)
static std::pair< Register, Register > unpackV2S16ToS32(MachineIRBuilder &B, Register Src, unsigned ExtOpcode)
static void extendLow32IntoHigh32(MachineIRBuilder &B, Register Hi32Reg, Register Lo32Reg, unsigned ExtOpc, const RegisterBank &RegBank, bool IsBooleanSrc=false)
Implement extending a 32-bit value to a 64-bit value.
static unsigned getExtendOp(unsigned Opc)
static bool isVectorRegisterBank(const RegisterBank &Bank)
static unsigned regBankUnion(unsigned RB0, unsigned RB1)
static std::pair< LLT, LLT > splitUnequalType(LLT Ty, unsigned FirstSize)
Split Ty into 2 pieces.
static void setRegsToType(MachineRegisterInfo &MRI, ArrayRef< Register > Regs, LLT NewTy)
Replace the current type each register in Regs has with NewTy.
static void reinsertVectorIndexAdd(MachineIRBuilder &B, MachineInstr &IdxUseInstr, unsigned OpIdx, unsigned ConstOffset)
Utility function for pushing dynamic vector indexes with a constant offset into waterfall loops.
static LLT widen96To128(LLT Ty)
static LLT getHalfSizedType(LLT Ty)
static unsigned getSBufferLoadCorrespondingBufferLoadOpcode(unsigned Opc)
This file declares the targeting of the RegisterBankInfo class for AMDGPU.
Rewrite undef for PHI
MachineBasicBlock & MBB
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
MachineBasicBlock MachineBasicBlock::iterator MBBI
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
AMD GCN specific subclass of TargetSubtarget.
Declares convenience wrapper classes for interpreting MachineInstr instances as specific generic oper...
IRTranslator LLVM IR MI
const size_t AbstractManglingParser< Derived, Alloc >::NumOps
const AbstractManglingParser< Derived, Alloc >::OperatorInfo AbstractManglingParser< Derived, Alloc >::Ops[]
#define I(x, y, z)
Definition MD5.cpp:58
Contains matchers for matching SSA Machine Instructions.
mir Rename Register Operands
This file declares the MachineIRBuilder class.
Register Reg
Promote Memory to Register
Definition Mem2Reg.cpp:110
static bool isReg(const MCInst &MI, unsigned OpNo)
MachineInstr unsigned OpIdx
ConstantRange Range(APInt(BitWidth, Low), APInt(BitWidth, High))
static constexpr MCPhysReg SPReg
Interface definition for SIRegisterInfo.
static TableGen::Emitter::Opt Y("gen-skeleton-entry", EmitSkeleton, "Generate example skeleton entry")
static TableGen::Emitter::OptClass< SkeletonEmitter > X("gen-skeleton-class", "Generate example skeleton class")
bool applyMappingDynStackAlloc(MachineIRBuilder &B, const OperandsMapper &OpdMapper, MachineInstr &MI) const
std::pair< Register, unsigned > splitBufferOffsets(MachineIRBuilder &B, Register Offset) const
bool collectWaterfallOperands(SmallSet< Register, 4 > &SGPROperandRegs, MachineInstr &MI, MachineRegisterInfo &MRI, ArrayRef< unsigned > OpIndices) const
const InstructionMapping & getImageMapping(const MachineRegisterInfo &MRI, const MachineInstr &MI, int RsrcIdx) const
InstructionMappings addMappingFromTable(const MachineInstr &MI, const MachineRegisterInfo &MRI, const std::array< unsigned, NumOps > RegSrcOpIdx, ArrayRef< OpRegBankEntry< NumOps > > Table) const
unsigned copyCost(const RegisterBank &A, const RegisterBank &B, TypeSize Size) const override
Get the cost of a copy from B to A, or put differently, get the cost of A = COPY B.
RegisterBankInfo::InstructionMappings getInstrAlternativeMappingsIntrinsicWSideEffects(const MachineInstr &MI, const MachineRegisterInfo &MRI) const
bool buildVCopy(MachineIRBuilder &B, Register DstReg, Register SrcReg) const
bool executeInWaterfallLoop(MachineIRBuilder &B, iterator_range< MachineBasicBlock::iterator > Range, SmallSet< Register, 4 > &SGPROperandRegs) const
Legalize instruction MI where operands in OpIndices must be SGPRs.
const RegisterBank & getRegBankFromRegClass(const TargetRegisterClass &RC, LLT) const override
Get a register bank that covers RC.
AMDGPURegisterBankInfo(const GCNSubtarget &STI)
bool applyMappingMAD_64_32(MachineIRBuilder &B, const OperandsMapper &OpdMapper) const
unsigned getRegBankID(Register Reg, const MachineRegisterInfo &MRI, unsigned Default=AMDGPU::VGPRRegBankID) const
Register handleD16VData(MachineIRBuilder &B, MachineRegisterInfo &MRI, Register Reg) const
Handle register layout difference for f16 images for some subtargets.
const RegisterBankInfo::InstructionMapping & getInstrMappingForLoad(const MachineInstr &MI) const
void applyMappingImpl(MachineIRBuilder &Builder, const OperandsMapper &OpdMapper) const override
See RegisterBankInfo::applyMapping.
bool applyMappingBFE(MachineIRBuilder &B, const OperandsMapper &OpdMapper, bool Signed) const
bool applyMappingImage(MachineIRBuilder &B, MachineInstr &MI, const OperandsMapper &OpdMapper, int RSrcIdx) const
const ValueMapping * getVGPROpMapping(Register Reg, const MachineRegisterInfo &MRI, const TargetRegisterInfo &TRI) const
bool isScalarLoadLegal(const MachineInstr &MI) const
unsigned setBufferOffsets(MachineIRBuilder &B, Register CombinedOffset, Register &VOffsetReg, Register &SOffsetReg, int64_t &InstOffsetVal, Align Alignment) const
const ValueMapping * getSGPROpMapping(Register Reg, const MachineRegisterInfo &MRI, const TargetRegisterInfo &TRI) const
bool applyMappingLoad(MachineIRBuilder &B, const OperandsMapper &OpdMapper, MachineInstr &MI) const
void split64BitValueForMapping(MachineIRBuilder &B, SmallVector< Register, 2 > &Regs, LLT HalfTy, Register Reg) const
Split 64-bit value Reg into two 32-bit halves and populate them into Regs.
const ValueMapping * getValueMappingForPtr(const MachineRegisterInfo &MRI, Register Ptr) const
Return the mapping for a pointer argument.
unsigned getMappingType(const MachineRegisterInfo &MRI, const MachineInstr &MI) const
RegisterBankInfo::InstructionMappings getInstrAlternativeMappingsIntrinsic(const MachineInstr &MI, const MachineRegisterInfo &MRI) const
bool isDivergentRegBank(const RegisterBank *RB) const override
Returns true if the register bank is considered divergent.
void constrainOpWithReadfirstlane(MachineIRBuilder &B, MachineInstr &MI, unsigned OpIdx) const
InstructionMappings getInstrAlternativeMappings(const MachineInstr &MI) const override
Get the alternative mappings for MI.
const InstructionMapping & getDefaultMappingSOP(const MachineInstr &MI) const
const InstructionMapping & getDefaultMappingAllVGPR(const MachineInstr &MI) const
const InstructionMapping & getInstrMapping(const MachineInstr &MI) const override
This function must return a legal mapping, because AMDGPURegisterBankInfo::getInstrAlternativeMapping...
unsigned getBreakDownCost(const ValueMapping &ValMapping, const RegisterBank *CurBank=nullptr) const override
Get the cost of using ValMapping to decompose a register.
const ValueMapping * getAGPROpMapping(Register Reg, const MachineRegisterInfo &MRI, const TargetRegisterInfo &TRI) const
const InstructionMapping & getDefaultMappingVOP(const MachineInstr &MI) const
bool isSALUMapping(const MachineInstr &MI) const
Register buildReadFirstLane(MachineIRBuilder &B, MachineRegisterInfo &MRI, Register Src) const
bool applyMappingSBufferLoad(MachineIRBuilder &B, const OperandsMapper &OpdMapper) const
void applyMappingSMULU64(MachineIRBuilder &B, const OperandsMapper &OpdMapper) const
static const LaneMaskConstants & get(const GCNSubtarget &ST)
ArrayRef - Represent a constant reference to an array (0 or more elements consecutively in memory),...
Definition ArrayRef.h:41
Predicate
This enumeration lists the possible predicates for CmpInst subclasses.
Definition InstrTypes.h:678
@ ICMP_SLT
signed less than
Definition InstrTypes.h:707
@ ICMP_NE
not equal
Definition InstrTypes.h:700
A debug info location.
Definition DebugLoc.h:124
iterator find(const_arg_type_t< KeyT > Val)
Definition DenseMap.h:165
iterator end()
Definition DenseMap.h:81
std::pair< iterator, bool > insert(const std::pair< KeyT, ValueT > &KV)
Definition DenseMap.h:214
static constexpr ElementCount getFixed(ScalarTy MinVal)
Definition TypeSize.h:309
Abstract class that contains various methods for clients to notify about changes.
constexpr unsigned getScalarSizeInBits() const
constexpr bool isScalar() const
static constexpr LLT scalar(unsigned SizeInBits)
Get a low-level scalar or aggregate "bag of bits".
constexpr uint16_t getNumElements() const
Returns the number of elements in a vector LLT.
constexpr bool isVector() const
constexpr TypeSize getSizeInBits() const
Returns the total size of the type. Must only be called on sized types.
constexpr LLT getElementType() const
Returns the vector's element type. Only valid for vector types.
constexpr unsigned getAddressSpace() const
static constexpr LLT fixed_vector(unsigned NumElements, unsigned ScalarSizeInBits)
Get a low-level fixed-width vector of some number of elements and element width.
constexpr LLT getScalarType() const
static constexpr LLT scalarOrVector(ElementCount EC, LLT ScalarTy)
constexpr LLT divide(int Factor) const
Return a type that is Factor times smaller.
This is an important class for using LLVM in a threaded context.
Definition LLVMContext.h:68
LLVM_ABI void widenScalarSrc(MachineInstr &MI, LLT WideTy, unsigned OpIdx, unsigned ExtOpcode)
Legalize a single operand OpIdx of the machine instruction MI as a Use by extending the operand's typ...
LLVM_ABI LegalizeResult lowerAbsToMaxNeg(MachineInstr &MI)
LLVM_ABI LegalizeResult narrowScalar(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy)
Legalize an instruction by reducing the width of the underlying scalar type.
LLVM_ABI LegalizeResult reduceLoadStoreWidth(GLoadStore &MI, unsigned TypeIdx, LLT NarrowTy)
@ Legalized
Instruction has been legalized and the MachineFunction changed.
LLVM_ABI LegalizeResult fewerElementsVector(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy)
Legalize a vector instruction by splitting into multiple components, each acting on the same scalar t...
LLVM_ABI LegalizeResult widenScalar(MachineInstr &MI, unsigned TypeIdx, LLT WideTy)
Legalize an instruction by performing the operation on a wider scalar type (for example a 16-bit addi...
LLVM_ABI void widenScalarDst(MachineInstr &MI, LLT WideTy, unsigned OpIdx=0, unsigned TruncOpcode=TargetOpcode::G_TRUNC)
Legalize a single operand OpIdx of the machine instruction MI as a Def by extending the operand's typ...
TypeSize getValue() const
LLVM_ABI void transferSuccessorsAndUpdatePHIs(MachineBasicBlock *FromMBB)
Transfers all the successors, as in transferSuccessors, and update PHI operands in the successor bloc...
LLVM_ABI iterator getFirstTerminator()
Returns an iterator to the first terminator instruction of this basic block.
LLVM_ABI void addSuccessor(MachineBasicBlock *Succ, BranchProbability Prob=BranchProbability::getUnknown())
Add Succ as a successor of this MachineBasicBlock.
const MachineFunction * getParent() const
Return the MachineFunction containing this basic block.
void splice(iterator Where, MachineBasicBlock *Other, iterator From)
Take an instruction from MBB 'Other' at the position From, and insert it into this MBB right before '...
MachineInstrBundleIterator< MachineInstr > iterator
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
BasicBlockListType::iterator iterator
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineBasicBlock - Allocate a new MachineBasicBlock.
void insert(iterator MBBI, MachineBasicBlock *MBB)
Helper class to build MachineInstr.
const MachineInstrBuilder & addReg(Register RegNo, unsigned flags=0, unsigned SubReg=0) const
Add a new virtual register operand.
MachineInstrSpan provides an interface to get an iteration range containing the instruction it was in...
MachineBasicBlock::iterator begin()
MachineBasicBlock::iterator end()
Representation of each machine instruction.
const MachineBasicBlock * getParent() const
const MachineOperand & getOperand(unsigned i) const
A description of a memory reference used in the backend.
LocationSize getSize() const
Return the size in bytes of the memory reference.
unsigned getAddrSpace() const
bool isAtomic() const
Returns true if this operation has an atomic ordering requirement of unordered or higher,...
@ MODereferenceable
The memory access is dereferenceable (i.e., doesn't trap).
@ MOLoad
The memory access reads data.
@ MOInvariant
The memory access always returns the same value (or traps).
Flags getFlags() const
Return the raw flags of the source value.
LLVM_ABI Align getAlign() const
Return the minimum known alignment in bytes of the actual memory reference.
MachineOperand class - Representation of each machine instruction operand.
LLVM_ABI void setReg(Register Reg)
Change the register this operand corresponds to.
Register getReg() const
getReg - Returns the register number.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
Helper class that represents how the value of an instruction may be mapped and what is the related co...
bool isValid() const
Check whether this object is valid.
Helper class used to get/create the virtual registers that will be used to replace the MachineOperand...
const InstructionMapping & getInstrMapping() const
The final mapping of the instruction.
MachineRegisterInfo & getMRI() const
The MachineRegisterInfo we used to realize the mapping.
iterator_range< SmallVectorImpl< Register >::const_iterator > getVRegs(unsigned OpIdx, bool ForDebug=false) const
Get all the virtual registers required to map the OpIdx-th operand of the instruction.
virtual InstructionMappings getInstrAlternativeMappings(const MachineInstr &MI) const
Get the alternative mappings for MI.
static const TargetRegisterClass * constrainGenericRegister(Register Reg, const TargetRegisterClass &RC, MachineRegisterInfo &MRI)
Constrain the (possibly generic) virtual register Reg to RC.
const InstructionMapping & getInstructionMapping(unsigned ID, unsigned Cost, const ValueMapping *OperandsMapping, unsigned NumOperands) const
Method to get a uniquely generated InstructionMapping.
static void applyDefaultMapping(const OperandsMapper &OpdMapper)
Helper method to apply something that is like the default mapping.
const ValueMapping & getValueMapping(unsigned StartIdx, unsigned Length, const RegisterBank &RegBank) const
The most common ValueMapping consists of a single PartialMapping.
const InstructionMapping & getInvalidInstructionMapping() const
Method to get a uniquely generated invalid InstructionMapping.
const RegisterBank & getRegBank(unsigned ID)
Get the register bank identified by ID.
const unsigned * Sizes
Hold the sizes of the register banks for all HwModes.
bool cannotCopy(const RegisterBank &Dst, const RegisterBank &Src, TypeSize Size) const
TypeSize getSizeInBits(Register Reg, const MachineRegisterInfo &MRI, const TargetRegisterInfo &TRI) const
Get the size in bits of Reg.
const ValueMapping * getOperandsMapping(Iterator Begin, Iterator End) const
Get the uniquely generated array of ValueMapping for the elements between Begin and End.
SmallVector< const InstructionMapping *, 4 > InstructionMappings
Convenient type to represent the alternatives for mapping an instruction.
virtual unsigned copyCost(const RegisterBank &A, const RegisterBank &B, TypeSize Size) const
Get the cost of a copy from B to A, or put differently, get the cost of A = COPY B.
const InstructionMapping & getInstrMappingImpl(const MachineInstr &MI) const
Try to get the mapping of MI.
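The mapping helpers above are the building blocks a RegisterBankInfo subclass uses to describe where an instruction's operands live. A hedged sketch, assuming a hypothetical subclass ExampleRBI, a register-only instruction, and a bank passed in by the caller; the ID and cost values are illustrative, not the AMDGPU mapping logic:

#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/RegisterBankInfo.h"

using namespace llvm;

// Hypothetical subclass, used here only to reach the mapping helpers.
struct ExampleRBI : RegisterBankInfo {
  const InstructionMapping &
  getAllOneBankMapping(const MachineInstr &MI, const MachineRegisterInfo &MRI,
                       const TargetRegisterInfo &TRI,
                       const RegisterBank &Bank) const {
    unsigned NumOps = MI.getNumOperands();
    SmallVector<const ValueMapping *, 8> OpdsMapping(NumOps);
    for (unsigned I = 0; I != NumOps; ++I) {
      // Assumes every operand is a register operand.
      unsigned Size =
          getSizeInBits(MI.getOperand(I).getReg(), MRI, TRI).getFixedValue();
      OpdsMapping[I] = &getValueMapping(/*StartIdx=*/0, Size, Bank);
    }
    // Both the operand array and the resulting InstructionMapping are
    // uniqued by the base class; ID and cost here are placeholders.
    return getInstructionMapping(/*ID=*/1, /*Cost=*/1,
                                 getOperandsMapping(OpdsMapping), NumOps);
  }
};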
This class implements the register bank concept.
unsigned getID() const
Get the identifier of this register bank.
Wrapper class representing virtual and physical registers.
Definition Register.h:19
constexpr bool isVirtual() const
Return true if the specified register number is in the virtual register namespace.
Definition Register.h:74
static unsigned getMaxMUBUFImmOffset(const GCNSubtarget &ST)
This class keeps track of the SPI_SP_INPUT_ADDR config register, which tells the hardware which inter...
static bool shouldExpandVectorDynExt(unsigned EltSize, unsigned NumElem, bool IsDivergentIdx, const GCNSubtarget *Subtarget)
Check if EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT (<n x e>, var-idx) should be expanded into a set of cmp...
SmallSet - This maintains a set of unique values, optimizing for the case when the set is small (less...
Definition SmallSet.h:133
size_type count(const T &V) const
count - Return 1 if the element is in the set, 0 otherwise.
Definition SmallSet.h:175
bool empty() const
Definition SmallSet.h:168
std::pair< const_iterator, bool > insert(const T &V)
insert - Insert an element into the set if it isn't already there.
Definition SmallSet.h:181
void resize(size_type N)
void push_back(const T &Elt)
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
Register getReg() const
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
static constexpr TypeSize getFixed(ScalarTy ExactSize)
Definition TypeSize.h:343
static LLVM_ABI IntegerType * getInt32Ty(LLVMContext &C)
Definition Type.cpp:297
self_iterator getIterator()
Definition ilist_node.h:130
A range adaptor for a pair of iterators.
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
@ CONSTANT_ADDRESS_32BIT
Address space for 32-bit constant memory.
@ REGION_ADDRESS
Address space for region memory. (GDS)
@ LOCAL_ADDRESS
Address space for local memory.
@ CONSTANT_ADDRESS
Address space for constant memory (VTX2).
@ PRIVATE_ADDRESS
Address space for private memory.
@ BUFFER_RESOURCE
Address space for 128-bit buffer resources.
bool isFlatGlobalAddrSpace(unsigned AS)
bool isUniformMMO(const MachineMemOperand *MMO)
bool isExtendedGlobalAddrSpace(unsigned AS)
Intrinsic::ID getIntrinsicID(const MachineInstr &I)
Return the intrinsic ID for opcodes with the G_AMDGPU_INTRIN_ prefix.
std::pair< Register, unsigned > getBaseWithConstantOffset(MachineRegisterInfo &MRI, Register Reg, GISelValueTracking *ValueTracking=nullptr, bool CheckNUW=false)
Returns base register and constant offset.
const RsrcIntrinsic * lookupRsrcIntrinsic(unsigned Intr)
operand_type_match m_Reg()
SpecificConstantOrSplatMatch m_SpecificICstOrSplat(APInt RequestedValue)
Matches a RequestedValue constant or a constant splat of RequestedValue.
SpecificConstantMatch m_ZeroInt()
Convenience matchers for specific integer values.
ConstantMatch< APInt > m_ICst(APInt &Cst)
BinaryOp_match< LHS, RHS, TargetOpcode::G_ADD, true > m_GAdd(const LHS &L, const RHS &R)
bool mi_match(Reg R, const MachineRegisterInfo &MRI, Pattern &&P)
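The MIPatternMatch helpers above compose into declarative matchers. A minimal sketch (the wrapper function and its names are illustrative): match "Reg = G_ADD Base, <constant>" and bind the base register and the constant.

#include "llvm/ADT/APInt.h"
#include "llvm/CodeGen/GlobalISel/MIPatternMatch.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"

using namespace llvm;
using namespace MIPatternMatch;

// m_GAdd is commutative (note the 'true' template argument above), so the
// constant may appear on either side of the add.
static bool matchAddOfConstant(Register Reg, const MachineRegisterInfo &MRI,
                               Register &Base, APInt &Offset) {
  return mi_match(Reg, MRI, m_GAdd(m_Reg(Base), m_ICst(Offset)));
}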
@ Kill
The last use of a register.
This is an optimization pass for GlobalISel generic memory operations.
@ Offset
Definition DWP.cpp:477
LLVM_ABI MachineInstr * getOpcodeDef(unsigned Opcode, Register Reg, const MachineRegisterInfo &MRI)
See if Reg is defined by a single def instruction that is Opcode.
Definition Utils.cpp:651
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
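BuildMI and MachineInstrBuilder::addReg (listed above) chain together when emitting new instructions. A hedged sketch, with an illustrative helper that emits "DstReg = COPY SrcReg" just before the block's first terminator, marking SrcReg as killed:

#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/CodeGen/TargetOpcodes.h"

using namespace llvm;

static void emitKillingCopy(MachineBasicBlock &MBB, const DebugLoc &DL,
                            const TargetInstrInfo &TII, Register DstReg,
                            Register SrcReg) {
  // BuildMI creates the COPY at the insertion point; addReg appends the
  // source operand with the kill flag (its last use).
  BuildMI(MBB, MBB.getFirstTerminator(), DL, TII.get(TargetOpcode::COPY),
          DstReg)
      .addReg(SrcReg, RegState::Kill);
}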
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:649
LLVM_ABI bool constrainSelectedInstRegOperands(MachineInstr &I, const TargetInstrInfo &TII, const TargetRegisterInfo &TRI, const RegisterBankInfo &RBI)
Mutate the newly-selected instruction I to constrain its (possibly generic) virtual register operands...
Definition Utils.cpp:155
iterator_range< T > make_range(T x, T y)
Convenience function for iterating over sub-ranges.
LLVM_ABI std::optional< int64_t > getIConstantVRegSExtVal(Register VReg, const MachineRegisterInfo &MRI)
If VReg is defined by a G_CONSTANT that fits in int64_t, returns it.
Definition Utils.cpp:314
static const MachineMemOperand::Flags MONoClobber
Mark the MMO of a uniform load if there are no potentially clobbering stores on any path from the sta...
Definition SIInstrInfo.h:44
auto reverse(ContainerTy &&C)
Definition STLExtras.h:408
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
bool isa(const From &Val)
isa<X> - Return true if the parameter to the template is an instance of one of the template type argu...
Definition Casting.h:548
@ Add
Sum of integers.
DWARFExpression::Operation Op
void call_once(once_flag &flag, Function &&F, Args &&... ArgList)
Execute the function specified as a parameter once.
Definition Threading.h:86
decltype(auto) cast(const From &Val)
cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:565
LLVM_ABI std::optional< ValueAndVReg > getIConstantVRegValWithLookThrough(Register VReg, const MachineRegisterInfo &MRI, bool LookThroughInstrs=true)
If VReg is defined by a statically evaluable chain of instructions rooted on a G_CONSTANT returns its...
Definition Utils.cpp:433
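The GlobalISel constant utilities above are typically combined as follows. A minimal sketch (the wrapper function is illustrative): strip trivial copies, then try to read the value as a constant, looking through a chain rooted on a G_CONSTANT.

#include "llvm/CodeGen/GlobalISel/Utils.h"
#include <optional>

using namespace llvm;

static std::optional<int64_t>
getFoldedConstant(Register Reg, const MachineRegisterInfo &MRI) {
  // getSrcRegIgnoringCopies is shown for illustration; the look-through
  // below also handles copies on its own.
  Register Src = getSrcRegIgnoringCopies(Reg, MRI);
  if (std::optional<ValueAndVReg> Cst =
          getIConstantVRegValWithLookThrough(Src, MRI))
    return Cst->Value.getSExtValue(); // ValueAndVReg::Value is an APInt.
  return std::nullopt;
}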
Align assumeAligned(uint64_t Value)
Treats the value 0 as a 1, so Align is always at least 1.
Definition Alignment.h:111
unsigned Log2(Align A)
Returns the log2 of the alignment.
Definition Alignment.h:208
LLVM_ABI Register getSrcRegIgnoringCopies(Register Reg, const MachineRegisterInfo &MRI)
Find the source register for Reg, folding away any trivial copies.
Definition Utils.cpp:499
constexpr T maskTrailingOnes(unsigned N)
Create a bitmask with the N right-most bits set to 1, and all other bits set to 0.
Definition MathExtras.h:86
@ Default
The result values are uniform if and only if all operands are uniform.
Definition Uniformity.h:20
#define N
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:85
This class contains a discriminated union of information about pointers in memory operands,...
unsigned StartIdx
Number of bits at which this partial mapping starts in the original value.
const RegisterBank * RegBank
Register bank where the partial value lives.
unsigned Length
Length of this mapping in bits.
Helper struct that represents how a value is mapped through different register banks.
unsigned NumBreakDowns
Number of partial mappings used to break down this value.
const PartialMapping * BreakDown
How the value is broken down between the different register banks.
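Taken together, the PartialMapping and ValueMapping fields above describe how a value is split across banks. A hedged, purely illustrative sketch (not the generated AMDGPU tables): a 64-bit value broken into two 32-bit pieces living in the same bank, with the mapped bits counted back from the struct fields.

#include "llvm/CodeGen/RegisterBankInfo.h"

using namespace llvm;

static unsigned describeSplit64(const RegisterBank &Bank) {
  // PartialMapping is {StartIdx, Length, RegBank}.
  const RegisterBankInfo::PartialMapping Halves[2] = {
      {/*StartIdx=*/0, /*Length=*/32, Bank},   // bits [0, 32)
      {/*StartIdx=*/32, /*Length=*/32, Bank}}; // bits [32, 64)
  // ValueMapping is {BreakDown, NumBreakDowns}.
  const RegisterBankInfo::ValueMapping Split64{/*BreakDown=*/Halves,
                                               /*NumBreakDowns=*/2};
  unsigned Bits = 0;
  for (unsigned I = 0; I != Split64.NumBreakDowns; ++I)
    Bits += Split64.BreakDown[I].Length;
  return Bits; // 64
}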
The llvm::once_flag structure.
Definition Threading.h:67