Instruction Scheduling in
LLVM
Hsiangkai Wang
[email protected]
Andes Technology
Agenda
Introduction to Instruction Scheduling
Scheduler in LLVM
Pipeline Modeling
Scheduler Customization
Instruction Scheduling
Different operations take different lengths of time.
Instruction scheduling is the process reordering
the operations in an attempt to decrease its
running time.
a( 1): load $x5_32, $x8_32, @a load, store: 3
b( 4): add $x5_32, $x5_32, $x5_32 add: 1
c( 5): load $x6_32, $x8_32, @b mul: 2
d( 8): mul $x5_32, $x5_32, $x6_32
e(10): load $x7_32, $x8_32, @c
f(13): mul $x5_32, $x5_32, $x7_32 a
g(15): load $x9_32, $x8_32, @d
h(18): mul $x5_32, $x5_32, $x9_32 x5_32
i(20): store $x5_32, $x8_32, @a
b c
x5_32 x6_32
d e
x5_32 x7_32
f g
x5_32 x9_32
h
x5_32
i
a( 1): load $x5_32, $x8_32, @a load, store: 3
b( 4): add $x5_32, $x5_32, $x5_32 add: 1
c( 5): load $x6_32, $x8_32, @b mul: 2
d( 8): mul $x5_32, $x5_32, $x6_32
e(10): load $x7_32, $x8_32, @c
f(13): mul $x5_32, $x5_32, $x7_32 a 13
g(15): load $x9_32, $x8_32, @d
h(18): mul $x5_32, $x5_32, $x9_32 x5_32
i(20): store $x5_32, $x8_32, @a
b 10 c 12
a c e b d g f h i x5_32 x6_32
d 9 e 10
x5_32 x7_32
f 7 g 8
x5_32 x9_32
h 5
x5_32
i 3
a( 1): load $x5_32, $x8_32, @a load, store: 3
b( 4): add $x5_32, $x5_32, $x5_32 add: 1
c( 5): load $x6_32, $x8_32, @b mul: 2
d( 8): mul $x5_32, $x5_32, $x6_32
e(10): load $x7_32, $x8_32, @c
f(13): mul $x5_32, $x5_32, $x7_32 a 13
g(15): load $x9_32, $x8_32, @d
h(18): mul $x5_32, $x5_32, $x9_32 x5_32
i(20): store $x5_32, $x8_32, @a
b 10 c 12
a c e b d g f h i x5_32 x6_32
d 9 e 10
x5_32 x7_32
a( 1): load $x5_32, $x8_32, @a
c( 2): load $x6_32, $x8_32, @b f 7 g 8
e( 3): load $x7_32, $x8_32, @c
x5_32
b( 4): add $x5_32, $x5_32, $x5_32 x9_32
d( 5): mul $x5_32, $x5_32, $x6_32 h 5
g( 6): load $x9_32, $x8_32, @d
f( 7): mul $x5_32, $x5_32, $x7_32 x5_32
h( 9): mul $x5_32, $x5_32, $x9_32
i(11): store $x5_32, $x8_32, @a i 3
Scheduler in LLVM SchedulerDAG
(unit latency) (2008)(Itineraries) -scheditins
(Itineraries) (2012)(SchedModel) -schedmodel
(2008) (2008)
ScheduleDAGSDNodes ScheduleDAGInstrs
(2008)
(2012) SchedulePostRA
ScheduleDAGMI
TDList
-pre-RA-sched=<value>
=fast
(2013)
=linearize
=list-burr LiveInterval ScheduleDAGMILive
=source RegPressure
=list-hybrid
=list-ilp
=vliw-td
Instruction
Register
Selector
MI MI
Allocator
(DAG)
-post-RA-scheduler
(2013) (2008)
SelectionDAGISel MachineScheduler PostRAScheduler
(SchedulePostRATDList)
ScheduleDAGSDNodes::Run (ScheduleDAGMILive)
-enable-misched (2013) PostMachine
(ScheduleDAGMI)
Scheduler
-enable-post-misched
TargetPassConfig::substitutePass(&PostRASchedulerID, &PostMachineSchedulerID)
Data Dependency Graph
a x10_32: data
a: $x10_32 = LUI @Arr x10_32: output
b: $x8_32 = CLI 10
c: SW $x8_32, $x10_32, @Arr b x8_32: data
d: $x10_32 = ADDI $x8_32, 0
e: Call @foo, implicit $x10_32 (ExitSU) c
x8_32: data
x10_32: anti
x10_32: artificial
e
Data Dependency Graph
a
a: SW $x10_32, $x27_32, 12 x10_32: anti
MayAliasMem
b: $x10_32 = LW $x9_32, 0
c: SW $x10_32, $x27_32, 0 b
d: $x10_32 = LW $x8_32, @Glob
... x10_32: output
MayAliasMem x10_32: data
MayAliasMem x10_32: anti
d
Pipeline Modeling for Target
Use target description to describe the pipeline model.
For architecture
Create scheduling categories for operands.
<Target>Schedule.td
Associate scheduling categories to instructions.
<Target>InstrInfo.td
For processor
Associate pipeline information to scheduling categories.
<Target>Schedule<Processor>.td
Associate per-operand SchedReadWrite types with Instructions by
modifying the Instruction definition to inherit from Sched.
Sched
+SchedRW
SchedReadWrite
SchedRW lists the per-operand types that
map to the machine model of an instruction. Associate with instructions
Define a scheduler Define a scheduler
resource associated with a resource associated with a
use operand. def operand.
SchedRead SchedWrite
Associate with target
Associate with subtargets
ProcReadAdvance ProcWriteResources
+Cycles
+ProcResources
+ValidWrites +ResourceCycles
+Latency
For use with InstRW or For use with InstRW or
ItinRW. ItinRW.
SchedReadAdvance SchedWriteRes
ReadAdvance WriteRes
+ReadType +WriteType
Define WriteRes and ReadAdvance to
InstRW: Map a set of opcodes to a list of SchedReadWrite types. associate processor resources and latency
This allow the sub target to easily override specific operations. with each SchedReadWrite type.
ItinRW: Map a set of itinerary classes to SchedReadWrite resources. inherentence
aggregate
Create scheduling categories
def ALUOut : SchedWrite; // For define operands of ALU op
def ALUIn : SchedRead; // For use operands of ALU op
def MULOut : SchedWrite; // For define operands of MUL op
def MULIn : SchedRead; // For use operands of MUL op
Associate instructions with scheduling categories
class ALU_ri<bits<3> funct3, string opcodestr>
: RVInstI<funct3, OPC_OP_IMM,
(outs GPR:$rd),
(ins GPR:$rs1, simm12:$imm12),
opcodestr, "$rd, $rs1, $imm12”>,
Sched<[ALUOut, ALUIn]>;
Associate pipeline information to scheduling categories
def UnitALU : ProcResource<1> { let BufferSize = 0; }
def UnitMDU : ProcResource<1> { let BufferSize = 0; }
def : WriteRes<ALUOut, [UnitALU]> { let Latency = 2; }
def : WriteRes<MULOut, [UnitMDU]> { let Latency = 4; }
def : ReadAdvance<ALUIn, 1>;
def : ReadAdvance<MULIn, 1>;
ISSUE ALU MDU WB
ALU
MDU
def : WriteRes<ALUOut, [UnitALU]> { let Latency = 2; }
def : WriteRes<MULOut, [UnitMDU]> { let Latency = 4; }
def : ReadAdvance<ALUIn, 1>;
def : ReadAdvance<MULIn, 1>;
Latency =
a: MUL r3, r3, r2 a MUL’s Latency - ADD’s Advance
b: ADD r4, r3, r2 r3: data
3
b
time ISSUE ALU MDU WB
t0 MUL
t1 ADD MUL
t2 stall MUL
t3 stall MUL
t4 ADD stall
GenericScheduler::tryCandidate
Physical register copies
Register pressure (Excess, CriticalMax)
Acyclic Latency
Cluster
Weak edges
Register pressure (CurrentMax)
Resource
Latency
Source order
Customize Scheduler for Target
Define your scheduling policy.
Define your scheduling strategy.
Add DAG mutations.
Implement overrideSchedPolicy
struct MachineSchedPolicy {
bool ShouldTrackPressure = false;
bool ShouldTrackLaneMasks = false;
bool OnlyTopDown = false;
bool OnlyBottomUp = false;
bool DisableLatencyHeuristic = false;
};
void
<Target>Subtarget::overrideSchedPolicy(MachineSchedPolicy &Policy,
unsigned NumRegionInstrs) const {
Policy.OnlyTopDown = false;
Policy.OnlyBottomUp = false;
Policy.ShouldTrackPressure = true;
}
Derive MachineSchedStrategy
MachineSchedStrategy
class YourStrategy : public GenericScheduler {
...
SUnit *pickNode(bool &IsTopNode) override {
// ...
// Your heuristic algorithm.
GenericSchedulerBase
//
return GenericScheduler::pickNode(IsTopNode);
}
};
GenericScheduler
DAG mutations
// Implement your mutation.
class CustomMutation : public ScheduleDAGMutation {
public:
void apply(ScheduleDAGInstrs *DAGInstrs) override;
};
std::unique_ptr<SchuduleDAGMutation>
llvm::createCustomMutation(const <Target>Subtarget *STI) {
return llvm::make_unique<CustomMutation>(STI);
}
// Install
ScheduleDAGInstr *
createMachineScheduler(MachineSchedContext *C) const override {
const <Target>Subtarget &STI = C->MF->getSubtarget<<Target>Subtarget>();
ScheduleDAGMILive *DAG = createGenericSchedLive(C);
DAG->addMutation(createCustomMutation(STI));
return DAG;
}
a a
b c b c
d d
Reference
Engineering a Compiler, 2nd Edition
2017 LLVM Developers’ Meeting: “Writing Great
Machine Schedulers”