Instruction Scheduler in LLVM

This document discusses instruction scheduling in LLVM. It describes:
- The scheduler in LLVM, which uses a scheduling DAG to reorder instructions and hide operation latency, decreasing running time.
- Pipeline modeling for targets, which uses target descriptions to associate scheduling categories and pipeline information with instructions, modeling the target architecture and processor pipelines.
- Scheduling categories, which are defined for operands and associated with instructions to model read/write dependencies, processor resources, and latency for the scheduler.

Uploaded by

lei li

Instruction Scheduling in LLVM
Hsiangkai Wang
[email protected]
Andes Technology
Agenda
Introduction to Instruction Scheduling

Scheduler in LLVM

Pipeline Modeling

Scheduler Customization
Instruction Scheduling

Different operations take different lengths of time.

Instruction scheduling is the process of reordering operations in an attempt to decrease the code's running time.
Example: a basic block on a single-issue machine. The number in parentheses is the cycle in which each operation starts.

Latencies: load, store: 3; add: 1; mul: 2

Original order:
  a( 1): load  $x5_32, $x8_32, @a
  b( 4): add   $x5_32, $x5_32, $x5_32
  c( 5): load  $x6_32, $x8_32, @b
  d( 8): mul   $x5_32, $x5_32, $x6_32
  e(10): load  $x7_32, $x8_32, @c
  f(13): mul   $x5_32, $x5_32, $x7_32
  g(15): load  $x9_32, $x8_32, @d
  h(18): mul   $x5_32, $x5_32, $x9_32
  i(20): store $x5_32, $x8_32, @a

Dependence graph (each edge is labeled with the register that carries the dependence):
  a -> b  x5_32
  b -> d  x5_32
  c -> d  x6_32
  d -> f  x5_32
  e -> f  x7_32
  f -> h  x5_32
  g -> h  x9_32
  h -> i  x5_32

Priority of each node (latency-weighted longest path to the end of the block):
  a 13, c 12, b 10, e 10, d 9, g 8, f 7, h 5, i 3

Scheduling order by priority: a c e b d g f h i

Scheduled order:
  a( 1): load  $x5_32, $x8_32, @a
  c( 2): load  $x6_32, $x8_32, @b
  e( 3): load  $x7_32, $x8_32, @c
  b( 4): add   $x5_32, $x5_32, $x5_32
  d( 5): mul   $x5_32, $x5_32, $x6_32
  g( 6): load  $x9_32, $x8_32, @d
  f( 7): mul   $x5_32, $x5_32, $x7_32
  h( 9): mul   $x5_32, $x5_32, $x9_32
  i(11): store $x5_32, $x8_32, @a
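The priorities and the final schedule in this example can be reproduced with a small list scheduler. The following is a minimal sketch under stated assumptions, not LLVM's implementation: it assumes a single-issue machine and the latencies given on the slide, and the names `Node`, `computePriorities`, `listSchedule`, and `slideExample` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One operation in the dependence graph (hypothetical type for illustration).
struct Node {
  char Name;
  int Latency;             // cycles until the result is available
  std::vector<int> Succs;  // indices of operations that depend on this one
};

// Priority = longest latency-weighted path from the node to the end of the
// block. Assumes the nodes are listed in topological order.
std::vector<int> computePriorities(const std::vector<Node> &G) {
  std::vector<int> Prio(G.size(), 0);
  for (int I = static_cast<int>(G.size()) - 1; I >= 0; --I) {
    int Longest = 0;
    for (int S : G[I].Succs) {
      assert(S > I && "nodes must be in topological order");
      Longest = std::max(Longest, Prio[S]);
    }
    Prio[I] = G[I].Latency + Longest;
  }
  return Prio;
}

// Single-issue list scheduling: in each cycle, issue the ready operation with
// the highest priority. Returns the start cycle of every node (first cycle is 1).
std::vector<int> listSchedule(const std::vector<Node> &G) {
  const std::vector<int> Prio = computePriorities(G);
  const int N = static_cast<int>(G.size());
  std::vector<int> Start(N, 0), ReadyAt(N, 1), PredsLeft(N, 0);
  for (const Node &Nd : G)
    for (int S : Nd.Succs)
      ++PredsLeft[S];

  int Scheduled = 0;
  for (int Cycle = 1; Scheduled < N; ++Cycle) {
    int Best = -1;
    for (int I = 0; I < N; ++I) {
      bool Ready = Start[I] == 0 && PredsLeft[I] == 0 && ReadyAt[I] <= Cycle;
      if (Ready && (Best < 0 || Prio[I] > Prio[Best]))
        Best = I;
    }
    if (Best < 0)
      continue;  // nothing is ready, so the machine stalls for this cycle
    Start[Best] = Cycle;
    ++Scheduled;
    for (int S : G[Best].Succs) {
      --PredsLeft[S];
      ReadyAt[S] = std::max(ReadyAt[S], Cycle + G[Best].Latency);
    }
  }
  return Start;
}

// The block from the slide: a..i with load/store = 3, add = 1, mul = 2.
std::vector<Node> slideExample() {
  return {
      {'a', 3, {1}},  // load  -> b
      {'b', 1, {3}},  // add   -> d
      {'c', 3, {3}},  // load  -> d
      {'d', 2, {5}},  // mul   -> f
      {'e', 3, {5}},  // load  -> f
      {'f', 2, {7}},  // mul   -> h
      {'g', 3, {7}},  // load  -> h
      {'h', 2, {8}},  // mul   -> i
      {'i', 3, {}},   // store
  };
}
```

Calling `listSchedule(slideExample())` gives start cycles 1, 4, 2, 5, 3, 7, 6, 9, 11 for nodes a through i, which is the order a c e b d g f h i finishing its last issue in cycle 11 instead of cycle 20.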
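Scheduler in LLVM

[Figure: timeline of LLVM's schedulers and where they run in the code generation pipeline. The recoverable content is listed below.]

Machine models: unit latency, Itineraries (-scheditins), SchedModel (-schedmodel). The slide marks these with the years 2008 and 2012.

Scheduling DAG classes:
  ScheduleDAG
    ScheduleDAGSDNodes (2008)
    ScheduleDAGInstrs (2008)
      SchedulePostRATDList (2008)
      ScheduleDAGMI (2012)
        ScheduleDAGMILive (2013), adds LiveInterval and RegPressure

Pipeline: Instruction Selector (DAG) -> MI -> Register Allocator -> MI

Scheduling passes, in pipeline order:
  SelectionDAGISel: runs ScheduleDAGSDNodes::Run.
    -pre-RA-sched=<value>, where <value> is one of: fast, linearize, list-burr, source, list-hybrid, list-ilp, vliw-td
  MachineScheduler (2013): uses ScheduleDAGMILive, before register allocation.
    -enable-misched
  PostRAScheduler (2008): uses SchedulePostRATDList, after register allocation.
    -post-RA-scheduler
  PostMachineScheduler (2013): uses ScheduleDAGMI, after register allocation.
    -enable-post-misched
    TargetPassConfig::substitutePass(&PostRASchedulerID, &PostMachineSchedulerID)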
Data Dependency Graph
a: $x10_32 = LUI @Arr
b: $x8_32 = CLI 10
c: SW $x8_32, $x10_32, @Arr
d: $x10_32 = ADDI $x8_32, 0
e: Call @foo, implicit $x10_32 (ExitSU)

Dependence edges:
  a -> c  x10_32: data
  a -> d  x10_32: output
  b -> c  x8_32: data
  b -> d  x8_32: data
  c -> d  x10_32: anti
  d -> e  x10_32: artificial
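The three register dependence kinds on this slide (data, anti, output) follow from which registers each instruction defines and uses. The sketch below classifies the dependences between an earlier and a later instruction. It is a simplified illustration with hypothetical names (`Instr`, `DepKind`, `Dep`, `classify`), not how LLVM's ScheduleDAGInstrs builds its graph: it compares one pair at a time and ignores redefinitions in between, which a real DAG builder uses to prune edges.

```cpp
#include <set>
#include <string>
#include <vector>

// Registers written and read by one instruction (hypothetical type).
struct Instr {
  std::set<std::string> Defs;
  std::set<std::string> Uses;
};

enum class DepKind { Data, Anti, Output };

struct Dep {
  std::string Reg;
  DepKind Kind;
};

// Register dependences from Earlier to Later, in program order:
//   data   (read after write):  Earlier defines R, Later uses R
//   output (write after write): Earlier defines R, Later defines R
//   anti   (write after read):  Earlier uses R,    Later defines R
std::vector<Dep> classify(const Instr &Earlier, const Instr &Later) {
  std::vector<Dep> Deps;
  for (const std::string &R : Earlier.Defs) {
    if (Later.Uses.count(R))
      Deps.push_back({R, DepKind::Data});
    if (Later.Defs.count(R))
      Deps.push_back({R, DepKind::Output});
  }
  for (const std::string &R : Earlier.Uses)
    if (Later.Defs.count(R))
      Deps.push_back({R, DepKind::Anti});
  return Deps;
}
```

For the slide's instructions, a defines x10_32 and c uses it, so `classify(a, c)` reports a data dependence; a and d both define x10_32, which gives an output dependence; c uses x10_32 before d redefines it, which gives an anti dependence.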
Data Dependency Graph

a: SW $x10_32, $x27_32, 12
b: $x10_32 = LW $x9_32, 0
c: SW $x10_32, $x27_32, 0
d: $x10_32 = LW $x8_32, @Glob
...

Dependence edges:
  a -> b  x10_32: anti
  b -> c  x10_32: data
  b -> d  x10_32: output
  c -> d  x10_32: anti
  MayAliasMem edges (three on the slide) additionally order the memory operations that may alias.
Pipeline Modeling for Target
Use the target description to describe the pipeline model.

For the architecture:
  Create scheduling categories for operands. (<Target>Schedule.td)
  Associate scheduling categories with instructions. (<Target>InstrInfo.td)

For the processor:
  Associate pipeline information with scheduling categories. (<Target>Schedule<Processor>.td)
Associate per-operand SchedReadWrite types with Instructions by modifying the Instruction definition to inherit from Sched.

[Figure: class diagram of the TableGen scheduling classes, with inheritance and aggregation edges. The recoverable content is listed below.]

Associate with target:
  SchedReadWrite: base class of SchedRead and SchedWrite.
  SchedRead: defines a scheduler resource associated with a use operand.
  SchedWrite: defines a scheduler resource associated with a def operand.

Associate with instructions:
  Sched (field SchedRW): SchedRW lists the per-operand types that map to the machine model of an instruction.

Associate with subtargets:
  ProcReadAdvance (fields Cycles, ValidWrites)
  ProcWriteResources (fields ProcResources, ResourceCycles, Latency)
  SchedReadAdvance, SchedWriteRes: for use with InstRW or ItinRW.
  ReadAdvance (field ReadType), WriteRes (field WriteType): define WriteRes and ReadAdvance to associate processor resources and latency with each SchedReadWrite type.
  InstRW: maps a set of opcodes to a list of SchedReadWrite types. This allows the subtarget to easily override specific operations.
  ItinRW: maps a set of itinerary classes to SchedReadWrite resources.
Create scheduling categories
def ALUOut : SchedWrite; // For define operands of ALU op
def ALUIn : SchedRead; // For use operands of ALU op
def MULOut : SchedWrite; // For define operands of MUL op
def MULIn : SchedRead; // For use operands of MUL op

Associate instructions with scheduling categories

class ALU_ri<bits<3> funct3, string opcodestr>
    : RVInstI<funct3, OPC_OP_IMM,
              (outs GPR:$rd),
              (ins GPR:$rs1, simm12:$imm12),
              opcodestr, "$rd, $rs1, $imm12">,
      Sched<[ALUOut, ALUIn]>;
Associate pipeline information to scheduling categories
def UnitALU : ProcResource<1> { let BufferSize = 0; }
def UnitMDU : ProcResource<1> { let BufferSize = 0; }

def : WriteRes<ALUOut, [UnitALU]> { let Latency = 2; }

def : WriteRes<MULOut, [UnitMDU]> { let Latency = 4; }
def : ReadAdvance<ALUIn, 1>;
def : ReadAdvance<MULIn, 1>;

[Figure: pipeline diagram with stages ISSUE, ALU, MDU, WB.]

Example: the latency of a data dependence edge is the producer's write latency minus the consumer's read advance.

a: MUL r3, r3, r2
b: ADD r4, r3, r2

a -> b  r3: data
Latency = MUL's Latency - ADD's Advance = 4 - 1 = 3
time ISSUE ALU MDU WB

t0 MUL
t1 ADD MUL
t2 stall MUL
t3 stall MUL
t4 ADD stall
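The edge latency rule used in this example (the producer's write latency minus the consumer's read advance) can be written as a one-line helper. This is a minimal sketch with a hypothetical name, `operandLatency`. The clamp at zero for a read advance larger than the write latency reflects my understanding of how LLVM treats that case and should be read as an assumption here.

```cpp
#include <algorithm>

// Effective latency of a data dependence edge.
// WriteLatency: the Latency given in the producer's WriteRes.
// ReadAdvance:  the cycles given in the consumer's ReadAdvance.
// Assumption: the result is clamped so it never goes below zero.
int operandLatency(int WriteLatency, int ReadAdvance) {
  return std::max(0, WriteLatency - ReadAdvance);
}
```

With the definitions above, `operandLatency(4, 1)` covers the MUL to ADD edge, and `operandLatency(2, 1)` covers an ALU result feeding another ALU operation.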
GenericScheduler::tryCandidate
Physical register copies

Register pressure (Excess, CriticalMax)

Acyclic Latency

Cluster

Weak edges

Register pressure (CurrentMax)

Resource

Latency

Source order
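The list above is an ordered cascade: two candidates are compared on the first criterion, and a later criterion is consulted only when all earlier ones tie. The sketch below illustrates that pattern on a reduced set of criteria. It is not LLVM's `tryCandidate`; the `Candidate` fields and the functions `isBetter` and `pickBest` are hypothetical, and only four of the slide's criteria are modeled.

```cpp
#include <cstddef>
#include <vector>

// A scheduling candidate with a reduced set of criteria (hypothetical fields).
struct Candidate {
  int ExcessPressure;  // registers over the limit if picked; lower is better
  int ResourceStalls;  // cycles waiting for a busy unit; lower is better
  int Height;          // latency-weighted path to the end; higher is better
  int SourceOrder;     // position in the original code; lower is better
};

// Returns true if A should be picked over B. Each criterion decides only
// when every earlier criterion ties.
bool isBetter(const Candidate &A, const Candidate &B) {
  if (A.ExcessPressure != B.ExcessPressure)
    return A.ExcessPressure < B.ExcessPressure;
  if (A.ResourceStalls != B.ResourceStalls)
    return A.ResourceStalls < B.ResourceStalls;
  if (A.Height != B.Height)
    return A.Height > B.Height;
  return A.SourceOrder < B.SourceOrder;
}

// Index of the best candidate. Assumes Cands is not empty.
std::size_t pickBest(const std::vector<Candidate> &Cands) {
  std::size_t Best = 0;
  for (std::size_t I = 1; I < Cands.size(); ++I)
    if (isBetter(Cands[I], Cands[Best]))
      Best = I;
  return Best;
}
```

Placing source order last means the scheduler keeps the original instruction order whenever no other criterion distinguishes two candidates.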
Customize Scheduler for Target
Define your scheduling policy.

Define your scheduling strategy.

Add DAG mutations.

Implement overrideSchedPolicy

struct MachineSchedPolicy {
  bool ShouldTrackPressure = false;
  bool ShouldTrackLaneMasks = false;
  bool OnlyTopDown = false;
  bool OnlyBottomUp = false;
  bool DisableLatencyHeuristic = false;
};

void <Target>Subtarget::overrideSchedPolicy(MachineSchedPolicy &Policy,
                                            unsigned NumRegionInstrs) const {
  // Schedule in both directions and track register pressure.
  Policy.OnlyTopDown = false;
  Policy.OnlyBottomUp = false;
  Policy.ShouldTrackPressure = true;
}
Derive MachineSchedStrategy

Class hierarchy: MachineSchedStrategy <- GenericSchedulerBase <- GenericScheduler <- YourStrategy

class YourStrategy : public GenericScheduler {
  ...
  SUnit *pickNode(bool &IsTopNode) override {
    // ...
    // Your heuristic algorithm.
    //

    return GenericScheduler::pickNode(IsTopNode);
  }
};
DAG mutations
// Implement your mutation.
class CustomMutation : public ScheduleDAGMutation {
  const <Target>Subtarget *STI;

public:
  explicit CustomMutation(const <Target>Subtarget *STI) : STI(STI) {}
  void apply(ScheduleDAGInstrs *DAGInstrs) override;
};

std::unique_ptr<ScheduleDAGMutation>
llvm::createCustomMutation(const <Target>Subtarget *STI) {
  return std::make_unique<CustomMutation>(STI);
}

// Install
ScheduleDAGInstrs *
createMachineScheduler(MachineSchedContext *C) const override {
  const <Target>Subtarget &STI = C->MF->getSubtarget<<Target>Subtarget>();
  ScheduleDAGMILive *DAG = createGenericSchedLive(C);
  DAG->addMutation(createCustomMutation(&STI));
  return DAG;
}
[Figure: two dependence DAGs over nodes a, b, c, d, shown side by side.]
Reference
Engineering a Compiler, 2nd Edition

2017 LLVM Developers’ Meeting: “Writing Great Machine Schedulers”
