Advanced CPU Speculation Techniques

The document discusses hardware-based speculation and techniques to exploit instruction-level parallelism (ILP) in modern processors. It describes how processors can speculatively execute instructions past branches based on predictions and how the reorder buffer allows restoring state and communicating results if predictions are wrong. The reorder buffer entries hold instruction information, source operands, and results, allowing speculative execution and correctly committing changes on correct predictions. Renaming registers extend the register file to hold speculative results until commit. While assumptions like perfect prediction, unlimited resources, and 1-cycle latency allow ideal ILP, real machines face limits to parallelism from dependencies, memory aliasing, and other factors.

Uploaded by

Padmasri Durai

Multiple Instruction Issue and Hardware Based Speculation

Soner Önder
Michigan Technological University, Houghton MI
www.cs.mtu.edu/~soner

Hardware Based Speculation

• Exploiting more ILP requires that we overcome the limitation of control dependence:
  - With branch prediction, we allowed the processor to continue issuing instructions past a branch based on a prediction:
    - Those fetched instructions do not modify the processor state.
    - These instructions are squashed if the prediction is incorrect.
  - We now allow the processor to execute these instructions before we know whether it is OK to execute them:
    - We need to correctly restore the processor state if such an instruction should not have been executed.
    - We need to pass the results from these instructions to future instructions as if the program were just following that path.

Hardware Based Speculation

• Assume the processor predicts B1 to be taken and executes.
• What will happen if the prediction was wrong?
• What value of each variable should be used if the processor predicts B1 and B2 taken and executes instructions along the way?

[Figure: control-flow graph]
B1: x < y?
  N: A = b + c; C = c - 1
  T: C = 0; A = 0
B2: x < z?
  N: B = b + 1; A = a + 1
  T: C = a
D = a + b + c
...
use d
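
The questions on this slide can be explored with a small sketch. It assumes the reading of the figure in which both sides of B1 rejoin before B2 and both sides of B2 rejoin before D = a + b + c; the starting values and the function name `run_path` are hypothetical, chosen only for illustration.

```python
def run_path(state, b1_taken, b2_taken):
    """Execute the slide's code along one path and return the new values."""
    s = dict(state)              # work on a copy; the caller's dict is the checkpoint
    if b1_taken:
        s["c"] = 0
        s["a"] = 0
    else:
        s["a"] = s["b"] + s["c"]
        s["c"] = s["c"] - 1
    if b2_taken:
        s["c"] = s["a"]
    else:
        s["b"] = s["b"] + 1
        s["a"] = s["a"] + 1
    s["d"] = s["a"] + s["b"] + s["c"]
    return s


committed = {"a": 1, "b": 2, "c": 3}           # hypothetical starting values

# Predict B1 and B2 taken: D must read the speculative a, b and c.
speculative = run_path(committed, True, True)
print(speculative["d"])                         # 0 + 2 + 0 = 2

# Suppose B1 was actually not taken: the speculative values are discarded
# and execution restarts from the untouched committed state.
actual = run_path(committed, False, True)
print(actual["d"])                              # 5 + 2 + 5 = 12
```

Copying the whole state is only a stand-in for the rollback mechanism; the point is that speculative results feed later speculative instructions, yet none of them may reach the committed state until the branches resolve.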

Hardware Based Speculation

• In order to execute instructions speculatively, we need to provide means:
  - To roll back the values of both registers and memory to their correct values upon a misprediction,
  - To communicate speculatively calculated values to the new uses of those values.

• Both can be provided by using a simple structure called the Reorder Buffer (ROB).

Reorder Buffer

• It is a simple circular array with a head and a tail pointer:
  - Each new instruction is allocated a position at the tail, in program order.
  - Each entry provides a location for storing the instruction's result.
  - New instructions look for their source values starting from the tail and searching back toward the head.
  - When the instruction at the head completes and becomes non-speculative, its values are committed and the instruction is removed from the buffer.

[Figure: circular buffer with Tail and Head pointers]
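
A minimal sketch of such a circular buffer follows, assuming a fixed capacity and register names as plain strings. The class and method names (`ReorderBuffer`, `allocate`, `latest_producer`, `commit`) are illustrative and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    instr: str                    # instruction text, kept for tracing
    dest: Optional[str]           # destination register, or None
    value: Optional[int] = None   # result; None until execution completes
    done: bool = False


class ReorderBuffer:
    """Circular array; entries from head up to tail are in flight."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0     # oldest instruction, next to commit
        self.tail = 0     # next free slot
        self.count = 0

    def allocate(self, instr, dest):
        """Place a new instruction at the tail, in program order."""
        if self.count == len(self.slots):
            raise RuntimeError("reorder buffer full")
        slot = self.tail
        self.slots[slot] = RobEntry(instr, dest)
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return slot

    def write_result(self, slot, value):
        self.slots[slot].value = value
        self.slots[slot].done = True

    def latest_producer(self, reg):
        """Search from the tail back for the newest in-flight writer of reg."""
        for i in range(1, self.count + 1):
            slot = (self.tail - i) % len(self.slots)
            if self.slots[slot].dest == reg:
                return slot
        return None

    def commit(self, regfile):
        """Retire the head entry if it has completed; update the register file."""
        if self.count == 0 or not self.slots[self.head].done:
            return None
        entry = self.slots[self.head]
        if entry.dest is not None:
            regfile[entry.dest] = entry.value
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return entry


rob = ReorderBuffer(4)
regs = {"r1": 5}
older = rob.allocate("r1 = r2 + r3", "r1")
younger = rob.allocate("r1 = r1 + 1", "r1")
newest_writer = rob.latest_producer("r1")      # the younger instruction's slot
rob.write_result(younger, 9)                   # younger finishes first
blocked = rob.commit(regs)                     # None: head has not completed
rob.write_result(older, 8)
rob.commit(regs)
rob.commit(regs)                               # regs["r1"] is now 9
```

The usage lines show the two properties the slide relies on: the tail-back search finds the most recent writer of a register, and commit is held up until the oldest instruction has its result.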

Reorder Buffer

  - 3 fields: instruction, destination, value
  - The reorder buffer can be an operand source => more registers, like reservation stations (RS)
  - Use the reorder buffer number instead of the reservation station number when execution completes
  - Supplies operands between execution complete and commit
  - Once the instruction commits, its result is put into the register
  - Instructions commit in program order
  - As a result, it is easy to undo speculated instructions on mispredicted branches or on exceptions

Steps of Speculative Tomasulo Algorithm

1. Issue [get instruction from FP Op Queue]

• Check if the reorder buffer is full.
• Check if a reservation station is available.
• Access the register file and the reorder buffer for the current values of the source operands.
• Send the instruction, its reorder buffer slot number and the source operands to the reservation station.

Once issued, the instruction stays in the reservation station until it gets both operands.
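
The issue checks above can be sketched as follows. This is a simplified model under stated assumptions: the reorder buffer and the reservation stations are plain Python lists, the capacities `ROB_SIZE` and `NUM_RS` are made-up values, and an operand is either a ready value or the ROB slot number that will produce it.

```python
ROB_SIZE = 4   # assumed reorder buffer capacity
NUM_RS = 2     # assumed number of reservation stations


def read_operand(reg, rob, regfile):
    """Return ("value", v) if the operand is ready, else ("rob", slot) to wait on."""
    # The newest in-flight writer wins, so scan the ROB from the tail back.
    for entry in reversed(rob):
        if entry["dest"] == reg:
            if entry["done"]:
                return ("value", entry["value"])
            return ("rob", entry["id"])
    return ("value", regfile[reg])       # no in-flight writer: use the register file


def issue(instr, rob, stations, regfile, next_id):
    """Try to issue one instruction; return True on success, False on a stall."""
    op, dest, src1, src2 = instr
    if len(rob) >= ROB_SIZE:             # reorder buffer full
        return False
    if len(stations) >= NUM_RS:          # no reservation station available
        return False
    # Read sources before allocating, so "r6 = r2 + r6" does not wait on itself.
    operands = [read_operand(s, rob, regfile) for s in (src1, src2)]
    rob.append({"id": next_id, "dest": dest, "value": None, "done": False})
    stations.append({"op": op, "rob_id": next_id, "operands": operands})
    return True


regfile = {"r1": 1, "r2": 2, "r3": 3}
rob, stations = [], []
first = issue(("add", "r1", "r2", "r3"), rob, stations, regfile, next_id=0)
second = issue(("add", "r3", "r1", "r2"), rob, stations, regfile, next_id=1)
third = issue(("add", "r2", "r2", "r2"), rob, stations, regfile, next_id=2)
# second waits on ROB slot 0 for r1; third stalls because both stations are busy
```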

Steps of Speculative Tomasulo Algorithm

2. Execute [operate on operands (EX)]

When both operands are ready and a functional unit is available, the instruction executes. This step checks for RAW hazards: as long as the operands are not ready, the reservation station watches the Common Data Bus (CDB) for results.

Steps of Speculative Tomasulo Algorithm

3. Write result [finish execution (WB)]

Write the result on the Common Data Bus to all awaiting FUs and to the reorder buffer; mark the reservation station available.
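
The execute and write-result steps can be sketched together. The model below is an assumption-laden simplification: reservation stations are dictionaries, a waiting operand is tagged with the ROB slot that will produce it, and the CDB broadcast is a loop over all stations. The names `ready`, `execute`, and `write_result` are illustrative.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def ready(station):
    """A station may execute once neither operand is still waiting on a ROB slot."""
    return all(kind == "value" for kind, _ in station["operands"])


def execute(station):
    left, right = (value for _, value in station["operands"])
    return OPS[station["op"]](left, right)


def write_result(rob_id, value, stations, rob):
    """Broadcast (rob_id, value) on the CDB and free the producing station."""
    rob[rob_id]["value"] = value
    rob[rob_id]["done"] = True
    for st in stations:                  # every waiting station snoops the bus
        st["operands"] = [
            ("value", value) if operand == ("rob", rob_id) else operand
            for operand in st["operands"]
        ]
    stations[:] = [st for st in stations if st["rob_id"] != rob_id]


rob = {0: {"value": None, "done": False}, 1: {"value": None, "done": False}}
stations = [
    {"op": "add", "rob_id": 0, "operands": [("value", 2), ("value", 3)]},
    {"op": "mul", "rob_id": 1, "operands": [("rob", 0), ("value", 4)]},
]
add_station = stations[0]
mul_was_ready = ready(stations[1])       # False: RAW dependence on slot 0
write_result(add_station["rob_id"], execute(add_station), stations, rob)
mul_result = execute(stations[0])        # the mul is now the only station: 5 * 4
```

Tagging results with the ROB slot number rather than the reservation station number is what lets the station be freed at write-result time while the value stays available until commit.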

Steps of Speculative Tomasulo Algorithm

4. Commit [update register file with reorder result]

When:
  - The instruction reaches the head of the reorder buffer
  - The result is present
  - No exceptions are associated with the instruction

The instruction becomes non-speculative:
  - Update the register file with the result (or store to memory)
  - Remove the instruction from the reorder buffer.

A mispredicted branch flushes the reorder buffer.
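
A sketch of the commit step, assuming the reorder buffer is a `collections.deque` with the head at index 0 and each entry is a dictionary. The field names (`kind`, `mispredicted`, `exception`) and the returned status strings are hypothetical.

```python
from collections import deque


def commit(rob, regfile, memory):
    """Retire at most one instruction from the head of the reorder buffer.

    Returns "stall", "commit", "flush" or "exception".
    """
    if not rob or not rob[0]["done"]:
        return "stall"                   # head has no result yet
    head = rob[0]
    if head.get("exception"):
        rob.clear()                      # nothing younger may change state
        return "exception"
    if head["kind"] == "branch" and head["mispredicted"]:
        rob.clear()                      # younger entries are on the wrong path
        return "flush"
    if head["kind"] == "store":
        memory[head["addr"]] = head["value"]
    elif head["kind"] == "alu":
        regfile[head["dest"]] = head["value"]
    rob.popleft()
    return "commit"


regfile, memory = {"r1": 0, "r2": 0}, {}
rob = deque([
    {"kind": "alu", "dest": "r1", "value": 7, "done": True},
    {"kind": "branch", "mispredicted": True, "done": True},
    {"kind": "alu", "dest": "r2", "value": 99, "done": True},   # wrong-path result
])
results = [commit(rob, regfile, memory) for _ in range(3)]
# results == ["commit", "flush", "stall"]; r2 keeps its old value
```

Because the register file and memory are written only here, the wrong-path value 99 never becomes architecturally visible; undoing speculation amounts to emptying the buffer.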

MIPS FP Unit

[Figure: MIPS FP unit; diagram not recovered in extraction]

Renaming Registers

Common variation of speculative design:
  - The reorder buffer keeps instruction information but not the result
  - Extend the register file with extra renaming registers to hold speculative results
  - Rename register allocated at issue; result written into the rename register on execution complete; rename register copied into the real register on commit
  - Operands read either from the register file (real or speculative) or via the Common Data Bus
  - Advantage: operands are always from a single source (the extended register file)

Renaming Registers

1. Index a MAP table using the source register identifiers to get the physical register number.
2. Get the previous physical register number for the destination register.
3. Allocate a free physical register and modify the MAP table by indexing it with the destination register identifier.
4. When the instruction commits, return the previous physical register to the pool.

[Figure: map table with entries 0 to 31, each pointing into a pool of physical registers 0 to 127]
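
Renaming Registers

Map table (architectural register -> physical register):
0 -> 0, 1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4, 5 -> 5, 6 -> 6, 7 -> 7
Free physical registers: 9, 10, 22, 13, 17

Code sequence
R7=r4+r3
R6=r2+r6
R3=r6+r7
R6=r6+10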
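
Renaming Registers

Map table (architectural register -> physical register):
0 -> 0, 1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4, 5 -> 5, 6 -> 6, 7 -> 7
Free physical registers: 9, 10, 22, 13, 17

Code sequence    Renamed code sequence
R7=r4+r3
R6=r2+r6
R3=r6+r7
R6=r6+10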
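
Renaming Registers

Map table (architectural register -> physical register):
0 -> 0, 1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4, 5 -> 5, 6 -> 6, 7 -> 9
Free physical registers: 10, 22, 13, 17

Code sequence    Renamed code sequence    Previous dest
R7=r4+r3         R9=r4+r3                 R7
R6=r2+r6
R3=r6+r7
R6=r6+10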
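
Renaming Registers

Map table (architectural register -> physical register):
0 -> 0, 1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4, 5 -> 5, 6 -> 10, 7 -> 9
Free physical registers: 22, 13, 17

Code sequence    Renamed code sequence    Previous dest
R7=r4+r3         R9=r4+r3                 R7
R6=r2+r6         R10=r2+r6                R6
R3=r6+r7
R6=r6+10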
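
Renaming Registers

Map table (architectural register -> physical register):
0 -> 0, 1 -> 1, 2 -> 2, 3 -> 22, 4 -> 4, 5 -> 5, 6 -> 10, 7 -> 9
Free physical registers: 13, 17

Code sequence    Renamed code sequence    Previous dest
R7=r4+r3         R9=r4+r3                 R7
R6=r2+r6         R10=r2+r6                R6
R3=r6+r7         R22=r10+r9               R3
R6=r6+10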

Renaming Registers

Map table (architectural register -> physical register):
0 -> 0, 1 -> 1, 2 -> 2, 3 -> 22, 4 -> 4, 5 -> 5, 6 -> 13, 7 -> 9
Free physical registers: 17

Code sequence    Renamed code sequence    Previous dest
R7=r4+r3         R9=r4+r3                 R7
R6=r2+r6         R10=r2+r6                R6
R3=r6+r7         R22=r10+r9               R3
R6=r6+10         R13=r10+10               R10
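
Renaming Registers

Map table (architectural register -> physical register):
0 -> 0, 1 -> 1, 2 -> 2, 3 -> 22, 4 -> 4, 5 -> 5, 6 -> 13, 7 -> 9
Free physical registers: 17, 10 (10 is returned when R13=r10+10 retires)

Code sequence    Renamed code sequence    Previous dest
R7=r4+r3         R9=r4+r3                 R7
R6=r2+r6         R10=r2+r6                R6
R3=r6+r7         R22=r10+r9               R3
R6=r6+10         R13=r10+10               R10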
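
The four renaming steps and the trace on the preceding slides can be replayed with a short sketch. It assumes eight architectural registers initially mapped to physical registers 0 to 7 and a free pool of 9, 10, 22, 13, 17, as on the slides; the `Renamer` class and its method names are illustrative, not part of the original material.

```python
from collections import deque


class Renamer:
    def __init__(self, num_arch, free):
        self.map = list(range(num_arch))   # arch register i starts in phys register i
        self.free = deque(free)            # pool of free physical registers

    def rename(self, dest, srcs):
        """Return (new_dest, renamed_srcs, previous_dest) for one instruction."""
        renamed_srcs = [self.map[s] for s in srcs]   # step 1: look up sources
        previous = self.map[dest]                    # step 2: remember old mapping
        new = self.free.popleft()                    # step 3: allocate and remap
        self.map[dest] = new                         # (an empty pool raises IndexError;
        return new, renamed_srcs, previous           #  real hardware would stall)

    def commit(self, previous):
        self.free.append(previous)                   # step 4: recycle old register


renamer = Renamer(8, [9, 10, 22, 13, 17])
code = [
    (7, [4, 3]),   # R7 = r4 + r3
    (6, [2, 6]),   # R6 = r2 + r6
    (3, [6, 7]),   # R3 = r6 + r7
    (6, [6]),      # R6 = r6 + 10 (the immediate is not renamed)
]
trace = [renamer.rename(dest, srcs) for dest, srcs in code]
for new, srcs, previous in trace:
    print(f"R{new} <- sources {srcs}, previous dest R{previous}")

renamer.commit(trace[3][2])    # R13 = r10 + 10 retires: physical register 10 is freed
print(list(renamer.free))      # [17, 10]
```

The sources are looked up before the destination is remapped, which is why the second instruction still reads physical register 6 while writing physical register 10.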

Limits to ILP

Assumptions for an ideal/perfect machine to start:

1. Register renaming: infinite virtual registers, so all WAW & WAR hazards are avoided
2. Branch prediction: perfect; no mispredictions
3. Jump prediction: all jumps perfectly predicted => machine with perfect speculation & an unbounded buffer of instructions available
4. Memory-address alias analysis: addresses are known & a load can be moved before a store provided the addresses are not equal

In addition: 1-cycle latency for all instructions; unlimited number of instructions issued per clock cycle
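
Under these assumptions only true (RAW) dependences delay an instruction, so the ideal issue rate of a straight-line sequence is its length divided by the length of its longest dependence chain. The sketch below computes that for register dependences only; memory dependences are left out, and the function name `ideal_ilp` and the tuple encoding of instructions are assumptions made for illustration.

```python
def ideal_ilp(instrs):
    """instrs: list of (dest, [srcs]) in program order.

    Returns (cycles, instructions_per_cycle) for a machine with perfect
    renaming and prediction, 1-cycle latency and unlimited issue width.
    """
    if not instrs:
        return 0, 0.0
    produced_in = {}     # register -> cycle in which its latest value is produced
    last_cycle = 0
    for dest, srcs in instrs:
        # An instruction runs one cycle after its latest-arriving source;
        # registers never written in this sequence are available at cycle 0.
        cycle = 1 + max((produced_in.get(s, 0) for s in srcs), default=0)
        produced_in[dest] = cycle        # WAW and WAR orderings are ignored
        last_cycle = max(last_cycle, cycle)
    return last_cycle, len(instrs) / last_cycle


sequence = [
    ("r7", ["r4", "r3"]),   # R7 = r4 + r3
    ("r6", ["r2", "r6"]),   # R6 = r2 + r6
    ("r3", ["r6", "r7"]),   # R3 = r6 + r7
    ("r6", ["r6"]),         # R6 = r6 + 10
]
cycles, ipc = ideal_ilp(sequence)
print(cycles, ipc)          # 2 2.0
```

For the four-instruction sequence used on the renaming slides, the first two instructions are independent and the last two each depend only on first-cycle results, so the sequence completes in two cycles.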

Upper Limit to ILP: Ideal Machine

[Figure: bar chart of instruction issues per cycle (IPC) by program]

Program     IPC
gcc         54.8
espresso    62.6
li          17.9
fpppp       75.2
doduc       118.7
tomcatv     150.1

Integer: 18 - 60; FP: 75 - 150

More Realistic HW: Branch Impact

Change from an infinite window to a window of 2000 instructions to examine, and to a maximum issue of 64 instructions per clock cycle.

[Figure: bar chart of instruction issues per cycle (IPC) for gcc, espresso, li, fpppp, doduc and tomcatv; one bar per branch predictor: Perfect, Selective predictor, Standard 2-bit, Static, None]

Integer: 6 - 12; FP: 15 - 45

More Realistic HW: Register Impact

Change: 2000-instruction window, 64-instruction issue, 8K 2-level prediction.

[Figure: bar chart of instruction issues per cycle (IPC) for gcc, espresso, li, fpppp, doduc and tomcatv; one bar per number of renaming registers: Infinite, 256, 128, 64, 32, None]

Integer: 5 - 15; FP: 11 - 45

More Realistic HW: Alias Impact

Change: 2000-instruction window, 64-instruction issue, 8K 2-level prediction, 256 renaming registers.

[Figure: bar chart of instruction issues per cycle (IPC) for gcc, espresso, li, fpppp, doduc and tomcatv; one bar per alias analysis model: Perfect, Global/stack perfect, Inspection, None]

Integer: 4 - 9; FP: 4 - 45 (Fortran, no heap)

Realistic HW for '9X: Window Impact

Perfect disambiguation (HW), 1K selective prediction, 16-entry return, 64 registers, issue as many as window.

[Figure: bar chart of instruction issues per cycle (IPC) for gcc, espresso, li, fpppp, doduc and tomcatv; one bar per window size: Infinite, 256, 128, 64, 32, 16, 8, 4]

Integer: 6 - 12; FP: 8 - 45
