Lecture #3 - Processing of Control Transfer
Instructions
Review: Data Dependency
We have discussed the different types of data dependencies and how they cause different hazards (e.g., RAW, WAR, WAW). Let us look at an example with some pseudo-instructions:
; i1: load a
mov  r9, [a]
; i2: load b
mov  r10, [b]
; i3: add
lea  rcx, [r9 + r10]
; i4: multiply
mov  rdx, r9
imul rdx, r10
; i5: divide
mov  rax, r9
cqo
idiv r10
mov  r11, rax
Code Example 1: Instruction sequence exhibiting data dependency hazards
Due to the instruction flow (i.e., how registers are used), we have different dependencies that require waiting on data from previous instructions; these waits are what cause the hazards.
It may be hard to see them in the listing, so let us look at a graph version:
Figure 1: Data Dependency Graph of Data Dependencies Hazards (nodes i1–i5 connected by edges labeled RAW, WAR, and WAW)
The Control Hazard Problem
What are Control Transfer Instructions (CTIs)?
• Instructions that change the Program Counter (PC) non-sequentially.
• Branches: Conditional change based on data/flags (e.g., je, jne).
• Jumps: Unconditional change (e.g., jmp).
• Function Calls: Jump + save return address (call).
• Returns: Jump to saved return address (ret).
The Pipeline Problem (Control Hazard):
• The pipeline fetches instructions sequentially (PC+4).
• By the time a CTI is identified and its outcome/target address is known (often late in the pipeline, e.g., the ID or EX stage), several subsequent (potentially incorrect) instructions may have already entered the pipeline.
• Pipeline must stall or flush these incorrect instructions, creating "bubbles" and reducing performance (IPC drops, CPI increases).
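A back-of-the-envelope calculation shows why this matters (the 20% branch frequency and 3-cycle resolution penalty below are assumed numbers for illustration, not measurements). If every branch stalls fetch until it resolves:

CPI = CPI_base + f_branch × penalty = 1.0 + 0.20 × 3 = 1.6

a 60% increase in CPI from control hazards alone, which is why simply waiting is not an option.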
Figure 2: A load instruction followed by an immediate use results in a one-cycle stall
Early Solutions (and their limitations)
• Stall/Freeze Pipeline: Simplest approach. Stop fetching new instructions
once a branch is detected until its outcome and target are known.
– Problem: Creates significant performance loss (multiple cycles per branch).
• Predict Branch Not Taken: Always fetch the sequential instruction (PC+4).
If the branch is taken, flush the incorrectly fetched instruction(s).
– Problem: Many branches are taken (especially loop branches), so there is still significant flushing.
• Predict Branch Taken: Always assume the branch is taken.
– Problem: Requires knowing the target address early. Still stalls if predicted
incorrectly.
• (Optional: Delayed Branch): The instruction(s) immediately following the
branch are always executed, regardless of the branch outcome. Compiler tries
to fill slot(s) with useful work.
– Problem: Hard for compilers to fill slots effectively, complex for deeper pipelines, breaks precise exception model, largely obsolete in modern high-performance designs.
Dynamic Branch Prediction - Core Idea
Goal: Predict the outcome (Taken (T) / Not Taken (NT)) and target address of a branch dynamically at runtime, based on past behavior.
Why it Works: Program behavior, especially branches (e.g., loops, error checks), is
often repetitive and predictable.
Key components:
• Outcome Prediction: Predict (or guess) Taken/Not Taken.
• Target Address Prediction: Predict/guess the destination PC if taken.
Integration: Prediction often happens early (IF or ID stage) to avoid fetch stalls.
Branch Outcome Prediction: Simple Predictors
Branch History Table (BHT) / Pattern History Table (PHT): A small memory indexed by (part of) the branch instruction's PC; each entry stores prediction state.
1-bit Predictor: Stores the outcome of the last execution. Flips prediction on a
single mispredict.
• Problem: Mispredicts twice per execution of a typical loop: once on the exit (last iteration predicted taken, branch is NT) and once on re-entry (the bit has flipped to NT, branch is taken); see the trace below.
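To make the double misprediction concrete, trace a loop branch that is taken three times and then exits (outcomes T, T, T, NT), with the stored bit initially T:

outcome:     T    T    T    NT   |  T     T    T    NT
prediction:  T    T    T    T    |  NT    T    T    T
result:      ok   ok   ok   miss |  miss  ok   ok   miss

The exit flips the stored bit to NT, so the first iteration of the next execution also mispredicts: two mispredictions per loop execution.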
2-bit Saturating Counter Predictor: Uses 4 states (e.g., Strongly Taken, Weakly Taken, Weakly Not Taken, Strongly Not Taken). Requires two consecutive mispredictions to change from strongly T/NT.
• Much better performance, especially for loops. Standard building block.
Figure 3: 2-bit prediction scheme
Transitions between these states occur based on whether the branch is taken or not.
The key advantage is hysteresis: a single misprediction doesn’t immediately flip the
prediction, which stabilizes the predictor in noisy conditions.
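As a minimal C sketch of one such counter (the state encoding below is an arbitrary choice for illustration; hardware implementations vary):

#include <stdint.h>
#include <stdbool.h>

/* States: 0 = strongly NT, 1 = weakly NT, 2 = weakly T, 3 = strongly T. */
typedef uint8_t counter2_t;

/* Predict taken while in either of the two "taken" states. */
static bool predict(counter2_t c) { return c >= 2; }

/* Saturating update: step toward the actual outcome, clamping at 0 and 3. */
static counter2_t update(counter2_t c, bool taken) {
    if (taken) return (c < 3) ? c + 1 : 3;
    else       return (c > 0) ? c - 1 : 0;
}

On the loop trace above, a counter starting in Strongly Taken drops only to Weakly Taken at the exit, so the next execution begins with a correct Taken prediction: one misprediction per loop execution instead of two.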
Advanced Outcome Prediction: Correlating Predictors
Idea: The outcome of a branch may depend on the outcome of other, recent branches.
Example:
if (aa == 2) ...   // B1
if (bb == 2) ...   // B2
if (aa != bb) {    // B3
    ...
}
The outcome of B3 depends on B1 and B2: if both of their conditions hold (aa == 2 and bb == 2), then aa == bb, so B3's condition must be false.
• (m, n) Predictor: Uses the behavior of the last m branches (global history) to choose among 2^m different n-bit predictors for the current branch.
• Global History Register (GHR): Shift register recording outcomes of last
m branches.
• Implementation (e.g., gshare): Combine (XOR) global history with branch
PC bits to index into a single large table of 2-bit counters. Reduces table size
compared to having separate tables per history pattern.
Figure 4: A gshare predictor with 1024 entries (each being a standard 2-bit predictor).
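A hedged C sketch of the gshare scheme from Figure 4 (the 1024-entry size matches the figure; the history width and shift amounts are illustrative assumptions):

#include <stdint.h>
#include <stdbool.h>

#define GSHARE_BITS 10                 /* 1024 entries, as in Figure 4 */
#define GSHARE_SIZE (1u << GSHARE_BITS)

static uint8_t  pht[GSHARE_SIZE];      /* 2-bit saturating counters */
static uint16_t ghr;                   /* global history register */

/* XOR the low PC bits with the global history to form the table index. */
static uint32_t gshare_index(uint64_t pc) {
    return (uint32_t)((pc >> 2) ^ ghr) & (GSHARE_SIZE - 1);
}

static bool gshare_predict(uint64_t pc) {
    return pht[gshare_index(pc)] >= 2; /* counter in a "taken" state? */
}

/* After the branch resolves: update the counter, then shift the actual
   outcome into the history register. */
static void gshare_update(uint64_t pc, bool taken) {
    uint8_t *c = &pht[gshare_index(pc)];
    if (taken) { if (*c < 3) (*c)++; } else { if (*c > 0) (*c)--; }
    ghr = (uint16_t)(((ghr << 1) | (taken ? 1 : 0)) & (GSHARE_SIZE - 1));
}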
Advanced Outcome Prediction: Tournament Predictors
Idea: Different branches might be predicted better by different strategies (e.g., some
correlate well with global history, others with their own local history). Use multiple
predictors and dynamically select the best one for each branch. Structure:
Figure 5: Tournament Predictor
• Typically combines a local predictor (based only on the history of this branch)
and a global predictor (like gshare).
• A "Choice Predictor" (meta-predictor), often another table of 2-bit counters, tracks which underlying predictor (local or global) has been more accurate recently for a given branch/history, and selects its prediction.
Performance: Generally offers higher accuracy than either local or global alone.
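A minimal C sketch of the selection logic, assuming local_predict, global_predict, and the choice table already exist (all names here are hypothetical):

#include <stdint.h>
#include <stdbool.h>

/* Choice counter per entry: >= 2 means "trust the global predictor". */
extern uint8_t choice[1024];
extern bool local_predict(uint64_t pc);
extern bool global_predict(uint64_t pc);

static bool tournament_predict(uint64_t pc) {
    uint32_t i = (uint32_t)(pc >> 2) & 1023;
    return (choice[i] >= 2) ? global_predict(pc) : local_predict(pc);
}

/* The choice counter moves only when the two predictors disagree,
   stepping toward whichever one turned out to be right. */
static void choice_update(uint64_t pc, bool local_ok, bool global_ok) {
    uint32_t i = (uint32_t)(pc >> 2) & 1023;
    if (local_ok == global_ok) return;
    if (global_ok) { if (choice[i] < 3) choice[i]++; }
    else           { if (choice[i] > 0) choice[i]--; }
}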
Advanced Outcome Prediction: Hybrid/Tagged Predictors
(TAGE)
Motivation: Longer history can be better, but requires huge tables and suffers from
cold starts/interference. Need to balance history length and table size/accuracy.
TAGE (Tagged Geometric History Length): State-of-the-art approach.
• Uses multiple predictor tables, indexed by different (geometrically increasing)
lengths of global history combined with PC.
• Tables are tagged to detect if an entry belongs to the current branch/history
(reduces interference).
• Uses partial tags for efficiency.
• Prediction comes from the longest matching history table entry. Has sophisticated update/allocation mechanisms.
Performance: Outperforms gshare and simple tournament predictors, especially with a limited storage budget.
Figure 6: Five-Component Tagged Hybrid Predictor
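A deliberately simplified C sketch of the lookup rule only ("longest matching tagged table wins, else fall back to a base predictor"); the hash helpers, table sizes, and history lengths are placeholder assumptions, and real TAGE adds usefulness counters and careful allocation on mispredictions:

#include <stdint.h>
#include <stdbool.h>

#define NTABLES 4                     /* tagged tables, longer history each */

typedef struct { uint16_t tag; uint8_t ctr; } tage_entry_t;

extern tage_entry_t tables[NTABLES][1024];
extern int hist_len[NTABLES];         /* geometric lengths, e.g., 4, 8, 16, 32 */

/* Hypothetical hashes of the PC + the first len bits of global history. */
extern uint32_t tage_index(uint64_t pc, int len);
extern uint16_t tage_tag(uint64_t pc, int len);
extern bool base_predict(uint64_t pc); /* simple bimodal fallback */

static bool tage_predict(uint64_t pc) {
    /* Scan from the longest history down; the first tag match wins. */
    for (int t = NTABLES - 1; t >= 0; t--) {
        tage_entry_t *e = &tables[t][tage_index(pc, hist_len[t])];
        if (e->tag == tage_tag(pc, hist_len[t]))
            return e->ctr >= 4;       /* 3-bit counter: >= 4 predicts taken */
    }
    return base_predict(pc);          /* no tagged component matched */
}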
Predicting the Branch Target Address
Problem: Outcome prediction isn't enough for taken branches/jumps. We also need the target address early to redirect fetch, and decoding the instruction to compute the target is too slow.
Solution → Branch Target Buffer (BTB):
• A small cache memory indexed by the address (PC) of the CTI.
• Stores the predicted target address for that CTI (if previously taken).
• Often stores the branch prediction state (e.g., 2-bit counter) as well.
Operation:
• During IF, PC indexes the BTB.
• BTB Hit: Branch is predicted (based on stored state); target address is available immediately. Fetch redirects if predicted taken.
• BTB Miss: Assume not a branch, or assume not taken. Fetch PC+4. If later
found to be a taken branch, flush and redirect (causes penalty).
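A minimal C sketch of the lookup step (the direct-mapped organization, 512-entry size, and field widths are assumptions for illustration):

#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t tag;      /* upper PC bits, to confirm the entry matches */
    uint64_t target;   /* predicted target address */
    uint8_t  ctr;      /* 2-bit outcome counter stored alongside */
    bool     valid;
} btb_entry_t;

static btb_entry_t btb[512];

/* Returns the next fetch PC: predicted target on a taken hit, else PC+4. */
static uint64_t btb_next_pc(uint64_t pc) {
    btb_entry_t *e = &btb[(pc >> 2) & 511]; /* PC bits [10:2] index the table */
    if (e->valid && e->tag == (pc >> 11) && e->ctr >= 2)
        return e->target;   /* hit and predicted taken: redirect fetch */
    return pc + 4;          /* miss, or hit predicted not taken: sequential */
}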
Figure 7: Branch Target Buffer (BTB) mechanism
The BTB enables early fetching of the next instruction (speculatively) before the
branch is resolved, minimizing control hazards and pipeline stalls.
Figure 8: BTB Lookup and Handling Process
This flowchart details how a branch instruction is handled with a BTB:
1. Instruction Fetch (IF): The PC is used to index the BTB.
2. Match Found: If a matching branch is found, fetch begins from the predicted
target.
3. No Match: If no match is found, fetch proceeds sequentially.
4. Branch Execution: The actual branch is resolved in the Execute (EX) stage.
5. Update BTB: If the prediction was incorrect or if the branch was not in the
BTB, update or insert a new entry.
This sequence is vital for understanding speculative execution and control hazard resolution.
Figure 9: Penalty Scenarios Based on BTB State
Handling Returns: Return Address Stack (RAS)
Problem: Function returns are indirect jumps (the target address is in a register or on the stack), and the target varies depending on the call site. BTBs don't work well for them because the same ret instruction goes to different targets.
Observation: Calls and returns are typically nested and matched.
Solution → Return Address Stack (RAS):
• A small hardware stack.
• On a function call (jal, call), hardware pushes the return address (PC+4) onto the RAS.
• On function return (ret, jr), hardware predicts the target by popping the address
from the top of the RAS.
Performance: Very effective, significantly improves return prediction accuracy.
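A C sketch of the stack behavior (the 16-entry depth and wrap-around overflow policy are assumptions; real designs differ in how they handle overflow and misspeculation):

#include <stdint.h>

#define RAS_DEPTH 16

static uint64_t ras[RAS_DEPTH];
static unsigned ras_top;               /* count of pushes minus pops */

/* On call: push the fall-through address (the return point). */
static void ras_push(uint64_t pc) {
    ras[ras_top % RAS_DEPTH] = pc + 4; /* wrap on overflow, losing the oldest */
    ras_top++;
}

/* On ret: predict the target by popping the top of the stack. */
static uint64_t ras_pop(void) {
    if (ras_top > 0) ras_top--;        /* if empty, the prediction is stale */
    return ras[ras_top % RAS_DEPTH];
}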
Figure 10: Return Address Buffer Prediction Accuracy
Most unconditional indirect branches come from function returns. A single BTB entry can remember only one target, so calls from different sites cause it to forget the return address from previous calls. Hence the return address buffer organized as a stack, now a standard feature in nearly all modern superscalar CPUs.
Integrated Instruction Fetch Units (IFU)
Modern processors often have a dedicated IFU responsible for providing a high-bandwidth stream of correct-path instructions.
Combines:
• PC generation logic.
• Branch prediction (outcome prediction tables like PHT/TAGE).
• Branch Target Buffer (BTB).
• Return Address Stack (RAS).
• Instruction Cache (I-Cache) access.
• Instruction Buffering/Queuing.
The IFU is a monolithic hardware unit designed to optimize instruction supply by bundling multiple responsibilities traditionally split across control and decode logic. It enables high-throughput, low-latency instruction delivery in superscalar and speculative execution pipelines.
It handles:
• Branch prediction
• Instruction prefetching
• Fetch-ahead logic
• Instruction memory access
• Instruction buffering
• Cache line boundary management
This integration is essential for wide-issue processors and architectures with speculative execution, where a high instruction fetch rate is critical to keep the backend full.
Branch Folding Optimization: If a BTB entry holds the predicted target instruction itself (not just its address), the IFU can provide the target instruction directly, potentially skipping the I-Cache access latency for taken branches.
Speculative Execution
What are the consequences of Prediction?
• Once a branch is predicted (outcome + target), the processor doesn’t wait. It
fetches and executes instructions from the predicted path. This is speculation.
Why Speculate?
• Avoid stalling the pipeline, crucial for exploiting ILP across branches.
Challenge → What if prediction was wrong?
• Must not let speculative instructions change the architectural state permanently
(registers, memory) until the branch is confirmed.
• Must be able to efficiently discard the results of speculative work and recover
by starting fetch/execute down the correct path.
Mechanisms:
• Reorder Buffer (ROB) or similar structures buffer results of speculative instructions.
• Instructions commit (update architectural state) in program order, only after they are confirmed to be on the correct path.
• A mispredicted branch causes the ROB entries of all subsequent speculative instructions to be flushed; a sketch follows this list.
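A very schematic C sketch of that commit/flush discipline (heavily simplified; a real ROB also tracks destination registers, result values, and exception state, and the empty/full bookkeeping here is naive):

#include <stdbool.h>

#define ROB_SIZE 64

typedef struct {
    bool done;          /* has the instruction finished executing? */
    bool mispredicted;  /* a branch whose prediction turned out wrong */
    /* ... destination register, result value, exception bits ... */
} rob_entry_t;

static rob_entry_t rob[ROB_SIZE];
static unsigned head, tail;   /* head = oldest entry, tail = next free slot */

/* Commit in program order: only the oldest finished instruction may update
   architectural state. A mispredicted branch at the head discards all
   younger (speculative) entries before fetch restarts on the correct path. */
static void rob_commit_step(void) {
    if (head == tail || !rob[head].done) return; /* nothing ready to commit */
    if (rob[head].mispredicted)
        tail = (head + 1) % ROB_SIZE;            /* flush everything younger */
    /* ... write the head's result to architectural state ... */
    head = (head + 1) % ROB_SIZE;
}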
Overall
• Control Transfer Instructions are fundamental but pose a major challenge to
pipelined performance due to control hazards.
• Stalling is too slow for modern processors.
• Dynamic Branch Prediction (predicting outcome and target address) is essential.
– Techniques evolved from simple 2-bit counters to complex correlating, tournament, and hybrid predictors (e.g., TAGE).
– BTBs predict target addresses for direct branches/jumps.
– RAS predicts target addresses for function returns.
• Prediction enables Speculative Execution, allowing the pipeline to continue
working down the predicted path, further hiding branch latency.
• Managing speculation (buffering results, recovering from mispredicts) requires
complex hardware like ROBs.
• Effective handling of control flow is a cornerstone of high-performance processor
design.