Computer Systems Pipelining Guide

Carleton University


Department of Systems and Computer Engineering, Carleton University

SYSC 3320 Computer Systems Design


Processors – Pipelining
Pipelining as a technique to improve performance
• Recall: the ‘Iron Law’ of processor performance

Time/Program = (Instructions/Program) × (Cycles/Instruction) × (Time/Cycle)

• Three factors to improve CPU performance
1) Time per cycle
2) Clock cycles per instruction
3) Instructions per program
• Performance is a product of three factors that are not independent of one
another, so it is important to concentrate on reducing all three. Reducing
“instructions per program” is compiler/developer dependent. Reducing
“time/cycle” means a higher clock frequency, which has reached a limit due
to clock technology limitations. Designers therefore focused on improving
cycles/instruction. Pipelining is a technique to increase the number of
instructions executed per clock cycle, which is equivalent to reducing the
cycles/instruction.
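The Iron Law above lends itself to a quick back-of-the-envelope calculation. A minimal sketch in Python (the instruction count, CPI, and clock values are made-up illustrative numbers, not from the slides):

```python
def cpu_time(instructions, cpi, clock_hz):
    """Iron Law: Time/Program = (Instr/Program) x (Cycles/Instr) x (Time/Cycle)."""
    time_per_cycle = 1.0 / clock_hz
    return instructions * cpi * time_per_cycle

# Illustrative values: 1e6 instructions, CPI of 5, 1 GHz clock -> 5 ms.
t = cpu_time(1_000_000, 5, 1e9)

# Halving CPI (e.g., via pipelining) halves the execution time.
t_pipelined = cpu_time(1_000_000, 2.5, 1e9)
```

The same function shows why all three factors matter: improving any one of them scales execution time proportionally.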

Recall: Instruction Execution Cycle
• The instruction execution cycle has three main phases
➢ Fetch
➢ Decode
➢ Execute

[Figure: simplified CPU block diagram — CPU registers (program counter (PC), instruction register (IR), memory address register (MAR), status register (SR), stack pointer (SP), general-purpose registers GPR1, GPR2, …) and an ALU/control unit (ADD, SUB, MULT, DIV, COMPL, SHIFT), connected over the system bus (address, data, and control lines) to memory and I/O devices 1 and 2]

• This is an overly simplified sequence. Real processors have much more
complicated steps to execute an instruction.
Instruction Execution Cycle
• A more realistic instruction execution cycle has the following main phases
1) Instruction Fetch (IF) stage:
Fetch an instruction from instruction memory
2) Instruction Decoding (ID) stage:
Decode the instruction and read registers from the register file (or register bank)
3) Execution (EX) stage:
Execute the instruction: if an ALU operation, perform it; if a load/store, calculate the memory address
4) Memory access (MEM) stage:
Access memory for a load/store instruction
5) Writeback (WB) stage:
Write the results into the register file (or register bank)

• Each phase takes one clock cycle
• Conventional CPUs will implement these phases “in series” or “sequentially”
• This means that the total instruction time equals the sum of all the phase times
• This is a time-consuming process that can be significantly improved if we
can run the different phases “in parallel”. This is called “pipelining”.
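The gain from overlapping the phases can be sketched numerically. Assuming K stages of one cycle each and N instructions (an idealized model, ignoring hazards):

```python
def sequential_cycles(n_instructions, n_stages):
    # Each instruction occupies the processor for all K stages
    # before the next one starts.
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages):
    # After K cycles to fill the pipeline, one instruction
    # completes in every subsequent cycle.
    return n_stages + (n_instructions - 1)

# 100 instructions on a 5-stage machine:
# sequential = 500 cycles, pipelined = 104 cycles.
```

For large N the ratio approaches K, which is the ideal pipeline speedup discussed on the following slides.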

Principles of an ideal pipeline
• Pipelining is a technique to introduce parallelism to the system
• All objects must go through all stages
• Sharing of resources is not allowed
• Propagation delay for all stages is the same
• Ideally there should be no dependency between the stages
• These conditions cannot be fully satisfied in microprocessors (why?)

[Figure: Stage 1 → Stage 2 → Stage 3]

3/19/2023
Instruction Pipelining
• Given the instruction execution stages defined as follows:
Instruction Fetch (IF) stage
Instruction Decoding (ID) stage
Execution (EX) stage
Memory access (MEM) stage
Writeback (WB) stage
• Assume each stage takes one clock cycle T
• A major limitation of conventional instruction execution is that each stage waits
for ALL previous stages to finish before it can proceed.
• Pipelining optimizes this process by running stages “in parallel”, so instruction
executions will “overlap” instead of being completely sequential.

Pipeline diagram
[Figure: pipelined datapath — PC, instruction memory, register file, ALU, and data memory, with adders, a shift unit, an immediate (Imm) path, MUXes, and a control unit]

time           t0  t1  t2  t3  t4  t5  t6  t7  t8
Instruction 1  IF1 ID1 EX1 MA1 WB1
Instruction 2      IF2 ID2 EX2 MA2 WB2
Instruction 3          IF3 ID3 EX3 MA3 WB3
Instruction 4              IF4 ID4 EX4 MA4 WB4
Instruction 5                  IF5 ID5 EX5 MA5 WB5
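A staggered diagram like the one above can be generated programmatically. A small sketch (the stage names follow the table; the layout is my own choice):

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]

def pipeline_diagram(n_instructions, stages=STAGES):
    """Return one row per instruction, staggered by one cycle per row."""
    rows = []
    for i in range(n_instructions):
        # Pad with empty cells so instruction i starts at cycle i.
        cells = ["  "] * i + [f"{s}{i + 1}" for s in stages]
        rows.append(" ".join(cells))
    return rows

for row in pipeline_diagram(5):
    print(row)
```

Reading any column of the output shows that five different instructions occupy the five stages in the same cycle once the pipeline is full.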
3/20/2023
Instruction Pipelining

Without pipelining:
• Cycles per instruction (CPI) = 5
• Instructions per cycle (IPC) = 1/5

With pipelining:
• Cycles per instruction (CPI) = 1
• Instructions per cycle (IPC) = 1

• Each instruction still takes the same number of clock cycles, but the
overlapped processing reduces the effective cycles per instruction

Pipelining Concept
• Each pipeline stage takes 1 clock cycle
• The clock cycle must be long enough to accommodate the slowest pipeline
stage.
• How much speedup can we get using pipelining?
• Under ideal conditions, approximately equal to the number of stages
• How many pipe stages should we use?
• More pipe stages result in a shorter clock period
• But might result in extra overhead

Time →
1st  IF ID EX MEM WB
2nd     IF ID EX MEM WB
3rd        IF ID EX MEM WB

IF: Instruction Fetch   ID: Instruction Decode   EX: Execution
MEM: Memory access      WB: Write Back

Example/Discussion
Assume a program consisting of 10,000 instructions runs on a non-pipelined
single-cycle processor, A, and on a 5-stage pipelined processor, B. Given that
the clock frequency of processor A is 200 MHz and that of processor B is 1 GHz,
calculate the speedup of processor B compared to A. Assume the pipeline in B is
ideal and the program consists of simple arithmetic instructions.
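One possible solution sketch, worked in code (my own working; it assumes CPI = 1 for the single-cycle processor A and an ideal 5-stage pipeline in B with 4 fill cycles):

```python
N = 10_000  # instructions in the program

# Processor A: non-pipelined single cycle at 200 MHz -> 5 ns per instruction.
t_a = N * (1 / 200e6)          # 50 microseconds

# Processor B: ideal 5-stage pipeline at 1 GHz -> after the pipeline fills,
# one instruction completes per 1 ns cycle.
cycles_b = 5 + (N - 1)         # 10_004 cycles
t_b = cycles_b * (1 / 1e9)     # ~10 microseconds

speedup = t_a / t_b            # ~5.0
```

The speedup is close to 5 = (1 GHz / 200 MHz): B benefits from both the faster clock and the pipelined CPI of ~1, while A completes one instruction per (slow) cycle.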

Pipelining Challenges: Clock Skew
• For a general K-stage pipelining system, ideally, CPI = 1 instead of K (IPC = 1
instead of 1/K). However, these calculations are for the ideal case. In
practice, a pipeline is a hardware structure with a number of registers that
need to be clocked synchronously.

• The additional hardware needed for pipelining will lead to different arrivals of
the clock signal (clock skew) for each stage leading to additional delays.
• In effect, pipelining adds a latency to the clock. If the clock period without
pipelining is ‘t’, this latency adds a factor ‘∆𝑡’ to it. Thus, the overall clock
frequency is reduced.

Pipelining Challenges: Clock Skew-Example
Consider an unpipelined processor with a clock period of 2 ns. This
processor is now re-modeled with a five-stage pipeline, which adds
0.2 ns latency to the clock period.
❑ What are the old and new CPIs?
❑ Calculate the ideal and actual speedups obtained

Pipelining Challenges: Clock Skew-Example
Consider an unpipelined processor with a clock period of 2 ns. This
processor is now re-modeled with a five-stage pipeline, which adds
0.2 ns latency to the clock period.
❑ What are the old and new CPIs?
❑ Calculate the ideal and actual speedups obtained
Ideal Case
Without pipelining: Execution time without pipelining = 5 × 2 = 10 ns.
With pipelining: For an ideal pipeline, after the first instruction one
instruction is delivered at every cycle. Execution time with pipelining =
2 ns.

Speedup = Execution time without pipelining/Execution time


with pipelining = 10/2 = 5 = Number of pipeline stages.
CPI = 5 (without pipelining) and 1 (with pipelining).

Pipelining Challenges: Clock Skew-Example
Consider an unpipelined processor with a clock period of 2 ns. This
processor is now re-modeled with a five-stage pipeline, which adds
0.2 ns latency to the clock period.
❑ What are the old and new CPIs?
❑ Calculate the ideal and actual speedups obtained
Non-ideal Case
With pipelining, latency = 0.2 ns, so clock period = 2 + latency = 2 + 0.2 =
2.2 ns. Instruction execution time with pipelining = 2.2 ns.
Speedup = instruction execution time without pipelining / instruction
execution time with pipelining = 10/2.2 ≈ 4.55. Thus, the speedup is
reduced from 5 to about 4.55 because of the non-ideal nature of the pipeline.
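The arithmetic can be checked in a few lines (note that 10/2.2 is closer to 4.55 than to 4.45):

```python
t_unpipelined = 5 * 2.0   # ns: five 2-ns stages done serially
t_ideal = 2.0             # ns per instruction for an ideal pipeline
t_actual = 2.0 + 0.2      # ns per instruction with clock-skew latency

ideal_speedup = t_unpipelined / t_ideal     # 5.0 = number of stages
actual_speedup = t_unpipelined / t_actual   # ~4.55
```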

Pipelining Challenges: Additional hardware
• The pipeline is clocked and the intermediate results from one stage have to be
forwarded to the next stage in every cycle, while the data from the previous
stage has to be clocked in. This cannot be done without intermediate storage
between stages. Thus, we need inter-stage buffers.
• While designing a pipeline, one mandatory requirement is that the different
stages should be balanced. This means that all stages should have the same or
almost the same latency, as they are to be clocked synchronously. This obviously
means that the pipeline rate is determined by the latency of the slowest stage.

[Figure: pipeline stages separated by inter-stage buffers (intermediate storage between stages)]
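The point that the slowest stage sets the pipeline rate can be made concrete. A short sketch (the stage latencies and register overhead are made-up illustrative values):

```python
def pipeline_clock_period(stage_latencies_ns, register_overhead_ns=0.0):
    # The clock must accommodate the slowest stage plus the
    # inter-stage buffer (pipeline register) overhead.
    return max(stage_latencies_ns) + register_overhead_ns

# Unbalanced stages: the 0.9 ns stage dictates the period.
period = pipeline_clock_period([0.5, 0.9, 0.4, 0.6, 0.5],
                               register_overhead_ns=0.1)
# period is 1.0 ns, even though the total work per instruction
# is only 2.9 ns -- which is why balanced stages matter.
```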

Pipelining Challenges: Additional hardware
• Another hardware challenge is “multiporting” of processor register files.
• We need to read and write registers simultaneously from register banks
• We need register banks with multiple ports (“multiport”)

[Figure: processor and ALU connected through Port1 and Port2 of a multiport register bank (file) containing R1, R2, …, with control (CTRL) logic]
Pipelining Challenges: Hazards
• Situations where the next instruction cannot be executed in the next pipeline
stage
• Structural
• A planned instruction cannot execute in the proper clock cycle
because the hardware does not support the combination of instructions
that are set to execute
• Two instructions need the same hardware resource
• The resource cannot be shared by different instructions
• Data
• An instruction cannot be executed because the data needed to execute
it are not yet available
• Data dependency
• Control
• Conditional branch hazards
• The processor does not recognize a branch until a later pipeline
stage

Structural Hazard
• When two instructions need the same hardware resource
• The resource cannot be shared by different instructions
• Solutions
• Schedule
• Programmer avoids scheduling instructions that need the same
hardware resource at the same time
• Stall
• Wait until the resource is free and then take the next instruction
• Duplicate
• Add more hardware!
• Example: more ports
• Not always possible

• Interesting to know: MIPS processors never have structural hazards because
of the way their ISA has been designed

Structural Hazard Example: One memory
[Figure: pipelined datapath — IF (Fetch), ID (Decode), EX (Execution), MEM (Memory access), WB (Write Back) — with a single memory shared for both data and instructions]

Ld    F D X M W
Add     F D X M W
Add       F D X M W
Add         F D X M W

(The Ld’s memory-access stage and the fourth instruction’s fetch both need the single memory in the same cycle.)

Data Hazard
• Data dependency
• Solutions
• Scheduling
• Programmer avoids scheduling the instructions that cause the data hazard
• Stall
• Like freezing earlier instructions
• Bypass
• A hardware mechanism
• Send some sort of feedback from later stages to earlier stages in
the pipeline
• Extra hardware complexity
• Speculate
• Guess that there is no problem; if incorrect, kill the speculative
instruction

Data Hazard: Scheduling solution example
Reorder code to avoid use of load result in the next instruction
C code: a = b + e; c = b + f;

Original order (13 cycles):       Reordered (11 cycles):
ld x1, 0(x0)                      ld x1, 0(x0)
ld x2, 8(x0)                      ld x2, 8(x0)
(stall)                           ld x4, 16(x0)
add x3, x1, x2                    add x3, x1, x2
sd x3, 24(x0)                     sd x3, 24(x0)
(stall)                           add x5, x1, x4
add x5, x1, x4                    sd x5, 32(x0)
sd x5, 32(x0)
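The 13-cycle vs 11-cycle counts can be reproduced with a simple counting model (my own formulation: total cycles = pipeline fill + one completion per instruction + one bubble per stall, with each load-use hazard costing one stall):

```python
def total_cycles(n_instructions, n_stages, n_stalls):
    # (n_stages - 1) cycles to fill the pipeline, then one
    # instruction completes per cycle, plus stall bubbles.
    return (n_stages - 1) + n_instructions + n_stalls

# Original order: 7 instructions, 2 load-use stalls -> 13 cycles.
# Reordered:      7 instructions, 0 stalls          -> 11 cycles.
```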

Example Data Hazard

[Figure: pipelined datapath — IF (Fetch), ID (Decode), EX (Execution), MEM (Memory access), WB (Write Back) — with the second Add entering the pipeline while the first Add has not yet written its result back]

Add R1, R0, #20   ; R1 <- R0 + 20
Add R4, R1, #30   ; R4 <- R1 + 30

What do we do?

Example Data Hazard: stall solution
[Figure: pipelined datapath — the second Add is held in ID (Decode) until the first Add completes WB (Write Back)]

Add R1, R0, #20   ; R1 <- R0 + 20
Add R4, R1, #30   ; R4 <- R1 + 30   → Read-after-Write (RAW) hazard

We have to wait for a number of clock cycles, which is not efficient.
Stall solution (interlock)

Add R1, R0, #20 R1 <- R0 + 20


Add R4, R1, #30 R4 <- R1 + 30

We must decode the second instruction after the first one is written back
to the register file

Stalled stages
ADD   F D X M W
ADD     F D D D D X M W
                  F D X M W
                    F D X M W

Dependencies & Forwarding
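As a sketch of what forwarding hardware checks, the function below compares the destination register of instructions further down the pipeline against a source register of the instruction in EX. This is a simplified model in the style of textbook forwarding units; the function name and encoding are my own, and register x0 (hard-wired zero) never forwards:

```python
def forward_source(rs, ex_mem_rd, ex_mem_writes, mem_wb_rd, mem_wb_writes):
    """Decide where an ALU source operand should come from.

    Returns 'EX/MEM' or 'MEM/WB' when a newer value is sitting in a
    pipeline register, else 'REGFILE'. EX/MEM takes priority because
    it holds the most recent result.
    """
    if ex_mem_writes and ex_mem_rd != 0 and ex_mem_rd == rs:
        return "EX/MEM"
    if mem_wb_writes and mem_wb_rd != 0 and mem_wb_rd == rs:
        return "MEM/WB"
    return "REGFILE"

# add x1, x2, x3   (now in MEM, will write x1; result in EX/MEM register)
# add x4, x1, x5   (now in EX, reads x1) -> forward from EX/MEM
src = forward_source(rs=1, ex_mem_rd=1, ex_mem_writes=True,
                     mem_wb_rd=0, mem_wb_writes=False)
```

This is the feedback path mentioned under the “Bypass” solution: results are routed back from later pipeline registers to the ALU inputs instead of waiting for writeback.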

Stalls and Performance

• Stalls reduce performance
• But are required to get correct results
• The compiler can arrange code to avoid hazards and stalls
• Requires knowledge of the pipeline structure

Control Hazards
Branch determines flow of control
• Fetching next instruction depends on branch outcome
• Pipeline can’t always fetch correct instruction
• Still working on ID stage of branch

Stall on Branch

Wait until the branch outcome is determined before fetching the next instruction

Branch Prediction

• Longer pipelines can’t readily determine branch outcome early
• Stall penalty becomes unacceptable
• Predict outcome of branch
• Only stall if prediction is wrong

More-Realistic Branch Prediction

• Static branch prediction
• Based on typical branch behavior
• Example: loop and if-statement branches
• Predict backward branches taken
• Predict forward branches not taken
• Dynamic branch prediction
• Hardware measures actual branch behavior
• e.g., record recent history of each branch
• Assume future behavior will continue the trend
• When wrong, stall while re-fetching, and update history

Control Hazard: Branch Hazards
If the branch outcome is determined in MEM:

[Figure: pipelined datapath — flush the instructions fetched after the branch (set their control values to 0) and update the PC]

Data Hazards for Branches

• If a comparison register is the destination of the 2nd or 3rd preceding ALU instruction

add x1, x2, x3       IF ID EX MEM WB
add x4, x5, x6          IF ID EX MEM WB
…                          IF ID EX MEM WB
beq x1, x4, target            IF ID EX MEM WB

• Can resolve using forwarding

Data Hazards for Branches

• If a comparison register is the destination of the immediately preceding ALU
instruction or of the 2nd preceding load instruction
• Need 1 stall cycle

lw  x1, addr         IF ID EX MEM WB
add x4, x5, x6          IF ID EX MEM WB
beq stalled                IF ID
beq x1, x4, target            ID EX MEM WB

Data Hazards for Branches

• If a comparison register is the destination of the immediately preceding load instruction
• Need 2 stall cycles

lw  x1, addr         IF ID EX MEM WB
beq stalled             IF ID
beq stalled                ID
beq x1, x0, target            ID EX MEM WB
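The three cases on the last few slides can be summarized as a small lookup: given what kind of instruction produces the comparison register and how many instructions earlier it appears, how many stall cycles does the branch need? This is my own encoding of the slide cases (assuming, as here, that the branch comparison is done in ID with forwarding):

```python
def branch_stalls(producer, distance):
    """Stall cycles before a beq that compares a register written earlier.

    producer: 'alu' or 'load'.
    distance: 1 = immediately preceding instruction, 2 = 2nd preceding, etc.
    Cases: ALU 2-3 back -> 0 stalls (forwarding suffices),
           ALU 1 back or load 2 back -> 1 stall,
           load 1 back -> 2 stalls.
    """
    if producer == "alu":
        return 1 if distance == 1 else 0
    if producer == "load":
        if distance == 1:
            return 2
        if distance == 2:
            return 1
        return 0
    raise ValueError("producer must be 'alu' or 'load'")
```

Loads cost more because their result is only available after MEM, one stage later than an ALU result.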

Future Lecture
• Memory Technologies

