Pipelining Analogy
• Pipelined laundry: overlapping execution
  – Parallelism improves performance
• Four loads:
  – Speedup = 8/3.5 = 2.3
• Non-stop:
  – Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
FIGURE 4.24 The laundry analogy for pipelining. Ann, Brian, Cathy,
and Don each have dirty clothes to be washed, dried, folded, and
put away. The washer, dryer, “folder,” and “storer” each take 30
minutes for their task.
LEGv8 Pipeline
Five stages, one step per stage
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register
Example: Pipeline Performance
• Assume time for stages is
  – 100ps for register read or write
  – 200ps for other stages
• Compare pipelined datapath with single-cycle datapath
Instr      Instr fetch  Register read  ALU op  Memory access  Register write  Total time
LDUR       200ps        100ps          200ps   200ps          100ps           800ps
STUR       200ps        100ps          200ps   200ps                          700ps
R-format   200ps        100ps          200ps                  100ps           600ps
CBZ        200ps        100ps          200ps                                  500ps
Figure 4.25
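The totals in Figure 4.25 can be checked directly from the stage times. A minimal Python sketch (the dictionary layout and stage lists are illustrative, not from the text):

# Sketch: recompute the per-instruction totals from the stage times above.
# Assumes 200 ps for fetch/ALU/memory and 100 ps for register read/write.
STAGE_TIME_PS = {
    "IF": 200,   # instruction fetch
    "ID": 100,   # register read
    "EX": 200,   # ALU operation
    "MEM": 200,  # memory access
    "WB": 100,   # register write
}

STAGES_USED = {
    "LDUR":     ["IF", "ID", "EX", "MEM", "WB"],
    "STUR":     ["IF", "ID", "EX", "MEM"],
    "R-format": ["IF", "ID", "EX", "WB"],
    "CBZ":      ["IF", "ID", "EX"],
}

for instr, stages in STAGES_USED.items():
    total = sum(STAGE_TIME_PS[s] for s in stages)
    print(f"{instr:8s} {total} ps")   # LDUR 800, STUR 700, R-format 600, CBZ 500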
Pipeline Performance
• Single-cycle (Tc = 800ps)
• Pipelined (Tc = 200ps)
Figure 4.26
Pipeline Speedup
• If all stages are balanced
  – i.e., all take the same time
  – Time between instructions (pipelined) = Time between instructions (non-pipelined) / Number of stages
• If not balanced, speedup is less
• Speedup due to increased throughput
  – Latency (time for each instruction) does not decrease
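As a rough check on the throughput claim, the sketch below compares total time for n instructions on the single-cycle (800ps cycle) and pipelined (200ps cycle) datapaths, assuming a balanced 5-stage pipeline with no hazards; the function names are illustrative.

# Sketch: throughput-based speedup for the two datapaths above.
def single_cycle_time(n, tc=800):
    return n * tc                      # every instruction takes one 800 ps cycle

def pipelined_time(n, tc=200, stages=5):
    return (n + stages - 1) * tc       # fill the pipeline, then one instr/cycle

n = 1_000_000
print(single_cycle_time(n) / pipelined_time(n))   # approaches 800/200 = 4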
Pipelining and ISA Design
• LEGv8 ISA designed for pipelining
  – All instructions are 32 bits
    • Easier to fetch and decode in one cycle
    • c.f. x86: 1- to 17-byte instructions
  – Few and regular instruction formats
    • Can decode and read registers in one step
  – Load/store addressing
    • Can calculate address in 3rd stage, access memory in 4th stage
  – Alignment of memory operands
    • Memory access takes only one cycle
Hazards
Situations that prevent starting the next
instruction in the next cycle
• Structure hazards
– A required resource is busy
• Data hazard
– Need to wait for previous instruction to complete its
data read/write
• Control hazard
– Deciding on control action depends on previous
instruction
Structure Hazards
• Conflict for use of a resource
• In LEGv8 pipeline with a single memory
  – Load/store requires data access
  – Instruction fetch would have to stall for that cycle
    • Would cause a pipeline “bubble”
• Hence, pipelined datapaths require separate instruction/data memories
  – Or separate instruction/data caches
Data Hazards
• An instruction depends on completion of data access by a previous instruction
  – ADD X19, X0, X1
    SUB X2, X19, X3
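A dependence like the one above can be spotted by comparing register fields of adjacent instructions. A minimal sketch, using a simplified (illustrative) representation of each instruction, not the actual LEGv8 encodings:

# Sketch: detect a read-after-write (RAW) dependence between two adjacent
# instructions, each represented as (destination register, source registers).
add = ("X19", ("X0", "X1"))    # ADD X19, X0, X1
sub = ("X2",  ("X19", "X3"))   # SUB X2, X19, X3

def raw_hazard(older, newer):
    dest, _ = older
    _, sources = newer
    return dest in sources       # newer reads a register the older one writes

print(raw_hazard(add, sub))      # True: SUB needs X19 before ADD writes it back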
Forwarding (aka Bypassing)
• Use result when it is computed
  – Don’t wait for it to be stored in a register
  – Requires extra connections in the datapath
FIGURE 4.28 Graphical representation of forwarding.
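The extra connections amount to comparing the destination register held in a later pipeline register against the register a following instruction is about to use in EX. A rough sketch of that decision (field names are illustrative, not the exact control signals developed later in the text):

# Sketch: should the ALU take a forwarded operand instead of the value read
# from the register file?
def forward_from_ex_mem(ex_mem_regwrite, ex_mem_rd, id_ex_rn):
    # Forward the ALU result computed last cycle if the previous instruction
    # writes the register this instruction is about to use.
    return ex_mem_regwrite and ex_mem_rd == id_ex_rn

# ADD X19, X0, X1 followed by SUB X2, X19, X3:
print(forward_from_ex_mem(True, "X19", "X19"))   # True: bypass the register file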
Load-Use Data Hazard
• Can’t always avoid stalls by forwarding
  – If value not computed when needed
  – Can’t forward backward in time!
FIGURE 4.29 We need a stall even with forwarding when an R-format
instruction following a load tries to use the data
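The check for this case is simple: if the instruction in EX is a load and its destination matches a source of the instruction in ID, the pipeline must stall for one cycle. A minimal sketch (field names are illustrative):

# Sketch: load-use hazard detection, roughly the test a hazard detection
# unit performs during ID.
def must_stall(id_ex_memread, id_ex_rd, if_id_sources):
    # Stall when the instruction in EX is a load whose result a following
    # instruction wants before the data returns from memory.
    return id_ex_memread and id_ex_rd in if_id_sources

# LDUR X1, [X0,#0] followed by ADD X3, X1, X2:
print(must_stall(True, "X1", ("X1", "X2")))   # True: insert one bubble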
Code Scheduling to Avoid Stalls
• Reorder code to avoid use of load result in the next instruction
• C code for A = B + E; C = B + F;

            LDUR X1, [X0,#0]          LDUR X1, [X0,#0]
            LDUR X2, [X0,#8]          LDUR X2, [X0,#8]
    stall   ADD  X3, X1, X2           LDUR X4, [X0,#16]
            STUR X3, [X0,#24]         ADD  X3, X1, X2
            LDUR X4, [X0,#16]         STUR X3, [X0,#24]
    stall   ADD  X5, X1, X4           ADD  X5, X1, X4
            STUR X5, [X0,#32]         STUR X5, [X0,#32]
            13 cycles                 11 cycles
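One way to account for the cycle counts above: 7 instructions need 7 + 4 cycles on an ideal 5-stage pipeline, plus one bubble per load-use pair. A small sketch of that arithmetic (the function name is illustrative):

# Sketch: cycle count for a 5-stage pipeline, assuming one bubble per
# load-use dependence and no other hazards.
def cycles(n_instructions, load_use_stalls, stages=5):
    return n_instructions + (stages - 1) + load_use_stalls

print(cycles(7, 2))   # original order: 13 cycles
print(cycles(7, 0))   # reordered:      11 cycles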
Control Hazards
• Branch determines flow of control
  – Fetching next instruction depends on branch outcome
  – Pipeline can’t always fetch correct instruction
    • Still working on ID stage of branch
• In LEGv8 pipeline
  – Need to compare registers and compute target early in the pipeline
  – Add hardware to do it in ID stage
Stall on Branch
• Wait until branch outcome determined before fetching next instruction
FIGURE 4.30 Pipeline showing stalling on every conditional branch as solution to control
hazards. This example assumes the conditional branch is taken, and the instruction at the
destination of the branch is the ORR instruction. There is a one-stage pipeline stall, or bubble,
after the branch. In reality, the process of creating a stall is slightly more complicated. The effect
on performance, however, is the same as would occur if a bubble were inserted
Branch Prediction
• Longer pipelines can’t readily
determine branch outcome early
– Stall penalty becomes unacceptable
• Predict outcome of branch
– Only stall if prediction is wrong
• In LEGv8 pipeline
– Can predict branches not taken
– Fetch instruction after branch, with no delay
More-Realistic Branch Prediction
• Static branch prediction
  – Based on typical branch behavior
  – Example: loop and if-statement branches
    • Predict backward branches taken
    • Predict forward branches not taken
• Dynamic branch prediction
  – Hardware measures actual branch behavior
    • e.g., record recent history of each branch
  – Assume future behavior will continue the trend
    • When wrong, stall while re-fetching, and update history
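One minimal form such recorded history can take is a single bit per branch address remembering the last outcome. The sketch below is only an illustration of the idea; the predictors discussed later in the text keep richer history (e.g., 2 bits).

# Sketch: a 1-bit dynamic predictor that records the most recent outcome of
# each branch.
history = {}   # branch PC -> last outcome (True = taken)

def predict(pc):
    return history.get(pc, False)      # predict not taken the first time

def update(pc, taken):
    history[pc] = taken                # remember what the branch actually did

# A loop branch taken 9 times and then falling through mispredicts only twice:
mispredicts = 0
for outcome in [True] * 9 + [False]:
    if predict(0x400) != outcome:
        mispredicts += 1
    update(0x400, outcome)
print(mispredicts)    # 2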
Pipeline Summary
The BIG Picture
• Pipelining improves performance by increasing instruction throughput
  – Executes multiple instructions in parallel
  – Each instruction has the same latency
• Subject to hazards
  – Structure, data, control
• Instruction set design affects complexity of pipeline implementation
LEGv8 Pipelined Datapath
Right-to-left flow (WB back to the register file, MEM branch target back to the PC) leads to hazards
FIGURE 4.32 The single-cycle datapath from Section 4.4 (similar to Figure 4.17)
Pipeline registers
• Need registers between stages
  – To hold information produced in previous cycle
FIGURE 4.34 The pipelined version of the datapath in Figure 4.32.
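One way to picture the four pipeline registers of Figure 4.34 is as named records rewritten on every clock edge. A rough sketch; the field names and selection of fields are illustrative, not the exact contents or widths in the figure.

# Sketch: the four pipeline registers of a 5-stage pipeline, modeled as
# records that are overwritten each clock cycle.
from dataclasses import dataclass

@dataclass
class IF_ID:                 # between fetch and decode
    pc: int = 0
    instruction: int = 0

@dataclass
class ID_EX:                 # between decode and execute
    read_data1: int = 0
    read_data2: int = 0
    sign_ext_imm: int = 0

@dataclass
class EX_MEM:                # between execute and memory access
    alu_result: int = 0
    write_data: int = 0
    write_reg: int = 0       # destination register number, carried along (cf. Figure 4.40)

@dataclass
class MEM_WB:                # between memory access and write-back
    mem_data: int = 0
    alu_result: int = 0
    write_reg: int = 0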
LOAD
FIGURE 4.35 IF and ID: First and second pipe stages of an instruction, with the active portions of the datapath in Figure 4.34 highlighted.
FIGURE 4.36 EX: The third pipe stage of a load instruction, highlighting the
portions of the datapath in Figure 4.34 used in this pipe stage. The register is
added to the sign-extended immediate, and the sum is placed in the EX/MEM
pipeline register.
FIGURE 4.37 MEM and WB: The fourth and fifth pipe stages of a load instruction, highlighting the portions of the datapath in Figure 4.34 used in this pipe stage.
Corrected Datapath for Load
Store
FIGURE 4.38 EX: The third pipe stage of a store instruction. Unlike the third stage
of the load instruction in Figure 4.36, the second register value is loaded into the
EX/MEM pipeline register to be used in the next stage. Although it wouldn’t hurt to
always write this second register into the EX/MEM pipeline register, we write the
second register only on a store instruction to make the pipeline easier to
understand.
FIGURE 4.39 MEM and WB: The fourth and fifth pipe stages of a store instruction. In the fourth stage, the data are written into data memory for the store. Note that the data come from the EX/MEM pipeline register and that nothing is changed in the MEM/WB pipeline register. Once the data are written in memory, there is nothing left for the store instruction to do, so nothing happens in stage 5.
FIGURE 4.40 The corrected pipelined datapath to handle the load instruction
properly. The write register number now comes from the MEM/WB pipeline
register along with the data. The register number is passed from the ID pipe
stage until it reaches the MEM/WB pipeline register, adding five more bits to
the last three pipeline registers. This new path is shown in color.
FIGURE 4.41 The portion of the datapath in Figure 4.40 that is used in all five
stages of a load instruction.
Graphically Representing Pipelines
• Multiple-clock-cycle pipeline diagrams
• Single-clock-cycle pipeline diagrams
Consider the following five-instruction sequence:
  LDUR X10, [X1,#40]
  SUB  X11, X2, X3
  ADD  X12, X3, X4
  LDUR X13, [X1,#48]
  ADD  X14, X5, X6
Figure 4.42 shows the multiple-clock-cycle
pipeline diagram for these instructions.
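A diagram like Figure 4.42 can be produced mechanically: with no hazards, instruction i simply enters the pipeline one cycle after instruction i-1. A small sketch that prints such a diagram for the sequence above (formatting choices are illustrative):

# Sketch: print a multiple-clock-cycle pipeline diagram, assuming an ideal
# 5-stage pipeline with no stalls.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
instrs = ["LDUR X10", "SUB X11", "ADD X12", "LDUR X13", "ADD X14"]

for i, name in enumerate(instrs):
    # Instruction i starts IF in cycle i+1, so pad its row with i empty slots.
    row = ["    "] * i + [f"{s:4s}" for s in STAGES]
    print(f"{name:10s} " + " ".join(row))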
Multi-Cycle Pipeline Diagram
• Form showing resource usage
FIGURE 4.42 Multiple-clock-cycle pipeline diagram of five instructions
Multi-Cycle Pipeline Diagram
• Traditional form
FIGURE 4.43 Traditional multiple-clock-cycle pipeline diagram of five instructions in
Figure 4.42.
Single-Cycle Pipeline Diagram
• State of pipeline in a given cycle
FIGURE 4.44 The single-clock-cycle diagram corresponding to clock cycle 5 of the
pipeline in Figures 4.42 and 4.43.
Pipelined Control (Simplified)
FIGURE 4.45 The pipelined datapath of Figure 4.40 with the control signals identified.
Pipelined Control
• Control signals derived from instruction
  – As in single-cycle implementation
FIGURE 4.49 The eight control lines for the final three stages.
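Because control is generated in ID but consumed later, the signals are grouped by the stage that uses them and carried forward in the pipeline registers. A hedged sketch of that bookkeeping; the grouping below is an assumption about how Figure 4.49 organizes the signals, with names taken from the single-cycle design.

# Sketch: control signals produced in ID, grouped by the stage that uses them
# and passed along through the pipeline registers.
ex_controls  = ("ALUOp", "ALUSrc")                  # used in EX (ALUOp is 2 bits)
mem_controls = ("Branch", "MemRead", "MemWrite")    # used in MEM
wb_controls  = ("MemtoReg", "RegWrite")             # used in WB

# Each cycle, groups not yet used simply ride forward one register:
id_ex  = ex_controls + mem_controls + wb_controls   # ID/EX carries all of them
ex_mem = mem_controls + wb_controls                 # EX/MEM carries MEM + WB
mem_wb = wb_controls                                # MEM/WB carries WB only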
Pipelined Control
FIGURE 4.50 The pipelined datapath of Figure 4.45, with the control signals connected to the control portions of the pipeline registers.