PIPELINING: HAZARDS
Mahdi Nazm Bojnordi
Assistant Professor
School of Computing
University of Utah
CS/ECE 6810: Computer Architecture
Overview
¨ Announcement
¤ Homework 1 submission deadline: Jan. 30th
¨ This lecture
¤ Impacts of pipelining on performance
¤ The MIPS five-stage pipeline
¤ Pipeline hazards
n Structural hazards
n Data hazards
Pipelining Technique
¨ Improving throughput at the expense of latency
¤ Delay: D = T + nδ
¤ Throughput: IPS = n/(T + nδ)
¨ Example: combinational logic with a critical path delay of 30, split into n stages

Stages (n)   Delay per stage   D    IPS
1            30                31   1/31
2            15                32   2/32
3            10                33   3/33
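The numbers on this slide can be reproduced from the two formulas. The sketch below assumes T = 30 and a per-stage overhead of δ = 1; the value of δ is not stated on the slide and is inferred from D = 31, 32, 33. The function names are hypothetical.

```python
from fractions import Fraction

def pipeline_delay(T, n, delta):
    """Latency of one instruction through n stages: D = T + n*delta."""
    return T + n * delta

def pipeline_throughput(T, n, delta):
    """Instructions completed per unit time: IPS = n / (T + n*delta)."""
    return Fraction(n, T + n * delta)

T, delta = 30, 1  # delta = 1 is an assumption inferred from the slide
for n in (1, 2, 3):
    print(n, pipeline_delay(T, n, delta), pipeline_throughput(T, n, delta))
```

Note that `Fraction` prints reduced values, so 2/32 appears as 1/16 and 3/33 as 1/11.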
Pipelining Latency vs. Throughput
¨ Theoretical delay and throughput models for perfect pipelining
[Figure: relative performance (0 to 20) of Delay (D) and Throughput (IPS) versus number of pipeline stages (0 to 200)]
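The shape of the two curves follows from the delay and throughput model D = T + nδ, IPS = n/(T + nδ). As a worked step, assuming T and δ stay fixed while the number of stages n grows:

```latex
\begin{align*}
D(n) &= T + n\delta \\
\mathrm{IPS}(n) &= \frac{n}{T + n\delta} = \frac{1}{T/n + \delta} \\
\lim_{n \to \infty} \mathrm{IPS}(n) &= \frac{1}{\delta}
\end{align*}
```

Under this model, delay grows linearly with the number of stages, while throughput saturates at 1/δ.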
Five Stage MIPS Pipeline
Simple Five Stage Pipeline
¨ A pipelined load-store architecture that processes
up to one instruction per cycle
[Figure: five-stage datapath. Inst. Fetch (PC, Inst. Memory), Inst. Decode (Register File), Execute (ALU), Memory (Data Memory), Write Back]
Instruction Fetch
¨ Read an instruction from memory (I-Cache)
¤ Use the program counter (PC) to index into the I-Memory
¤ Compute NPC by incrementing current PC
n What about branches?
¨ Update pipeline registers
¤ Write the instruction into the pipeline registers
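The fetch steps above can be sketched as a small function. This is only an illustration: the instruction memory is modeled as a Python list of 32-bit words, the pipeline register as a dict, and the optional `branch_target` argument is an assumed way of answering "What about branches?".

```python
def instruction_fetch(pc, imem, branch_target=None):
    """Return (IF/ID pipeline register contents, next PC value)."""
    instruction = imem[pc // 4]  # PC is a byte address; imem holds 4-byte words
    npc = pc + 4                 # next sequential instruction
    if_id = {"instruction": instruction, "npc": npc}
    # Assumption: a taken branch overrides the sequential NPC.
    next_pc = branch_target if branch_target is not None else npc
    return if_id, next_pc

imem = [0x21280005, 0x00000000]  # hypothetical two-word program
if_id, pc = instruction_fetch(0, imem)
```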
Instruction Fetch
[Figure: fetch datapath. Components: clocked PC, +4 adder, Instruction Memory, Branch Target input, Pipeline Register. Three paths are labeled P1, P2, and P3.]
¨ NPC = PC + 4
¤ Why increment by 4?
¨ Critical Path = Max{P1, P2, P3}
Instruction Decode
¨ Generate control signals from the opcode bits
¨ Read source operands from the register file (RF)
¤ Use the specifiers for indexing RF
n How many read ports are required?
¨ Update pipeline registers
¤ Send the operand and immediate values to next stage
¤ Pass control signals and NPC to next stage
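As an illustration of the decode steps, the sketch below slices a 32-bit MIPS I-type instruction into its fields and reads two source registers. The dict layout of the pipeline register and the use of the raw opcode as a stand-in for the control signals are assumptions made for this example.

```python
def decode(instruction, npc, regfile):
    """Split an I-type MIPS word into fields and read the source operands."""
    opcode = (instruction >> 26) & 0x3F  # bits 31..26
    rs = (instruction >> 21) & 0x1F      # bits 25..21, first source specifier
    rt = (instruction >> 16) & 0x1F      # bits 20..16, second source or destination
    imm = instruction & 0xFFFF           # bits 15..0
    if imm & 0x8000:                     # sign-extend the 16-bit immediate
        imm -= 0x10000
    return {
        "ctrl": opcode,      # stand-in for the generated control signals
        "a": regfile[rs],    # read port 1
        "b": regfile[rt],    # read port 2
        "imm": imm,
        "rt": rt,
        "npc": npc,          # passed along for branch-target computation
    }

regfile = list(range(32))    # hypothetical contents: register i holds value i
id_ex = decode(0x21280005, 4, regfile)  # addi $t0, $t1, 5
```

Two register values are read in the same cycle here, which is one way to think about the read-port question on the slide.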
Instruction Decode
[Figure: decode datapath between two pipeline registers. Labels: Instruction, Register File, decode, NPC, target, reg, ctrl]
Execute Stage
¨ Perform ALU operation
¤ Compute the ALU result
n Operation type: control signals
n First operand: contents of a register
n Second operand: either a register or the immediate value
¤ Compute branch target
n Target = NPC + immediate
¨ Update pipeline registers
¤ Control signals, branch target, ALU results, and
destination
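A minimal sketch of the execute stage, following the slide's formula Target = NPC + immediate literally (real MIPS hardware first scales the branch immediate by 4). The field names, the `use_imm` flag, and the two-entry operation table are hypothetical.

```python
# Hypothetical ALU operations, selected by a control signal.
ALU_OPS = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
}

def execute(id_ex):
    """Compute the ALU result and the branch target for one instruction."""
    first = id_ex["a"]                                          # register contents
    second = id_ex["imm"] if id_ex["use_imm"] else id_ex["b"]   # register or immediate
    result = ALU_OPS[id_ex["alu_op"]](first, second)
    target = id_ex["npc"] + id_ex["imm"]                        # Target = NPC + immediate
    # Values written into the EX/MEM pipeline register.
    return {"result": result, "target": target,
            "dest": id_ex["dest"], "alu_op": id_ex["alu_op"]}

ex_mem = execute({"a": 7, "b": 3, "imm": 8, "use_imm": False,
                  "alu_op": "sub", "npc": 4, "dest": 6})
```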
Execute Stage
[Figure: execute datapath between two pipeline registers. Labels: NPC, Target, reg, ALU, Res, ctrl]
Memory Access
¨ Access data memory
¤ Load/store address: ALU outcome
¤ Control signals determine read or write access
¨ Update pipeline registers
¤ ALU results from execute
¤ Loaded data from D-Memory
¤ Destination register
Memory Access
[Figure: memory-access datapath between two pipeline registers. Labels: Target, Res, addr, data, Memory, Dat, reg, ctrl]
Register Write Back
¨ Update register file
¤ Control signals determine whether a register write is needed
¤ Only one write port is required
n Write the ALU result to the destination register, or
n Write the loaded data into the register file
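The last two stages can be sketched together. Data memory is modeled as a dict from address to value, and the control-signal names (`mem_read`, `mem_write`, `reg_write`, `mem_to_reg`) are assumptions chosen for this example, not names taken from the slides.

```python
def memory_access(ex_mem, dmem):
    """Read or write data memory at the address computed by the ALU."""
    loaded = None
    if ex_mem["mem_read"]:
        loaded = dmem[ex_mem["result"]]
    elif ex_mem["mem_write"]:
        dmem[ex_mem["result"]] = ex_mem["store_data"]
    # Values written into the MEM/WB pipeline register.
    return {"result": ex_mem["result"], "loaded": loaded,
            "dest": ex_mem["dest"], "reg_write": ex_mem["reg_write"],
            "mem_to_reg": ex_mem["mem_read"]}

def write_back(mem_wb, regfile):
    """Write either the loaded data or the ALU result through one write port."""
    if mem_wb["reg_write"]:
        value = mem_wb["loaded"] if mem_wb["mem_to_reg"] else mem_wb["result"]
        regfile[mem_wb["dest"]] = value

dmem = {100: 42}
regs = [0] * 32
load = {"result": 100, "mem_read": True, "mem_write": False,
        "store_data": None, "dest": 1, "reg_write": True}
write_back(memory_access(load, dmem), regs)  # regs[1] now holds the loaded value
```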
Five Stage Pipeline
¨ Ideal pipeline: IPC=1
¤ Are there enough resources to keep the pipeline stages busy all the time?
[Figure: five-stage datapath. Inst. Fetch (PC, +4 adder, Mem), Decode (Reg. File), Execute (ALU), Memory (Mem), Writeback (Reg. File)]
Pipeline Hazards
¨ Structural hazards: multiple instructions compete for
the same resource
¨ Data hazards: a dependent instruction cannot
proceed because it needs a value that hasn’t been
produced
¨ Control hazards: the next instruction cannot be
fetched because the outcome of an earlier branch is
unknown
Structural Hazards
¨ 1. Unified memory for instruction and data
¤ Solution: separate inst. and data memories
¨ 2. Register file with shared read/write access ports
¤ Solution: register access in half cycles
R1 ← Mem[R2]
R3 ← Mem[R20]
R6 ← R4 - R5
R7 ← R1 + R0
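To see where the unified-memory conflict arises in the four-instruction example, the sketch below lists the cycles in which one instruction is in its memory stage while another is being fetched. It assumes no stalls, one instruction entering the pipeline per cycle, and a single-ported unified memory; the function name is hypothetical.

```python
def memory_conflicts(uses_data_memory):
    """uses_data_memory[i] is True if instruction i is a load or store.

    Instruction i (0-based) is fetched in cycle i + 1 and reaches the
    memory stage in cycle i + 4. Returns (cycle, memory-stage instruction,
    fetched instruction) for every cycle needing the memory twice.
    """
    conflicts = []
    count = len(uses_data_memory)
    for i, is_mem in enumerate(uses_data_memory):
        if not is_mem:
            continue
        mem_cycle = i + 4          # fourth stage of instruction i
        fetch_idx = mem_cycle - 1  # instruction fetched in that same cycle
        if fetch_idx < count:
            conflicts.append((mem_cycle, i, fetch_idx))
    return conflicts

# The slide's example: two loads followed by two ALU instructions.
program = [True, True, False, False]
print(memory_conflicts(program))
```

With separate instruction and data memories the two accesses go to different structures, so the list of conflicts is empty by construction.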
Data Hazards
¨ True dependence: read-after-write (RAW)
¤ Consumer has to wait for producer
Loading data from memory: loaded data will be available two cycles later.
R1 ← Mem[R2]
R3 ← R1 + R0
R4 ← R1 - R3
Data Hazards
¨ True dependence: read-after-write (RAW)
¤ Consumer has to wait for producer
Inserting two bubbles.
R1 ← Mem[R2]
Nothing
Nothing
R3 ← R1 + R0
R4 ← R1 - R3
Data Hazards
¨ True dependence: read-after-write (RAW)
¤ Consumer has to wait for producer
Inserting single bubble + RF bypassing.
R1 ← Mem[R2]
Nothing
R3 ← R1 + R0
R4 ← R1 - R3
Load delay slot.
SW vs. HW management?
Data Hazards
¨ True dependence: read-after-write (RAW)
¤ Consumer has to wait for producer
Using the result of an ALU instruction.
R1 ← R2 + R3
R5 ← R1 + R0
R3 ← R1 + R0
R4 ← R1 - R3
Forwarding ALU result.
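The bubble counts on the RAW slides can be reproduced with a small timing model. This is a sketch under stated assumptions: without forwarding, the register file is written in the first half of the WB cycle and read in the second half of the ID cycle; with forwarding, a value is passed to the consumer's EX stage in the cycle after it is produced (end of EX for an ALU instruction, end of MEM for a load). The function and parameter names are hypothetical.

```python
def bubbles_needed(producer_is_load, distance, forwarding):
    """Bubbles required for a RAW dependence in the five-stage pipeline.

    distance = 1 means the consumer immediately follows the producer.
    Stages are numbered IF=1, ID=2, EX=3, MEM=4, WB=5.
    """
    if forwarding:
        ready_stage = 4 if producer_is_load else 3  # stage that produces the value
        need_stage = 3                              # consumer uses it in EX
        bubbles = ready_stage - need_stage + 1 - distance
    else:
        # Consumer's ID (stage 2) must not come before producer's WB (stage 5).
        bubbles = 5 - 2 - distance
    return max(0, bubbles)

print(bubbles_needed(True, 1, forwarding=False))   # load-use, no forwarding
print(bubbles_needed(True, 1, forwarding=True))    # load-use with forwarding
print(bubbles_needed(False, 1, forwarding=True))   # ALU result with forwarding
```

Under these assumptions the three calls correspond to the "two bubbles", "single bubble", and "forwarding ALU result" cases.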
Data Hazards
¨ True dependence: read-after-write (RAW)
¨ Anti dependence: write-after-read (WAR)
¤ Write must wait for earlier read
R1 ← R2 + R1
R2 ← R8 + R9
No WAR hazards in 5-stage pipeline!
Data Hazards
¨ True dependence: read-after-write (RAW)
¨ Anti dependence: write-after-read (WAR)
¨ Output dependence: write-after-write (WAW)
¤ An older write must not overwrite a younger write
R1 ← R2 + R3
R1 ← R8 + R9
No WAW hazards in 5-stage pipeline!
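One way to see why WAR and WAW dependences never become hazards here: every instruction reads registers in ID and writes them in WB, and instructions move through the stages in program order. The sketch below checks this for a stall-free in-order pipeline; the stage numbering and function names are assumptions made for the example.

```python
ID_STAGE, WB_STAGE = 2, 5  # registers are read in ID and written in WB

def stage_cycle(issue_cycle, stage):
    """Cycle in which an instruction fetched in issue_cycle occupies a stage."""
    return issue_cycle + stage - 1

def war_hazard(older_issue, younger_issue):
    """True if the younger write would happen no later than the older read."""
    return stage_cycle(younger_issue, WB_STAGE) <= stage_cycle(older_issue, ID_STAGE)

def waw_hazard(older_issue, younger_issue):
    """True if the younger write would happen no later than the older write."""
    return stage_cycle(younger_issue, WB_STAGE) <= stage_cycle(older_issue, WB_STAGE)

# A younger instruction is fetched at least one cycle after an older one.
print(any(war_hazard(1, 1 + gap) for gap in range(1, 10)))
print(any(waw_hazard(1, 1 + gap) for gap in range(1, 10)))
```

Stalls delay the younger instruction further, so they do not change the ordering checked here.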
Data Hazards
¨ Forwarding with additional hardware
Data Hazards
¨ How to detect and resolve data hazards
¤ Show all of the data hazards in the code below
R1 ← Mem[R2]
R2 ← R1 + R0
R1 ← R1 - R2
Mem[R3] ← R2
[Figure: dependence arrows between the instructions, labeled RAW, WAR, and WAW]
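The dependences in this exercise can be enumerated mechanically. The sketch below is a simple pairwise check over (destination, sources) tuples; it does not filter out dependences hidden by an intervening write to the same register, which does not matter for this four-instruction example. The encoding of the program and the function name are assumptions made for illustration.

```python
def find_dependences(program):
    """program: list of (dest, sources); dest is None for a store.

    Returns (kind, older, younger, register) tuples with 1-based
    instruction numbers.
    """
    found = []
    for j, (dest_j, srcs_j) in enumerate(program):
        for i in range(j):
            dest_i, srcs_i = program[i]
            if dest_i is not None and dest_i in srcs_j:
                found.append(("RAW", i + 1, j + 1, dest_i))  # younger reads older's result
            if dest_j is not None and dest_j in srcs_i:
                found.append(("WAR", i + 1, j + 1, dest_j))  # younger overwrites older's source
            if dest_j is not None and dest_j == dest_i:
                found.append(("WAW", i + 1, j + 1, dest_j))  # both write the same register
    return found

program = [
    ("R1", ["R2"]),        # R1 <- Mem[R2]
    ("R2", ["R1", "R0"]),  # R2 <- R1 + R0
    ("R1", ["R1", "R2"]),  # R1 <- R1 - R2
    (None, ["R3", "R2"]),  # Mem[R3] <- R2
]
for dep in find_dependences(program):
    print(dep)
```

Only the RAW entries require stalls or forwarding in the five-stage pipeline; the WAR and WAW entries are dependences that do not become hazards there.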