Instruction Pipeline
COMPUTER ORGANIZATION
Copyright © 2014-2021 Testbook Edu Solutions Pvt. Ltd.: All rights reserved
Instruction Pipeline
Pipelining
Mechanism for overlapping the execution of many input sets by dividing one computation stage into several (say k) computation sub-stages.
Cost of implementation increases slightly.
Speedup increases.
Working of Pipeline
S1 must happen before S2 and S3, and S2 must happen before S3 (sequential execution).
Time →   T/3      T/3      T/3      T/3      T/3
S1       Item 1   Item 2   Item 3
S2                Item 1   Item 2   Item 3
S3                         Item 1   Item 2   Item 3
Note: When Item 1 is in stage S2, stage S1 is free, so S1 can be used for Item 2 at that time; Item 1 thus executes in stage S2 in parallel with Item 2 in stage S1.
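As a quick check of the diagram (a sketch assuming each sub-stage takes exactly T/3 and latch delays are ignored):
Sequential execution of 3 items = 3 × T = 3T
Pipelined execution of 3 items = (3 + 3 − 1) × T/3 = 5T/3
Speedup = 3T ÷ (5T/3) = 1.8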
In the processor pipeline we need a latch between successive stages to hold the intermediate results
temporarily.
Pipelined Processors
a. Degree of Overlap:
Serial: Next operation starts only after the previous operation gets completed.
Overlapped: Some overlap between consecutive stages.
Pipelined: Complete overlap between successive stages.
b. Depth of Pipeline:
Performance of the pipeline depends on the number of stages and how they are utilized without conflict.
Shallow pipeline: fewer stages; each stage is more complex.
Deep pipeline: larger number of stages; each stage is simpler.
c. Scheduling alternatives:
Static Pipeline:
i. Same sequence of pipeline stages is executed for all data/instructions.
ii. If one instruction stalls, all subsequent ones also get delayed.
Dynamic Pipeline:
i. Can be reconfigured to perform variable functions at different times.
ii. Feed-forward and feedback connections between stages.
Speedup and Efficiency
τ: Clock period of the pipeline
ti: Time delay of circuit in stage Si
dL: Delay of a latch
Maximum stage delay, τm = max{ti}
τ = τm + dL
Pipeline frequency, f = 1/τ
Speedup for a k-stage pipeline with n inputs:
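A standard form of this result, assuming the non-pipelined processor takes k·τ per input and the pipeline takes (k + n − 1) clock cycles for n inputs:
S(k) = (n · k · τ) / ((k + n − 1) · τ) = n · k / (k + n − 1)
Efficiency = S(k) / k = n / (k + n − 1)
As n becomes very large, S(k) → k and efficiency → 1.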
Latency
The number of time units between the initiations of two inputs into a pipeline is called the latency between them.
When two or more inputs attempt to use the same pipeline stage at the same time, a collision occurs.
Latencies whose use causes collisions are called forbidden latencies.
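An illustrative example (the reservation table below is assumed, not taken from these notes): suppose stage S1 is used in cycles 1 and 5, S2 in cycles 2 and 4, and S3 in cycle 3.
        t1    t2    t3    t4    t5
S1      X                       X
S2            X           X
S3                  X
The distance between marks in the same row gives a forbidden latency: row S1 gives 4 and row S2 gives 2, so initiating a new input 2 or 4 cycles after the previous one causes a collision. Forbidden latencies = {2, 4}.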
Pipelining MIPS32 Data Path
Assumptions:
Each of the 5 steps, IF, ID, EX, MEM and WB, is treated as a pipeline stage.
Each stage must finish its execution within one clock cycle.
Since many instructions will be overlapped, we must ensure that there is no conflict.
These assumptions can be satisfied fairly easily.
Let each stage take 'T' time units.
Non-pipelined: time to execute n instructions = 5 × T × n
In pipelined execution (Δ: latch delay):
Time to execute n instructions = 5(T + Δ) + (n − 1)(T + Δ)
= (4 + n)(T + Δ)
≈ (4 + n)T, if T >> Δ
Speedup = 5Tn / ((4 + n)T) ≈ 5, if n is very large.
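A quick numeric sketch (the values n = 100 and T = 2 ns are chosen only for illustration, with Δ negligible):
Non-pipelined time = 5 × 2 ns × 100 = 1000 ns
Pipelined time = (4 + 100) × 2 ns = 208 ns
Speedup = 1000 / 208 ≈ 4.8, approaching the ideal value of 5.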
Conflict Stages
IF and MEM: Both these stages access memory. So, they should not be in the same cycle.
SOLUTION: Using separate instruction and data cache. (i-cache and d-cache)
ID and WB: Both these stages access the register bank. So, they should not be used in the same clock cycle.
SOLUTION: Allow both read and write access to registers in the same clock cycle.
Simultaneous read and write of the same register may result in clashes.
SOLUTION: Write in the 1st half of the cycle and read in the 2nd half of the cycle.
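A minimal sketch of why the split cycle matters (the instruction sequence is assumed for illustration): the instruction in WB and the instruction in ID overlap in the same cycle, yet the reader still sees the newly written value.
I1: ADD R1, R2, R3    ← writes R1 in WB during cycle 5 (first half)
I2: OR R5, R6, R7
I3: AND R8, R9, R10
I4: SUB R4, R1, R6    ← reads R1 in ID during cycle 5 (second half), so it gets the updated value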
Points to Remember
1. Since in a pipelined processor we have to fetch an instruction every clock cycle, we need to increment the program counter in the fetch stage itself; otherwise, the next instruction will not be fetched.
2. In a non-pipelined processor there is no need to fetch an instruction every clock cycle. So, we increment
the program counter in the MEM stage.
Basic Performance Issue in Pipeline
Pipeline registers (latches) are inserted between the pipeline stages, which increases the overall execution time of a single instruction.
Pipeline Hazards
An instruction pipeline should complete the execution of an instruction every clock cycle.
Hazards are situations that prevent this from happening (for some instructions).
Hazards
1. Structural Hazards (Resource conflicts)
2. Data Hazards (Data Dependencies)
3. Control Hazards (branches and other changes in the program counter)
Solution for Hazards
Using special hardware and control circuits.
Inserting stall cycles in pipeline
When one instruction is stalled, all others that follow that instruction will also get stalled.
No new instruction can be fetched during the duration of stall.
Hazards result in performance degradation.
Structural Hazards
Due to resource conflicts.
When hardware cannot support overlapped execution.
Example: a single memory (cache) used to store both instructions and data.
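A small sketch of this conflict (cycle numbering assumed): with a single memory port, the MEM access of a load in cycle 4 collides with the instruction fetch scheduled for the same cycle.
Cycle:        1     2     3     4     5
LW R1, a      IF    ID    EX    MEM   WB
I2                  IF    ID    EX    MEM
I3                        IF    ID    EX
I4                              IF    ID     ← IF of I4 and MEM of the load both need memory in cycle 4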
Eliminating Structural Hazards
To reduce the cost of implementation.
Pipelining all the functional units may be too costly.
If structural hazards are not frequent but them happen.
Make use of operating I & D cache.
Data Hazards
Data hazards occur due to data dependencies between instructions.
I1: ADD R2, R5, R8
I2: SUB R2, R2, R6
Basic solution: insert stall cycles → 3 clock cycles will be wasted.
To reduce the number of clock cycles wasted:
a) Data forwarding/bypassing: As soon as the data is computed, it is forwarded using some additional hardware consisting of multiplexers, without waiting for the data to be written back (see the sketch after this list).
b) Concurrent Register Access: By splitting a clock cycle into two halves.
First half: Register read
Second half: Register write.
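A sketch of forwarding for the I1/I2 pair shown above (cycle numbering assumed): the result of I1 is ready at the end of its EX stage and is fed straight into the EX stage of I2 in the next cycle, so no stall is needed.
Cycle:                  1     2     3     4     5     6
I1: ADD R2, R5, R8      IF    ID    EX    MEM   WB
I2: SUB R2, R2, R6            IF    ID    EX    MEM   WB
R2 is forwarded from the EX output of I1 (end of cycle 3) to the EX input of I2 (cycle 4).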
Bypassing
The result computed by the previous instruction is already stored in some register within the data path. The value is taken directly from that register and forwarded to the instruction that requires it.
Register Read/Write
Used to reduce the number of values that have to be forwarded.
We can avoid the conflict that occurs when WB and ID fall in the same cycle by using the Register Read/Write scheme:
In the first half of the cycle: register write (in WB).
In the second half of the cycle: register read (in ID).
Data Hazard while Accessing Memory
Memory references are always in order, and so data hazards between memory references never occur.
Cache miss can result in pipeline stalls.
Load instruction followed by the use of the loaded data:
How to solve this problem?
Cannot be eliminated using forwarding
Pipeline Interlock: Hardware detects the hazard and stalls the pipeline until the hazard is cleared.
One stall cycle is needed.
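A minimal MIPS32 sketch (the register numbers and the label a are illustrative): even with forwarding, the loaded value is available only at the end of MEM, one cycle too late for the EX stage of the very next instruction, so the interlock inserts one stall.
LW R1, a          ← value of R1 available only at the end of MEM
ADD R3, R1, R2    ← needs R1 at the start of its EX stage; stalled for 1 cycle, then the value is forwarded from MEM to EX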
Instruction Issue
For a typical ALU instruction, the instruction is decoded in the ID stage, before the EX stage.
When an instruction moves from the ID to the EX stage, it starts executing its operation; this is when we say the instruction is issued.
All possible data hazards can be checked in the ID stage itself.
If a data hazard exists, the instruction is stalled before it is issued.
Instruction Scheduling or Pipe Scheduling
The compiler tries to avoid generating code with interlocks:
MIPS 32 Code
LW R1, a
LW R2, b
SUB R8, R1, R2 ← Interlock
SW R8, x
LW R1, c
LW R2, d
ADD R9, R1, R2 ← Interlock
SW R9, y
Schedule By Compiler MIPS 32 Code:
LW R1, a
LW R2, b
LW R3, c
SUB R8, R1, R2 ← Both interlocks eliminated
LW R4, d
SW R8, x
ADD R9, R3, R4
SW R9, y
Pipeline scheduling can increase the number of registers required, but results in a performance improvement.
A load instruction requires that the next instruction should not use the value currently being loaded; this is called a delayed load.
If the compiler cannot move some instruction to fill up the delay slot, it can insert a NOP (No Operation) instruction, as in the sketch below.
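A small illustrative sequence (registers and the label a are assumed):
LW R1, a
NOP               ← load delay slot; nothing useful could be moved here
ADD R3, R1, R2    ← R1 is now available when ADD needs it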
Types of Data Hazards
a) Read After Write (RAW):
Consider two instructions i1 and i2, with i1 occurring before i2 in the program.
i2 tries to read a source before i1 writes to it
Situation where an instruction refers to a result that has not yet been calculated.
Example:
i1: R2 ← R5 + R3
i2: R4 ← R2 + R3
b) Write After Read: (WAR)
i2 tries to write a destination before it is read by i1.
Problem with concurrent execution.
Example:
i1: R4 ← R1 + R5
i2: R5 ← R1 + R2
c) Write After Write (WAW):
i2 tries to write an operand before it is written by i1.
Example:
i1: R2 ← R4 + R7
i2: R2 ← R1 + R3
Control Hazard
Arise because of a change in the flow of control, i.e., branch instructions.
Can cause a greater performance loss than data hazards.
If the branch is taken, the PC is normally not updated until the end of MEM.
The next instruction can be fetched only after that (3 stall cycles).
The instructions fetched after a taken branch are discarded and the fetch is redone from the branch target.
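A sketch of the penalty (cycle numbering assumed): the branch outcome and target are known only at the end of MEM, so the correct instruction can be fetched only in the following cycle.
Cycle:              1     2     3     4     5
Branch              IF    ID    EX    MEM   WB
Next (target)             --    --    --    IF    ← 3 stall cycles before the correct instruction is fetched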
To Reduce Branch Stall Penalty
In MIPS 32, the branches require testing a register for zero, or comparing the values of two registers.
By using simple comparison logic on these registers, the branch decision and the computation of the branch target (effective) address can be completed by the end of the ID stage.
Delayed Branch Technique
→ If a branch instruction has a penalty of n stall cycles, the n instruction slots that immediately follow the branch instruction are called branch delay slots.
→ The task of the compiler is to try to fill up these delay slots with useful instructions, to make more effective use of the pipeline (see the sketch below).
→ Instructions in branch delay slots are always executed irrespective of whether the branch is taken or not.
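A minimal MIPS32 sketch of filling a one-cycle delay slot (the instruction sequence is assumed for illustration); the ADD is independent of the branch condition, so it can safely be moved from before the branch into the slot:
Before scheduling:
ADD R1, R2, R3
BEQZ R4, target
NOP               ← delay slot wasted
After scheduling by the compiler:
BEQZ R4, target
ADD R1, R2, R3    ← moved into the delay slot; executed whether or not the branch is taken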