CS-506 Advanced Computer Systems
Architecture
Lecture #9: ILP
Gul Munir Ujjan
Assistant Professor
CISE Department, NEDUET
Karachi
MIPS Pipelined Architecture
(Microprocessor without Interlocked Pipelined Stages)
MIPS Pipelined Architecture
Instruction Fetch (IF) Stage
Instruction Fetch
The instruction address held in the PC is
applied to the instruction memory, which
makes the addressed instruction available
on the output lines of the instruction
memory.
Updating PC
The value written to the PC is determined
by the control signal PCSrc.
Depending on the state of PCSrc, the PC
is loaded with either the branch target
address (BTA) or the sequential address
(PC + 4).
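The PC update described above behaves like a two-input multiplexer. A minimal sketch, assuming PCSrc = 1 selects the BTA and PCSrc = 0 selects PC + 4 (the function name `next_pc` is hypothetical and only for illustration):

```python
def next_pc(pc: int, pc_src: int, branch_target: int) -> int:
    """Model the multiplexer in front of the PC.

    Assumption: pc_src == 1 selects the branch target address (BTA),
    pc_src == 0 selects the sequential address PC + 4.
    """
    return branch_target if pc_src else pc + 4


if __name__ == "__main__":
    print(hex(next_pc(0x100, 0, 0x200)))  # sequential fetch
    print(hex(next_pc(0x100, 1, 0x200)))  # taken branch
```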
MIPS Pipelined Architecture
Instruction Decode (ID)
Stage
The instruction is decoded by
the control unit, which takes the
6-bit opcode and generates the
control signals.
The control signals are buffered
in the pipeline registers until the
corresponding instruction uses
them in the relevant stage.
MIPS Pipelined Architecture
Instruction Decode (ID)
Stage
Registers are also read in this
stage.
➢ The first source register’s identifier (rs)
is at bit positions [25:21] in every
instruction, and the second source
register’s identifier (rt), if any, is at
bit positions [20:16].
➢ The destination register’s identifier is
at bit positions [15:11] (rd) for R-type
instructions, or at [20:16] (rt) for load
and immediate instructions.
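The bit positions listed above can be illustrated by extracting the fields from a 32-bit instruction word. This is a sketch only; the helper names `decode_fields` and `dest_register` and the `is_r_type` flag are hypothetical stand-ins for what the hardware does with wires and a multiplexer.

```python
def decode_fields(instr: int) -> dict:
    """Slice a 32-bit MIPS instruction word into its register fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits [31:26]
        "rs": (instr >> 21) & 0x1F,      # bits [25:21], first source
        "rt": (instr >> 16) & 0x1F,      # bits [20:16], second source or destination
        "rd": (instr >> 11) & 0x1F,      # bits [15:11], R-type destination
    }


def dest_register(instr: int, is_r_type: bool) -> int:
    """Pick the destination field: rd for R-type, rt for load/immediate."""
    fields = decode_fields(instr)
    return fields["rd"] if is_r_type else fields["rt"]


if __name__ == "__main__":
    add_word = 0x012A4020  # add $8, $9, $10
    lw_word = 0x8D280004   # lw  $8, 4($9)
    print(decode_fields(add_word), dest_register(add_word, True))
    print(decode_fields(lw_word), dest_register(lw_word, False))
```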
MIPS Pipelined Architecture
Execution (EX) Stage
This stage is marked by the
use of the ALU, which performs
the desired operation on registers
(R-type), calculates an address
(memory reference
instructions), or compares
registers (branch).
MIPS Pipelined Architecture
Execution (EX) Stage
The ALU control unit accepts the
6-bit funct field and the 2-bit
control signal ALUOp and generates
the required control signal for the
ALU.
The Branch Target Address (BTA)
is also calculated in the EX
stage, by a separate adder.
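The mapping from ALUOp and funct to an ALU control signal can be sketched as a small lookup. The specific encodings below (ALUOp 00 for load/store, 01 for branch, 10 for R-type, and the 4-bit ALU control values) are an assumption taken from the common textbook single-cycle MIPS design; the slides do not state them, and the name `alu_control` is hypothetical.

```python
# Assumed 4-bit ALU control encodings (textbook convention, not from the slides).
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# funct field of R-type instructions -> ALU operation.
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,  # add
    0b100010: ALU_SUB,  # sub
    0b100100: ALU_AND,  # and
    0b100101: ALU_OR,   # or
    0b101010: ALU_SLT,  # slt
}


def alu_control(alu_op: int, funct: int) -> int:
    """Combine the 2-bit ALUOp with the 6-bit funct field."""
    if alu_op == 0b00:   # load/store: add base and offset
        return ALU_ADD
    if alu_op == 0b01:   # branch: subtract to compare registers
        return ALU_SUB
    if alu_op == 0b10:   # R-type: operation comes from funct
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError(f"unsupported ALUOp {alu_op:#04b}")


if __name__ == "__main__":
    print(bin(alu_control(0b10, 0b100010)))  # R-type sub
```

Keeping ALUOp at two bits lets the main control unit stay small: only R-type instructions need the funct field to be examined.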
MIPS Pipelined Architecture
Memory (M) Stage
Data memory is read (load)
or written (store) using the
address calculated by the
ALU in the EX stage.
Branch decisions are taken
in this stage.
➢ The ZERO output of the ALU and
the BRANCH signal generated by
the control unit are ANDed to
determine whether the branch is
taken or not taken.
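The branch decision and the target address it selects can be sketched as follows. The AND of BRANCH and ZERO comes from the slide; the BTA formula (PC + 4 plus the sign-extended 16-bit immediate shifted left by 2) is the standard MIPS definition and is stated here as an assumption, since the slides only mention a separate adder. All function names are hypothetical.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc_plus_4: int, imm16: int) -> int:
    """Assumed BTA adder: PC + 4 plus the word offset converted to bytes."""
    return (pc_plus_4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


def pc_src(branch: int, zero: int) -> int:
    """Branch is taken only if the instruction is a branch AND the ALU reports ZERO."""
    return branch & zero


if __name__ == "__main__":
    print(pc_src(1, 1), pc_src(1, 0), pc_src(0, 1))
    print(hex(branch_target(0x104, 3)))
```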
MIPS Pipelined Architecture
Write Back (WB) Stage
The result produced by the ALU
in the EX stage (R-type) or the
data read from data
memory in the M stage (lw)
is written to the destination
register.
The data to be written in …
destination register is selected
via multiplexer controlled by
the control signal MemToReg
Pipeline Hazards
MIPS Pipelined Architecture
Consider pipelined execution of following MIPS instructions:
ld R1, 10(R2)
dadd R3, R4, R5
The load instruction uses every stage of the pipeline, but the add instruction
does not access data memory.
        C1   C2   C3   C4   C5
ld      IF   ID   EX   M    WB
dadd         IF   ID   EX   WB
A resource conflict arises in C5: both instructions reach WB and attempt to
use the same hardware (the register file write port) in the same cycle.
This can be averted by ensuring uniformity: make all instructions pass
through all the stages in the same order.
As a consequence, some instructions do nothing in some stages, which is
accomplished by disabling the corresponding control signals.
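The conflict in C5, and its disappearance once every instruction passes through all five stages, can be checked with a short sketch. It assumes one instruction is issued per cycle starting at C1 and that each stage takes one cycle; `stage_conflicts` is a hypothetical helper.

```python
def stage_conflicts(stage_lists):
    """Return (cycle, stage, first_instr, second_instr) for every clash.

    Assumption: instruction i (0-based) is fetched in cycle i + 1 and
    then occupies one listed stage per cycle.
    """
    occupied = {}   # (cycle, stage) -> index of the instruction using it
    clashes = []
    for i, stages in enumerate(stage_lists):
        for k, stage in enumerate(stages):
            key = (i + 1 + k, stage)
            if key in occupied:
                clashes.append((key[0], stage, occupied[key], i))
            else:
                occupied[key] = i
    return clashes


if __name__ == "__main__":
    ld = ["IF", "ID", "EX", "M", "WB"]
    dadd_short = ["IF", "ID", "EX", "WB"]         # skips the M stage
    dadd_uniform = ["IF", "ID", "EX", "M", "WB"]  # idles in M instead
    print(stage_conflicts([ld, dadd_short]))
    print(stage_conflicts([ld, dadd_uniform]))
```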
Pipeline Hazards
A pipeline hazard is a situation that prevents the next
instruction in the instruction stream from using a pipeline
stage during the designated clock cycle.
Hazards reduce the performance from the ideal speedup
gained by pipelining. There are three classes of hazards:
1. Structural hazards arise from resource conflicts when the
hardware cannot support all possible combinations of instructions
simultaneously in overlapped execution.
2. Data hazards arise when an instruction depends on the results of a
previous instruction in a way that is exposed by the overlapping of
instructions in the pipeline.
3. Control (branch) hazards arise from the pipelining of branches
and other instructions that change the PC.
Structural Hazards
        C1   C2   C3   C4   C5
ld      IF   ID   EX   M    WB
dadd         IF   ID   EX   WB
Some pipelined processors share a single memory for data and instructions.
As a result, when an instruction makes a data memory reference, it conflicts
with the instruction fetch of a later instruction.
Structural Hazards
Solution: Stall the pipeline for 1 clock cycle when the data memory access
occurs. A stall is commonly called a pipeline bubble, since it floats through
the pipeline taking up space but carrying no useful work.
No instruction is initiated on clock cycle 4 (which normally would initiate
instruction i+3). Because only the instruction being fetched is stalled, all
instructions already in the pipeline ahead of it can proceed normally.
In the figure it is assumed that instructions i+1 and i+2 are not memory
references.
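The one-cycle stall can be reproduced with a small sketch of a single-ported memory. It assumes the five-stage pipeline of the earlier slides, so an instruction fetched in cycle c uses data memory in cycle c + 3; the function name `fetch_cycles` is hypothetical.

```python
def fetch_cycles(is_mem_ref):
    """Return the cycle in which each instruction is fetched.

    Assumption: one shared memory port. A fetch must wait whenever an
    earlier load/store is using the memory in its M stage that cycle.
    """
    data_access_cycles = set()  # cycles in which the memory serves a load/store
    fetched = []
    cycle = 1
    for mem_ref in is_mem_ref:
        while cycle in data_access_cycles:
            cycle += 1          # bubble: nothing is fetched this cycle
        fetched.append(cycle)
        if mem_ref:
            data_access_cycles.add(cycle + 3)  # M is the fourth stage
        cycle += 1
    return fetched


if __name__ == "__main__":
    # Instruction i is a load; i+1 .. i+4 make no data memory reference.
    print(fetch_cycles([True, False, False, False, False]))
```

With the load as instruction i, the sketch fetches i+3 in cycle 5 rather than cycle 4, matching the description above.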
Structural Hazards
Structural hazards are typically averted by
replicating resources.
This particular structural hazard can be avoided with
a Harvard architecture, i.e. separate memory units for
instructions and data:
one for instruction fetch and another for
data read/write.
Data hazard
Example:
DADD R1, R2, R3     ; R1 = R2 + R3
DSUB R4, R1, R5     ; R4 = R1 - R5
AND  R6, R1, R7     ; R6 = R1 AND R7
OR   R8, R1, R9     ; R8 = R1 OR R9
XOR  R10, R1, R11   ; R10 = R1 XOR R11
Data hazard
Several pairs of instructions are data
dependent, as detailed below:
The DADD instruction writes the
value of R1 in the WB stage, but
the DSUB instruction reads the
value during its ID stage, before
it has been written.
The AND instruction is also
affected by this hazard. The write
of R1 does not complete until the
end of clock cycle 5. Thus, the
AND instruction that reads the
registers during clock cycle 4 will
receive the wrong results.
The OR instruction operates without incurring a hazard because we
perform the register file reads in the second half of the cycle and the writes
in the first half.
The XOR instruction also operates properly because its register read occurs
in clock cycle 6, after the register write.
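The cycle arithmetic behind these four cases can be sketched as follows. It assumes one instruction is issued per cycle from clock cycle 1, registers are read in ID (the second stage) and written in WB (the fifth), and a read in the same cycle as the write is safe because of the split-cycle register file. The name `raw_hazards` is hypothetical.

```python
def raw_hazards(program):
    """List (producer, consumer, register) triples that read a stale value.

    Each program entry is (opcode, destination, sources).
    Assumption: instruction i (0-based) is in ID during cycle i + 2
    and in WB during cycle i + 5, with no stalls or forwarding.
    """
    hazards = []
    for j, (consumer, _, sources) in enumerate(program):
        for i in range(j):
            producer, dest, _ = program[i]
            read_cycle = j + 2
            write_cycle = i + 5
            # Reading in the same cycle as the write is fine (write in the
            # first half, read in the second half of the cycle).
            if dest in sources and read_cycle < write_cycle:
                hazards.append((producer, consumer, dest))
    return hazards


if __name__ == "__main__":
    program = [
        ("DADD", "R1", ("R2", "R3")),
        ("DSUB", "R4", ("R1", "R5")),
        ("AND", "R6", ("R1", "R7")),
        ("OR", "R8", ("R1", "R9")),
        ("XOR", "R10", ("R1", "R11")),
    ]
    print(raw_hazards(program))
```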
Data Hazards
One way to solve this problem is by using a simple
hardware technique called forwarding (also called
bypassing).
Forwarding can be generalized to include passing a
result directly to the functional unit that requires it.
In the previous example the result is not really needed
by the DSUB instruction until after the DADD
instruction actually produces it.
If the result can be moved from the pipeline register
where the DADD stores it to where the DSUB needs it,
then the need for a stall can be avoided.
Data Hazards
Where to find the ALU result?
The ALU result generated in the EX stage is passed
through the pipeline registers to the MEM and WB
stages, before it is finally written to the register file.
Since the pipeline registers already contain the ALU
result, we could just forward that value to subsequent
instructions, to prevent data hazards.
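The choice of where an ALU operand comes from can be sketched as a selection function. The conditions below follow the usual textbook forwarding-unit rules (prefer the most recent result, ignore register 0, require that the earlier instruction actually writes a register); they are an assumption rather than something the slides spell out, and all names are hypothetical.

```python
def forward_select(src_reg, ex_mem_reg_write, ex_mem_rd,
                   mem_wb_reg_write, mem_wb_rd):
    """Decide which value feeds an ALU input for source register src_reg.

    Returns "EX/MEM" or "MEM/WB" when the value should be taken from that
    pipeline register, or "REGFILE" when the value read in ID is current.
    """
    # Most recent producer first: the instruction one ahead in the pipeline.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return "EX/MEM"
    # Otherwise the instruction two ahead, whose result sits in MEM/WB.
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return "MEM/WB"
    return "REGFILE"


if __name__ == "__main__":
    # DSUB in EX while DADD (writes R1) is one stage ahead.
    print(forward_select(1, True, 1, False, 0))
    # AND in EX: DSUB (writes R4) is one ahead, DADD (writes R1) two ahead.
    print(forward_select(1, True, 4, True, 1))
```

Checking the EX/MEM register first matters when two earlier instructions write the same register: the later result is the one the consumer must see.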