Pipelining and Parallel Processing
What is Pipelining?
A way of speeding up execution of
instructions
Key idea:
overlap execution of multiple instructions
What is Pipelining
A technique used in advanced microprocessors
where the microprocessor begins executing a
second instruction before the first has been
completed.
- A Pipeline is a series of stages, where some work
is done at each stage. The work is not finished
until it has passed through all stages.
With pipelining, the computer architecture allows
the next instructions to be fetched while the
processor is performing arithmetic operations,
holding them in a buffer close to the processor
until each instruction operation can be performed.
How Pipelines Work
The pipeline is divided into segments, and each segment can execute its
operation concurrently with the other segments. Once a segment completes
an operation, it passes the result to the next segment in the pipeline
and fetches the next operation from the preceding segment.
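The hand-off between segments described above can be sketched as a small cycle-by-cycle simulation. This is only an illustration under simplifying assumptions (every segment takes exactly one cycle and nothing ever stalls); the names `run_pipeline` and `EMPTY` and the three example stage functions are hypothetical.

```python
EMPTY = object()  # marks a segment that holds no work in this cycle


def run_pipeline(stages, inputs):
    """Push `inputs` through `stages`, one segment per cycle.

    stages: list of one-argument functions, one per segment.
    Returns (outputs, cycles_used).
    """
    latches = [EMPTY] * len(stages)  # latches[i]: result of segment i
    pending = list(inputs)
    outputs = []
    cycles = 0
    while pending or any(v is not EMPTY for v in latches):
        # Update from the last segment back to the first, so each segment
        # reads what its predecessor produced in the previous cycle.
        for i in range(len(stages) - 1, -1, -1):
            if i == 0:
                source = pending.pop(0) if pending else EMPTY
            else:
                source = latches[i - 1]
            latches[i] = stages[i](source) if source is not EMPTY else EMPTY
        cycles += 1
        if latches[-1] is not EMPTY:
            outputs.append(latches[-1])  # finished work leaves the pipeline
            latches[-1] = EMPTY
    return outputs, cycles


stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
results, cycles = run_pipeline(stages, [1, 2, 3, 4])
print(results, cycles)  # [1, 3, 5, 7] 6
```

Four items pass through three segments in 6 cycles rather than 12, because all three segments are busy at once after the pipeline fills.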
Ann, Brian, Cathy, Dave (A, B, C, D)
each have one load of clothes
to wash, dry, fold, and put away
• Washer takes 30 minutes
• Dryer takes 30 minutes
• “Folder” takes 30 minutes
• “Stasher” takes 30 minutes to put clothes into drawers
If we do laundry sequentially...
[Figure: timeline from 6 PM to 2 AM; loads A, B, C, D listed in task order, each running its four 30-minute steps only after the previous load has finished]
Time Required: 8 hours for 4 loads
To Pipeline, We Overlap Tasks
[Figure: timeline starting at 6 PM; loads A, B, C, D listed in task order, each starting 30 minutes after the previous load]
Time Required: 3.5 Hours for 4 Loads
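The two totals for the laundry example can be checked with a short calculation. This is a sketch assuming four stages (washer, dryer, folder, stasher) of 30 minutes each; the function names are hypothetical.

```python
def sequential_minutes(loads, stages, stage_minutes):
    """Each load runs through every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads, stages, stage_minutes):
    """The first load needs `stages` time slots; each later load
    finishes one slot after the load ahead of it."""
    return (stages + loads - 1) * stage_minutes


print(sequential_minutes(4, 4, 30) / 60)  # 8.0
print(pipelined_minutes(4, 4, 30) / 60)   # 3.5
```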
• Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
Computer performance in terms of latency and throughput
• Latency: the amount of time that a single operation takes to execute
• Throughput: the rate at which operations get executed (generally
expressed as operations/second or operations/cycle)
• Pipelining increases throughput, but it does not reduce the latency of a
single operation
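The difference between the two measures can be made concrete with a small calculation. This sketch assumes an idealized pipeline (equal stage times, no hazards, no overhead between stages); the function names are hypothetical, and the numbers reuse the laundry example with 0.5 hours per stage.

```python
def latency(stages, stage_time):
    """Time for one operation to pass through every stage."""
    return stages * stage_time


def throughput(operations, stages, stage_time, pipelined):
    """Operations completed per unit of time for the whole workload."""
    if pipelined:
        total = (stages + operations - 1) * stage_time
    else:
        total = operations * stages * stage_time
    return operations / total


print(latency(4, 0.5))                            # 2.0 hours per load, either way
print(throughput(4, 4, 0.5, pipelined=False))     # 0.5 loads per hour
print(throughput(1000, 4, 0.5, pipelined=True))   # close to 2 loads per hour
```

Under these assumptions the latency stays at one full pass through the stages, while the pipelined throughput approaches one operation per stage time as the workload grows.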
Parallel Processing
Parallel processing is the execution of concurrent events in the
computing process to achieve faster computational speed.
The purpose of parallel processing is to speed up the
computer processing capability and increase its
throughput.
The amount of hardware increases with parallel
processing, and with it, the cost of the system
increases.
However, technological developments have reduced
hardware costs to the point where parallel processing
techniques are economically feasible.
Basic Ideas

Parallel processing                   Pipelined processing
(time runs left to right)             (time runs left to right)
P1  a1 a2 a3 a4                       P1  a1 b1 c1 d1
P2  b1 b2 b3 b4                       P2  a2 b2 c2 d2
P3  c1 c2 c3 c4                       P3  a3 b3 c3 d3
P4  d1 d2 d3 d4                       P4  a4 b4 c4 d4

Less inter-processor communication    More inter-processor communication
Complicated processor hardware        Simpler processor hardware

Colors (not reproduced here): different types of operations performed
a, b, c, d: different data streams processed
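The two layouts can be generated programmatically to show who does what and when. This is a sketch under two assumptions that the flattened table does not state outright: the digit in a label such as `a2` is the processing step applied to stream `a`, and in the pipelined case each processor hands its result to the next processor one time slot later. The function names are hypothetical.

```python
def parallel_schedule(streams, steps):
    """Row p: processor p performs every step on its own stream."""
    return [[f"{s}{k}" for k in range(1, steps + 1)] for s in streams]


def pipelined_schedule(streams, steps):
    """Row p: processor p performs step p+1 on every stream in turn.

    Stream number i reaches processor p in time slot i + p, so the rows
    are staggered; "--" marks an idle slot.
    """
    slots = len(streams) + steps - 1
    grid = [["--"] * slots for _ in range(steps)]
    for p in range(steps):
        for i, s in enumerate(streams):
            grid[p][i + p] = f"{s}{p + 1}"
    return grid


for row in pipelined_schedule("abcd", 4):
    print(" ".join(row))
# first row:  a1 b1 c1 d1 -- -- --
# second row: -- a2 b2 c2 d2 -- --
```

In the parallel layout no processor ever waits on another; in the pipelined layout every label after step 1 depends on a value produced by the processor above it, which is where the extra inter-processor communication comes from.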
Data Dependence
• Parallel processing requires NO data dependence between processors
• Pipelined processing will involve inter-processor communication
[Figure: processors P1 to P4 plotted against time for each scheme]
Pipelining a Processor
Recall the 6 steps in instruction
execution:
1. Fetch instruction (FI)
2. Decode instruction (DI)
3. Calculate operands (CO)
4. Fetch operands (FO)
5. Execute instruction (EI)
6. Write operand (WO): write the result
Review: Single-Cycle Processor
All 6 steps done in a single clock cycle
Dedicated hardware required for each step
Instruction Fetch
The Instruction Fetch (FI) stage is responsible for
obtaining the requested instruction from memory.
The instruction address is stored in a register
as temporary storage.
Instruction Decode
The Instruction Decode (DI) stage is responsible
for decoding the instruction and sending out the
various control lines to the other parts of the
processor. The instruction is sent to the control
unit, where it is decoded.
Calculate Operands
The CO stage is where any calculations are
performed. The main component in this stage
is the ALU, which provides arithmetic and
logic capabilities.
Fetch Operands and Execute Instruction
The FO and EI stages are responsible
for loading values from and storing values
to memory. They are also responsible
for input to and output from the
processor, respectively.
Write Operands
The WO stage is responsible for writing the result
of a calculation into main memory.
Timing diagram for instruction
pipeline operation
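A text version of such a timing diagram can be generated with a few lines of code. This is a minimal sketch, assuming every stage takes one cycle and no hazards occur; the stage abbreviations follow the stage descriptions above, and the function name `timing_diagram` is hypothetical.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def timing_diagram(num_instructions, stages=STAGES):
    """Return one text row per instruction; columns are clock cycles."""
    total_cycles = len(stages) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        cells = ["  "] * total_cycles
        for s, name in enumerate(stages):
            cells[i + s] = name  # instruction i is in stage s during cycle i + s
        rows.append((f"I{i + 1:<3}" + " ".join(cells)).rstrip())
    return rows


for row in timing_diagram(4):
    print(row)
```

Each row is shifted one column to the right of the row above it, so in any cycle after the pipeline has filled, all six stages are working on different instructions.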
Advantages/Disadvantages
Advantages:
• More efficient use of the processor
• Faster execution of a large number of instructions
Disadvantages:
• Pipelining involves adding hardware to the chip
• The pipeline cannot run continuously at full speed because pipeline
hazards disrupt its smooth execution
Comments about Pipelining
The good news
• Multiple instructions are being processed at the same time
• This works because stages are isolated by registers
The bad news
• Instructions interfere with each other (hazards)
• Example: different instructions may need the same piece of hardware
(e.g., memory) in the same clock cycle
• Example: an instruction may require a result produced by an earlier
instruction that is not yet complete
Pipeline Hazards
Data Hazards: an instruction uses the result of
a previous instruction. A hazard occurs exactly
when an instruction tries to read a register in its
ID stage before an earlier instruction has written
that register in its WB stage.
Structural Hazards: two different instructions need
to use the same hardware resource in the same cycle.
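A structural hazard can be found by checking which instructions need the same resource in the same cycle. The sketch below assumes a five-stage pipeline (IF, ID, EX, M, WB) with a single memory port shared by instruction fetch and by the M stage of loads and stores; that hardware arrangement and the name `memory_conflicts` are assumptions made for illustration.

```python
STAGES = ["IF", "ID", "EX", "M", "WB"]


def memory_conflicts(is_memory_op):
    """Return (cycle, instruction_in_M, instruction_in_IF) triples.

    is_memory_op[i] is True when instruction i accesses memory in M.
    Instruction i (0-based) occupies stage s in cycle i + s + 1, and
    every instruction uses the memory port during IF.
    """
    conflicts = []
    m = STAGES.index("M")
    for i, uses_memory in enumerate(is_memory_op):
        fetching = i + m  # the instruction that is in IF while i is in M
        if uses_memory and fetching < len(is_memory_op):
            conflicts.append((i + m + 1, i, fetching))
    return conflicts


# A load followed by four other instructions:
print(memory_conflicts([True, False, False, False, False]))  # [(4, 0, 3)]
```

In cycle 4 the load (instruction 0) is in M while instruction 3 is being fetched, so both want the one memory port.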
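Data Hazards
          cycle:    1   2   3   4   5   6
ADD R1, R2, R3      IF  ID  EX  M   WB
SUB R4, R1, R5          IF  ID  EX  M   WB

ADD: ID selects R2 and R3 for ALU operations; EX adds R2 and R3; WB stores the sum in R1.
SUB: ID selects R1 and R5 for ALU operations.
SUB reads R1 in cycle 3, but ADD does not store the sum in R1 until cycle 5.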
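The ADD/SUB case can be detected mechanically by comparing each instruction's source registers with the destinations of the instructions just ahead of it. This is a sketch assuming no forwarding, registers read in ID and written in WB, and a read that must come strictly after the write cycle; under those assumptions a consumer up to three positions behind its producer reads too early. The tuple format and the name `raw_hazards` are hypothetical.

```python
def raw_hazards(program, window=3):
    """Return (producer_index, consumer_index, register) triples.

    program: list of (opcode, dest, src1, src2) tuples.
    window: how many following instructions still read a register
            before the producer's WB has written it.
    """
    hazards = []
    for j, (_, _, *sources) in enumerate(program):
        for i in range(max(0, j - window), j):
            dest = program[i][1]
            if dest in sources:
                hazards.append((i, j, dest))
    return hazards


program = [
    ("ADD", "R1", "R2", "R3"),
    ("SUB", "R4", "R1", "R5"),
]
print(raw_hazards(program))  # [(0, 1, 'R1')]
```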
Stalling
Stalling involves halting the flow of instructions
until the required result is ready to be used.
However, stalling wastes processor time by
doing nothing while waiting for the result.
          cycle:    1   2   3   4   5   6   7   8   9
ADD R1, R2, R3      IF  ID  EX  M   WB
STALL                   IF  ID  EX  M   WB
STALL                       IF  ID  EX  M   WB
STALL                           IF  ID  EX  M   WB
SUB R4, R1, R5                      IF  ID  EX  M   WB
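The cost of stalling in this example can be worked out from the stage positions. The sketch assumes the same five stages, no forwarding, and a register read in ID that must come in a later cycle than the producer's WB; the function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "M", "WB"]


def stalls_needed(distance, read_stage="ID", write_stage="WB"):
    """Stall slots required when the consumer is `distance`
    instructions behind the producer (1 = immediately after)."""
    gap = STAGES.index(write_stage) - STAGES.index(read_stage)
    return max(0, gap + 1 - distance)


def total_cycles(num_instructions, stall_cycles):
    """Cycles to finish all instructions, including stall slots."""
    return len(STAGES) + num_instructions - 1 + stall_cycles


stalls = stalls_needed(1)       # SUB immediately follows ADD
print(stalls)                   # 3
print(total_cycles(2, 0))       # 6
print(total_cycles(2, stalls))  # 9
```

The three stall slots stretch the two-instruction sequence from 6 cycles to 9, which is the processor time lost to waiting.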
Summary - Pipelining
Overview
Pipelining increases throughput (but
does not reduce latency)
Hazards limit performance
Structural hazards
Data hazards