Topics to be covered
• Flynn's taxonomy
• Parallel Processing
• Pipelining
• Arithmetic Pipeline
• Instruction Pipeline
• RISC Pipeline
Parallel Processing
A parallel processing system is able to perform concurrent data processing to achieve faster execution time.
• The system may have two or more ALUs and be able to execute two or more instructions at the same time.
• The goal is to increase throughput: the amount of processing that can be accomplished during a given interval of time.
• Parallel processing is established by distributing data among multiple functional units. Fig 1 shows the separation of the execution unit into eight functional units.
Flynn’s Taxonomy
There are a variety of ways in which parallel processing can be classified.
Parallel processing may occur in the instruction stream, in the data stream, or in both.
Flynn’s classification divides computers into four major groups as follows:
Single instruction stream, single data stream – SISD
Single instruction stream, multiple data stream – SIMD
Multiple instruction stream, single data stream – MISD
Multiple instruction stream, multiple data stream – MIMD
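The four groups above follow directly from whether there is one or more than one stream of each kind. As a minimal sketch (the function name `flynn_class` is hypothetical and used only for illustration), the mapping can be written as:

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn group for the given number of instruction and data streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("need at least one stream of each kind")
    # "S" for a single stream, "M" for multiple streams
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


if __name__ == "__main__":
    for n_instr, n_data in [(1, 1), (1, 8), (8, 1), (8, 8)]:
        print(n_instr, n_data, flynn_class(n_instr, n_data))
```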
Pipelining
Pipelining is a technique of decomposing a sequential process into suboperations, with each suboperation being executed in a special dedicated segment that operates concurrently with all other segments.
A pipeline can be visualized as a collection of processing segments through which binary information flows.
A pipeline implies a flow of information similar to an industrial assembly line.
Pipelining example
For example, suppose we want to perform combined multiply and add operations on a stream of numbers:
Ai * Bi + Ci for i = 1, 2, 3, …, 7
The suboperations performed in each segment of the pipeline are as follows:
R1 ← Ai, R2 ← Bi
R3 ← R1 * R2, R4 ← Ci
R5 ← R3 + R4
Shweta Joshi
Pipelining example: content of registers
• The five registers are loaded with new data on every clock pulse.
• The effect of each clock pulse is shown in the table.
• The clock must continue until the last output emerges from the pipeline.
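The register contents at each clock pulse can also be reproduced with a small simulation. The sketch below is illustrative only: the function name `simulate_multiply_add` and the sample input values are assumptions, and `None` stands for a register that holds no valid data yet. All five registers are updated together from the previous clock's contents, which is what makes the segments operate concurrently.

```python
def simulate_multiply_add(a, b, c):
    """Clock a three-segment pipeline computing a[i] * b[i] + c[i].

    Returns (outputs, history), where history holds one tuple
    (clock, R1, R2, R3, R4, R5) per clock pulse.
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    outputs, history = [], []
    for clock in range(1, n + 3):  # n + 2 pulses drain the pipeline
        # Next-state values are computed from the CURRENT register contents.
        nxt_r5 = r3 + r4 if r3 is not None else None          # segment 3
        nxt_r3 = r1 * r2 if r1 is not None else None          # segment 2
        nxt_r4 = c[clock - 2] if r1 is not None else None     # segment 2
        i = clock - 1
        nxt_r1 = a[i] if i < n else None                      # segment 1
        nxt_r2 = b[i] if i < n else None                      # segment 1
        r1, r2, r3, r4, r5 = nxt_r1, nxt_r2, nxt_r3, nxt_r4, nxt_r5
        history.append((clock, r1, r2, r3, r4, r5))
        if r5 is not None:
            outputs.append(r5)
    return outputs, history


if __name__ == "__main__":
    # Hypothetical sample data for i = 1..7
    a = [1, 2, 3, 4, 5, 6, 7]
    b = [2] * 7
    c = [10] * 7
    outputs, history = simulate_multiply_add(a, b, c)
    for row in history:
        print(row)
    print(outputs)
```

With seven input triples the loop runs for nine clock pulses, and the first result appears in R5 on the third pulse.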
Pipelining
• The figure above shows the general structure of a four-segment pipeline.
• Each segment consists of a combinational circuit Si that performs a suboperation on the data stream flowing through the pipe.
• The behavior of a pipeline can be understood from the space-time diagram below, drawn for 6 tasks and 4 segments.

Clock cycle 1   2   3   4   5   6   7   8   9
Segment 1   T1  T2  T3  T4  T5  T6
Segment 2       T1  T2  T3  T4  T5  T6
Segment 3           T1  T2  T3  T4  T5  T6
Segment 4               T1  T2  T3  T4  T5  T6
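A space-time diagram for any number of segments and tasks can be generated mechanically: task Tj occupies segment s during clock cycle j + s - 1. The following is a minimal sketch under that rule; the names `space_time` and `render` are hypothetical helpers, not part of the original material.

```python
def space_time(k, n):
    """Return a k-row table (one row per segment) over k + n - 1 clock cycles.

    Each cell holds the label of the task in that segment, or "" if idle.
    """
    cycles = k + n - 1
    table = []
    for seg in range(1, k + 1):
        row = []
        for cycle in range(1, cycles + 1):
            task = cycle - seg + 1  # task index present in this segment
            row.append(f"T{task}" if 1 <= task <= n else "")
        table.append(row)
    return table


def render(table):
    """Format the table as aligned plain text."""
    cycles = len(table[0])
    header = "Clock cycle " + "".join(f"{c:<4}" for c in range(1, cycles + 1))
    lines = [header.rstrip()]
    for seg, row in enumerate(table, start=1):
        line = f"Segment {seg}   " + "".join(f"{cell:<4}" for cell in row)
        lines.append(line.rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(space_time(k=4, n=6)))
```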
Pipelining
Consider a k-segment pipeline with a clock cycle time tp that executes n tasks.
The first task T1 requires a time equal to k·tp to complete.
After that, the remaining (n - 1) results emerge one per clock cycle.
It therefore takes k + (n - 1) clock cycles to complete n tasks using a k-segment pipeline.
If a nonpipelined unit performs the same operation and takes a time equal to tn to execute each task, then the total time required for n tasks is n·tn.
The speedup gained by using the pipeline is:
S = (n·tn) / ((k + n - 1)·tp)
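The speedup formula is easy to evaluate numerically. The sketch below uses hypothetical values (k = 4, tp = 20 ns, n = 100) and, when tn is not given, assumes tn = k·tp, i.e. that the nonpipelined unit takes as long per task as one full pass through the pipeline. That default is an assumption made here for illustration, not something stated above.

```python
def pipeline_speedup(n, k, tp, tn=None):
    """Speedup S = (n * tn) / ((k + n - 1) * tp) of a k-segment pipeline.

    n  : number of tasks
    k  : number of segments
    tp : clock cycle time of the pipeline
    tn : time per task on the nonpipelined unit (assumed k * tp if omitted)
    """
    if tn is None:
        tn = k * tp  # assumption: same total delay per task as the pipeline
    return (n * tn) / ((k + n - 1) * tp)


if __name__ == "__main__":
    # Hypothetical numbers: 4 segments, 20 ns clock, 100 tasks
    print(pipeline_speedup(n=100, k=4, tp=20))
    # A large n shows the speedup approaching tn / tp
    print(pipeline_speedup(n=1_000_000, k=4, tp=20))
```

As n grows, k + n - 1 approaches n, so S approaches tn / tp; under the tn = k·tp assumption the limit is k.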
Instruction Pipeline
In the general case, each instruction is processed in the following steps:
• Fetch the instruction from memory
• Decode the instruction
• Calculate the effective address
• Fetch the operands from memory
• Execute the instruction
• Store the result in the proper place
Instruction Pipeline
Consider the timing of the instruction pipeline shown in the adjoining figure, which has four segments:
FI : Instruction fetch
DA : Decode instruction and calculate the effective address (EA)
FO : Operand fetch
EX : Execution
Pipeline conflicts: three major difficulties
• Resource conflicts: two segments access memory at the same time.
• Data dependency: an instruction depends on the result of a previous instruction, but this result is not yet available.
• Branch difficulties: branches and other instructions (interrupt, ret, ..) change the value of the PC.
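The cost of a branch in the four-segment pipeline can be illustrated with a deliberately simplified timing model. It assumes that after a taken branch the next useful instruction is not fetched until the branch has completed its EX segment, and it ignores resource conflicts and data dependencies entirely. The function name `schedule` and the sample instruction sequence are hypothetical.

```python
STAGES = ["FI", "DA", "FO", "EX"]


def schedule(taken_branch_flags):
    """Return one {stage: clock cycle} dict per instruction.

    taken_branch_flags[i] is True if instruction i is a taken branch.
    Simplifying assumption: fetching resumes only after a taken branch
    has finished EX, so it costs len(STAGES) - 1 idle cycles.
    """
    k = len(STAGES)
    timing = []
    fetch = 1
    for taken in taken_branch_flags:
        timing.append({stage: fetch + s for s, stage in enumerate(STAGES)})
        fetch += k if taken else 1
    return timing


if __name__ == "__main__":
    # Seven instructions; the third one is a taken branch (hypothetical)
    flags = [False, False, True, False, False, False, False]
    for number, stages in enumerate(schedule(flags), start=1):
        print(number, stages)
```

In this model, seven instructions with no branches finish in 4 + 7 - 1 = 10 cycles, while a single taken branch adds three cycles.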
RISC Pipeline
A RISC CPU uses an instruction pipeline with single-cycle instruction execution and relies on compiler support.
As an example, an instruction cycle with three suboperations is considered (shown in the figure):
I : Instruction fetch
A : Instruction decode and ALU operation
E : Transfer the output of the ALU to a register, memory, or PC
RISC Pipeline
• There are several techniques for reducing branch penalties; one such method is the delayed branch.
• The figure above shows a delayed branch using no-operation instructions.
• The figure below shows a delayed branch using rearranged instructions.
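Both ways of filling the slots after a delayed branch can be sketched as simple list transformations. Everything in this example is an assumption made for illustration: the five-instruction program, the choice of two delay slots, and the helper names `fill_with_nops` and `fill_by_rearranging`. The rearranging helper does no dependency analysis; it assumes the caller (in practice, the compiler) has already checked that the moved instructions do not affect the branch.

```python
def fill_with_nops(program, branch_index, delay_slots):
    """Insert NOPs after the branch so no useful instruction sits in the delay slots."""
    head = program[: branch_index + 1]
    tail = program[branch_index + 1 :]
    return head + ["NOP"] * delay_slots + tail


def fill_by_rearranging(program, branch_index, delay_slots):
    """Move the instructions just before the branch into the delay slots after it.

    Assumes those instructions are independent of the branch.
    """
    start = branch_index - delay_slots
    if start < 0:
        raise ValueError("not enough instructions before the branch")
    moved = program[start:branch_index]
    return program[:start] + [program[branch_index]] + moved + program[branch_index + 1 :]


if __name__ == "__main__":
    # Hypothetical program; the branch is the last instruction
    program = ["LOAD R1", "INC R2", "ADD R3,R4", "SUB R5,R6", "BRANCH X"]
    print(fill_with_nops(program, branch_index=4, delay_slots=2))
    print(fill_by_rearranging(program, branch_index=4, delay_slots=2))
```

The rearranged version needs no extra instructions, which is why it avoids the clock cycles that the NOP version spends idle.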