Computer Architecture
(PCC CS-402)
Vector Processor
May 12, 2025
Introduction
■ A vector processor can operate on an entire vector with
one instruction.
■ The work is done automatically in parallel
(simultaneously).
■ The operands of the instructions are complete vectors
instead of single elements.
■ Vector instructions access memory with a known
pattern.
■ This reduces branches and branch problems in pipelines.
Introduction
■ A vector processor is an ensemble of hardware resources,
including vector registers, functional pipelines,
processing elements, and register counters, for
performing vector operations.
■ It is a coprocessor specially designed for vector
computation.
■ A vector instruction involves a large array of operands.
■ Vector processors are often used in multi-pipelined
supercomputers.
■ Two different architectures are available:
● Register-to-register architecture (e.g., Cray
supercomputers)
Uses shorter instructions and vector register files.
● Memory-to-memory architecture (e.g., Cyber 205)
Uses memory-based instructions, which are longer
because they include memory addresses.
■ A vector processor has a fixed number of vector registers.
Register based Vector instruction
■ Typical register-based vector operations are listed below,
where the vector operator is denoted by ∘, a scalar
register by Si, a vector register of length n by Vi, and a
memory array of length n by M(1 : n):
● V1 ∘ V2 → V3 (binary vector)
● S1 ∘ V1 → V2 (scaling)
● V1 ∘ V2 → S1 (binary reduction)
● M(1 : n) → V1 (vector load)
● V1 → M(1 : n) (vector store)
● ∘ V1 → V2 (unary vector)
● ∘ V1 → S1 (unary reduction)
■ All operands used in a vector instruction should have
equal vector length.
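The register-based operations listed above can be modeled in a few lines of Python. This is a minimal sketch, not a description of any real instruction set: the class name VectorMachine, its method names, and the use of 0-based Python lists for memory and registers are assumptions made for illustration. Reductions are left out to keep the sketch short.

```python
import operator


class VectorMachine:
    """Toy register-to-register vector machine (hypothetical, for illustration)."""

    def __init__(self, memory):
        self.M = list(memory)  # main memory (0-indexed, unlike M(1 : n))
        self.V = {}            # vector registers, keyed by name such as "V1"
        self.S = {}            # scalar registers, keyed by name such as "S1"

    def _check_lengths(self, va, vb):
        # All vector operands of one instruction must have equal length.
        if len(self.V[va]) != len(self.V[vb]):
            raise ValueError("vector operands must have equal length")

    def vload(self, vd, base, n):          # M(1 : n) -> V1 (vector load)
        self.V[vd] = self.M[base:base + n]

    def vstore(self, vs, base):            # V1 -> M(1 : n) (vector store)
        self.M[base:base + len(self.V[vs])] = self.V[vs]

    def vbinary(self, op, va, vb, vd):     # V1 op V2 -> V3 (binary vector)
        self._check_lengths(va, vb)
        self.V[vd] = [op(x, y) for x, y in zip(self.V[va], self.V[vb])]

    def vscale(self, op, s, va, vd):       # S1 op V1 -> V2 (scaling)
        self.V[vd] = [op(self.S[s], x) for x in self.V[va]]

    def vunary(self, op, va, vd):          # op V1 -> V2 (unary vector)
        self.V[vd] = [op(x) for x in self.V[va]]


# Usage: add two 4-element vectors held in memory and store the result.
m = VectorMachine([1, 2, 3, 4, 10, 20, 30, 40, 0, 0, 0, 0])
m.vload("V1", 0, 4)
m.vload("V2", 4, 4)
m.vbinary(operator.add, "V1", "V2", "V3")
m.vstore("V3", 8)
m.S["S1"] = 2
m.vscale(operator.mul, "S1", "V1", "V4")
m.vunary(operator.neg, "V1", "V5")
print(m.M[8:12])  # [11, 22, 33, 44]
```

Each method stands for one vector instruction: the Python loop inside it is what the hardware would carry out in its functional pipeline without any per-element instruction fetch.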
Memory based Vector instruction
■ Typical memory-based vector operations are listed below,
where the vector operator is denoted by ∘, a scalar
register by Si, a memory array of length n by M(1 : n),
and a scalar quantity stored in memory location k by
M(k):
● M1(1 : n) ∘ M2(1 : n) → M(1 : n)
● S1 ∘ M1(1 : n) → M2(1 : n)
● ∘ M1(1 : n) → M2(1 : n)
● M1(1 : n) ∘ M2(1 : n) → M(k)
■ Vector length is not restricted by register length.
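As a rough illustration of the memory-based forms, the sketch below applies operations directly to slices of one flat memory array, with no vector registers in between. The function names, the 0-based base addresses, and the choice of the dot product as the reduction that writes M(k) are assumptions for this example only.

```python
import operator


def mem_binary(mem, op, m1, m2, dst, n):
    """M1(1 : n) op M2(1 : n) -> M(1 : n), element by element in memory."""
    for i in range(n):
        mem[dst + i] = op(mem[m1 + i], mem[m2 + i])


def mem_scale(mem, op, s1, m1, m2, n):
    """S1 op M1(1 : n) -> M2(1 : n)."""
    for i in range(n):
        mem[m2 + i] = op(s1, mem[m1 + i])


def mem_dot(mem, m1, m2, k, n):
    """M1(1 : n) op M2(1 : n) -> M(k), with the dot product as the reduction."""
    mem[k] = sum(mem[m1 + i] * mem[m2 + i] for i in range(n))


# n = 200 is longer than a typical vector register; only memory size limits it.
n = 200
mem = list(range(n)) + [1] * n + [0] * (n + 1)
mem_binary(mem, operator.add, 0, n, 2 * n, n)  # mem[400:600] = mem[0:200] + mem[200:400]
mem_dot(mem, 0, n, 3 * n, n)                   # mem[600] = 0 + 1 + ... + 199
print(mem[2 * n], mem[3 * n - 1], mem[3 * n])  # 1 200 19900
```

Every call carries up to three base addresses plus a length, which mirrors why memory-to-memory instructions are longer than register-based ones.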
Vector instruction types
■ Vector instruction types are defined by mathematical
mappings between the working registers or memory
locations where vector operands are stored.
● Vector-vector instructions
One or two vector operands are fetched from their
respective vector registers, pass through a functional
pipeline unit, and produce results in another vector
register:
f1 : Vi → Vj
f2 : Vj × Vk → Vi
Vector instruction types
● Vector-scalar instructions
Each element of Vk is multiplied by a scalar s to
produce a vector Vi of equal length:
f3 : s × Vk → Vi
Vector instruction types
● Vector-memory instructions
These correspond to an element-by-element vector load
or vector store between the vector register (V) and the
memory (M), as defined below:
f4 : M → V (vector load)
f5 : V → M (vector store)
[Figures: vector load instruction; vector store instruction]
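The load and store mappings f4 and f5 can be sketched as element-by-element copies between a memory list and a register list. The base and stride parameters are assumptions added here to illustrate a regular, known access pattern. The slides do not specify an addressing scheme.

```python
def vector_load(memory, base, stride, n):
    """f4 : M -> V. Copy n elements, starting at base and stepping by stride."""
    return [memory[base + i * stride] for i in range(n)]


def vector_store(memory, base, stride, vreg):
    """f5 : V -> M. Write the register back using the same address pattern."""
    for i, value in enumerate(vreg):
        memory[base + i * stride] = value


# A 3 x 4 matrix stored row by row: rows are [0..3], [4..7], [8..11].
memory = list(range(12))
column1 = vector_load(memory, base=1, stride=4, n=3)   # second column
vector_store(memory, base=2, stride=4, vreg=[-1, -2, -3])
print(column1)  # [1, 5, 9]
print(memory)   # [0, 1, -1, 3, 4, 5, -2, 7, 8, 9, -3, 11]
```

With a stride of 1 the same functions describe a contiguous load or store.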
Vector instruction types
● Vector reduction instructions
f6 includes finding the maximum, minimum, sum, and
mean value of all elements in a vector.
f6 : Vi → Sj
f7 is the dot product, which computes s = Σ ai × bi from
two vectors A = (ai) and B = (bi).
f7 : Vi × Vj → Sk
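A short sketch of the two reduction mappings follows. The function names are hypothetical. Each function takes whole vectors and returns scalars, which is the defining property of a reduction.

```python
def vector_reductions(v):
    """f6 : one vector in, scalar results out (max, min, sum, mean)."""
    total = sum(v)
    return {"max": max(v), "min": min(v), "sum": total, "mean": total / len(v)}


def dot_product(a, b):
    """f7 : two vectors in, one scalar out (sum of element-wise products)."""
    if len(a) != len(b):
        raise ValueError("vector operands must have equal length")
    return sum(x * y for x, y in zip(a, b))


a = [1.0, 2.0, 3.0, 4.0]
b = [4.0, 3.0, 2.0, 1.0]
stats = vector_reductions(a)
s = dot_product(a, b)
print(stats)  # {'max': 4.0, 'min': 1.0, 'sum': 10.0, 'mean': 2.5}
print(s)      # 20.0
```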
Vector instruction types
● Masking instructions
This type of instruction uses a mask vector to
compress or expand a vector into a shorter or longer
index vector, respectively, corresponding to the
following mapping:
f8 : V0 × Vm → V1
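One common reading of compress and expand under a mask is sketched below. The exact semantics vary between machines, so the function names, the 0/1 mask encoding, and the fill value used by expand are assumptions for illustration.

```python
def compress(v0, vm):
    """Keep only the elements of v0 whose mask bit is set (shorter result)."""
    if len(v0) != len(vm):
        raise ValueError("vector and mask must have equal length")
    return [x for x, bit in zip(v0, vm) if bit]


def expand(v0, vm, fill=0):
    """Place the elements of v0 at the set positions of the mask (longer result)."""
    if sum(1 for bit in vm if bit) != len(v0):
        raise ValueError("mask must have one set bit per source element")
    source = iter(v0)
    return [next(source) if bit else fill for bit in vm]


mask = [1, 0, 1, 0, 1]
short = compress([5, 0, 7, 0, 9], mask)
restored = expand(short, mask)
print(short)     # [5, 7, 9]
print(restored)  # [5, 0, 7, 0, 9]
```

In this sketch expand is the inverse of compress whenever the dropped elements equal the fill value.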
Vector instruction types
● Gather instructions
This instruction uses two vector registers to gather
vector elements from arbitrary locations throughout
memory.
f9 : M → V1 × V0 (Gather)
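A minimal gather sketch, assuming V0 holds the element indices and V1 receives the data (the slides name the two registers but do not assign their roles). The function name and the base parameter are hypothetical.

```python
def gather(memory, base, v0):
    """f9 : read memory[base + index] for every index in v0 and return V1."""
    return [memory[base + index] for index in v0]


memory = [10, 20, 30, 40, 50, 60, 70, 80]
v0 = [4, 2, 7]               # index vector
v1 = gather(memory, 0, v0)   # data vector
print(v1)  # [50, 30, 80]
```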
Vector instruction types
● Scatter instructions
This instruction uses two vector registers to scatter
vector elements to arbitrary locations throughout
memory.
f10 : V1 × V0 → M (Scatter)
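A minimal scatter sketch, assuming V1 holds the data and V0 holds the target indices (the slides name the two registers but do not assign their roles). The function name and the base parameter are hypothetical, and when an index repeats, the last write wins in this sketch.

```python
def scatter(memory, base, v0, v1):
    """f10 : write each element of v1 to memory[base + index] for its index in v0."""
    if len(v0) != len(v1):
        raise ValueError("index and data vectors must have equal length")
    for index, value in zip(v0, v1):
        memory[base + index] = value


memory = [0] * 8
scatter(memory, 0, [4, 2, 7], [50, 30, 80])
print(memory)  # [0, 0, 30, 0, 50, 0, 0, 80]
```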
Basic Vector Architecture
■ A pipeline architecture may have a number of stages.
■ There is no standard when it comes to pipelining
technique.
■ Cray-1 has 14 stages to perform vector operations.
■ Data is read into vector registers, which are FIFO
queues.
■ Each vector register can hold 50-100 floating-point values.
■ The instruction set:
● Loads a vector register from a location in memory.
● Performs operations on elements in vector registers.
● Stores data back into memory from the vector registers.
■ A vector processor is an easy-to-program parallel SIMD
computer.
■ Memory references and computations are overlapped, which
can bring about a tenfold speed increase.
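The load, operate, store sequence above can be sketched for a vector longer than one register. Processing the data in register-sized pieces is usually called strip mining. It is not named in these slides and is introduced here as an assumption, as are the function name and the register length of 64 (chosen from the 50-100 range quoted above).

```python
VECTOR_REGISTER_LENGTH = 64  # assumed register length, for illustration only


def vector_add_strip_mined(a, b, vlen=VECTOR_REGISTER_LENGTH):
    """Add two long vectors in register-sized strips: load, operate, store."""
    if len(a) != len(b):
        raise ValueError("vector operands must have equal length")
    result = [0] * len(a)
    strips = 0
    for start in range(0, len(a), vlen):
        end = min(start + vlen, len(a))
        v1 = a[start:end]                       # vector load
        v2 = b[start:end]                       # vector load
        v3 = [x + y for x, y in zip(v1, v2)]    # vector operation
        result[start:end] = v3                  # vector store
        strips += 1
    return result, strips


a = list(range(150))
b = [1] * 150
result, strips = vector_add_strip_mined(a, b)
print(strips)       # 3 (strips of 64, 64, and 22 elements)
print(result[:5])   # [1, 2, 3, 4, 5]
```

The sketch runs the three steps one after another. On real hardware the loads for the next strip can overlap with the computation on the current one, which is the overlap the last bullet refers to.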
Basic Vector Architecture
[Figure: Typical vector processor architecture.]
Basic Vector Architecture
[Figure: Closer view of the vector processor registers and functional units.]
Cray-1 Vector Architecture
[Figure: Cray-1 vector computer architecture.]
Advantages
■ Each result is independent of previous results,
allowing high clock rates.
■ A single vector instruction performs a great deal of
work, meaning fewer instruction fetches and fewer
branches (and in turn fewer mispredictions).
■ Vector instructions access memory a block at a time,
so memory latency is paid once per block rather than
once per element.
■ Fewer memory accesses mean faster processing.
■ Lower cost due to the lower number of operations
compared to scalar counterparts.
Disadvantages
■ Works well only with data that can be processed in a
highly or completely parallel manner.
■ Needs large blocks of data to operate on to be efficient,
because recent advances have increased the speed of
accessing memory.
■ Severely lags behind conventional processors in
performance on scalar data.
■ High price of individual chips due to limitations of
on-chip memory.
■ Increased code complexity needed to vectorize the
data.
■ High cost in design and low returns compared to
superscalar microprocessors.
Applications
■ Useful in applications that involve comparing or
processing large blocks of data.
■ Multimedia processing (compression, graphics, audio
synthesis, image processing).
■ Speech and handwriting recognition.
■ Lossy Compression (JPEG, MPEG video and audio).
■ Lossless Compression (Zero removal, RLE,
Differencing, LZW).
■ Cryptography (RSA, DES/IDEA, SHA/MD5).
Conclusion
■ A vector machine is faster at performing mathematical
operations on large vectors.
■ The vector register architecture of a vector processing
computer makes it better able to process vast
amounts of data quickly.
■ While vector processing is not widely popular today, it
still represents a milestone in supercomputing
achievement.
■ Since scalar processors can also be used for general
applications, their cost per unit is drastically
reduced. This is not the case for vector
processors and supercomputers.
■ Vector processors will continue to have a future in
large-scale computing and certain applications but can
never reach the popularity of scalar microprocessors.