COA UNIT 5 (MULTIPROCESSOR)
Definition of Multiprocessor:
A multiprocessor is a computer system having two or more processing units (CPUs) that are connected and share a common main memory. These processors operate under a single operating system and work together to execute different tasks in parallel, thereby increasing the speed of computation and overall system performance. The principal characteristic of a multiprocessor is its ability to share a set of main memory and some I/O devices.
Advantages of Multiprocessor Systems
Increased Reliability
If one processor fails, the system can still function using the remaining processors, enhancing fault tolerance and system dependability.
Improved Throughput
Multiple processors execute tasks in parallel, leading to higher overall system throughput and
faster execution of jobs.
Parallel Processing
Tasks can be divided into sub-tasks and processed concurrently, which accelerates processing
and improves performance for large and complex applications.
Scalability
The system's processing power can be increased by adding more processors, making it
adaptable to higher workloads without drastic changes to the architecture.
Efficient Resource Sharing
Peripheral devices and data storage can be shared among processors, allowing for better
hardware utilization and resource management.
Better Multitasking
A multiprocessor system can effectively handle multiple jobs or user requests simultaneously, which is particularly important in server or database environments.
Disadvantages
1. High cost
2. Synchronization is problematic
3. Power consumption is high
4. High complexity
Types of multiprocessor
1. Shared memory (tightly coupled): All processors share a global physical memory space and run under a single operating system.
2. Distributed memory (loosely coupled): Each processor has its own local memory and communicates via message passing.
3. Symmetric Multiprocessing (SMP): All processors run the same OS and have equal access to memory and I/O.
4. Asymmetric Multiprocessing (AMP): One master processor controls the system; others perform specific tasks.
Characteristics of Multiprocessor Systems:
Multiple Processors: Consist of two or more CPUs within a single computer system.
Shared Main Memory: All processors have access to a common, shared memory space, enabling fast data communication.
Parallel Processing Capability: Multiple tasks can be performed simultaneously, improving system throughput.
High Reliability: Failure of one processor does not halt the system; remaining processors
continue to function.
Resource Sharing: Peripherals, memory, and other hardware resources are shared among all
processors.
Increased Throughput: Overall system performance and processing speed are significantly
increased.
Scalability: Additional processors can be added to further enhance performance.
Structure of Multiprocessor: Five Types of Interconnection Structures
+ A multiprocessor structure refers to how multiple CPUs (processors) are connected to memory, I/O devices, and each other within a system. The interconnection structure directly influences system performance, scalability, and complexity. Here are the five main types:
1. Common (Time-Shared) Bus Structure
All processors, memory modules, and I/O devices share a single communication bus.
Only one processor can use the bus at any given time. When a processor wants to communicate with memory or another processor, it must check if the bus is free. If the bus is in use, it has to wait for the bus to become available.
Use Cases: Small multiprocessor systems or where cost is more important than scalability.
Advantages:
+ Simple to implement.
+ Due to the single common bus, implementation cost is very low.
Disadvantages:
+ Data transfer rate is slow.

2. Crossbar Switch Structure
Every processor is connected to every memory module through a matrix of switches.
Multiple simultaneous connections are possible, improving performance.
Consists of a matrix of switches at the intersection of processor buses and memory module
paths. Each processor can be directly connected to each memory module via a crosspoint
switch.
Advantages: High throughput—supports multiple simultaneous data transfers, maximizing
bandwidth.
Disadvantages: Hardware complexity and cost increase rapidly as the number of processors
and memory modules grows (nxn matrix for n processors and n memory modules). Difficult and
expensive to implement for large systems.
Use Cases: High-performance systems where simultaneous memory access is crucial.
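The crosspoint idea can be sketched in a few lines of Python. This is a toy model under stated assumptions (the request format and the function name crossbar_grant are ours, not from the text): each processor requests one memory module, and every request naming a distinct module is connected in the same cycle.

```python
# Toy model of one arbitration cycle in an n x n crossbar: any set of
# processor-to-memory requests touching distinct memory modules can all
# be connected simultaneously through their crosspoint switches.

def crossbar_grant(requests):
    """requests: {processor: memory_module}; return the granted subset."""
    granted, busy_modules = {}, set()
    for cpu, module in requests.items():
        if module not in busy_modules:   # crosspoint to this module is free
            granted[cpu] = module
            busy_modules.add(module)
    return granted

# P0->M1 and P1->M2 proceed in parallel; P2 loses this cycle's arbitration for M1.
print(crossbar_grant({"P0": "M1", "P1": "M2", "P2": "M1"}))
```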
3. Multiport Memory Structure
Each memory module has multiple ports (connections), one for each processor.
Allows parallel access to different memory modules by different processors.
Each memory module is equipped with multiple ports, allowing several processors to connect
directly to each memory module via dedicated buses.
Priority logic resolves conflicts if more than one processor requests the same memory module.
If more than one CPU requests the same memory module at the same time, access is granted in a fixed order, e.g., CPU-1, CPU-2, CPU-3, CPU-4.
Use Cases: Used in systems requiring very high-speed memory access by a limited number of processors.
Advantages: A high transfer rate can be achieved because of the multiple paths.
Disadvantages: It requires expensive memory control logic and a large number of cables and connections.

4. Multistage Switching Network
The 2x2 crossbar switch is used in the multistage network. It has 2 inputs (A & B) and 2 outputs
(0 & 1). To establish the connection between the input & output terminals, the control inputs CA
& CB are associated.
Examples: Omega, Butterfly, and similar multistage networks.
Reduces hardware complexity compared to crossbar but may introduce blocking for some
access patterns.
Advantages: More scalable and cost-effective than full crossbar; allows for multiple,
simultaneous connections.
Disadvantages: Some memory access patterns may cause blocking (if two processors try to reach the same memory through a shared switch path), and the network is still more complex than a single bus system.
5. Hypercube Interconnection Structure
+ Processors are arranged as nodes of an n-dimensional cube (for dimension n, there are 2^n nodes).
+ Each processor is directly connected to n others (its "neighbors"). Communications are
established along cube edges.
+ Efficient communication paths; good for loosely coupled systems.
+ Example: For n = 3, each processor has 3 neighbors and there are 8 processors in total.
+ Advantages: Efficient for communication-intensive applications.
+ Disadvantages: Connection complexity grows with the number of dimensions (more cabling/interface logic).
Figure: Hypercube structures for n = 1, 2, 3.
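The neighbor relationship is easy to compute: label each node with an n-bit binary address, and its neighbors are exactly the addresses that differ in one bit. A minimal Python sketch (the function name is ours):

```python
# Hypercube addressing: node labels are n-bit numbers, and flipping any
# single bit of a label yields one of that node's n neighbors.

def hypercube_neighbors(node: int, n: int) -> list[int]:
    """Return the n neighbors of `node` in an n-dimensional hypercube."""
    return [node ^ (1 << bit) for bit in range(n)]

n = 3  # 3-cube: 2**3 = 8 processors, 3 neighbors each
for node in range(2 ** n):
    neighbors = [f"{m:03b}" for m in hypercube_neighbors(node, n)]
    print(f"node {node:03b} -> neighbors {neighbors}")
```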
Interprocessor arbitration
Interprocessor arbitration is the mechanism used in multiprocessor systems to control and
coordinate access to a shared resource, usually the system bus, among multiple processors. When
several processors request simultaneous access to the common bus (for memory, I/O, or
communication), arbitration decides which processor will be granted access at a given time to avoid
conflicts and ensure orderly data transfer.
+ Purpose: To resolve conflicts and enforce that only one processor accesses the shared bus at a
time, preventing data corruption and system instability.
+ Arbitration methods:
Static and Dynamic:
Static: Serial and Parallel
Serial (Daisy Chain) Arbitration: Processors are connected in series, and priority is assigned based on their position in the chain. The highest-priority processor that requests gets bus access first. It's simple but can cause higher latency for lower-priority processors.
Figure: Daisy-chain (serial) arbitration.
+ Advantages
Simple and cheaper method.
Least number of lines.
+ Disadvantages
Higher delay.
Priority of each processor is fixed.
Not reliable.

2. Parallel Arbitration: Uses a priority encoder and decoder circuit external to the processors
to determine the highest priority bus request simultaneously. This method is faster and more
flexible than serial arbitration.
+ Advantage: Separate pairs of bus request and bus grant signals for each processor, so it is faster.
+ Disadvantage: Requires more bus request and grant signal lines.
+ Dynamic Arbitration: Priorities can change dynamically based on algorithms.
+ A few dynamic arbitration procedures that use dynamic priority algorithms: Time Slice, Polling, LRU, FIFO.
1. Time Slice Algorithm
Purpose: Allocates a fixed amount of time (called a "time slice" or "quantum") to each process or processor in a round-robin manner.
How it works: Each requestor gets the bus or CPU for a set period. If it doesn't finish in that time, it's interrupted, and the next requestor uses the resource.
2. Polling Algorithm
Purpose: Used to check the status of multiple devices or processors in a sequence to see if they
need the bus or attention.
How it works: The controller "polls" each processor/device one by one in a fixed order and
grants access only if that processor has made a request.
3. LRU (Least Recently Used) Algorithm
Purpose: Manages which resource (like cache blocks or bus access slots) should be replaced or
given priority, based on recent usage.
How it works: The processor or cache block that has not been used for the longest period is
selected for replacement or is given the lowest priority.
4. FIFO (First-In, First-Out) Algorithm
Purpose: Schedules resource allocation in the order in which requests arrive.
How it works: The earliest requestor (first in) is served first, and new requests are queued at the end.

+ Advantages of dynamic arbitration
1. The priority can be changed by altering the sequence stored in the controller.
2. More reliable.
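The FIFO and time-slice policies can be sketched in Python. This is purely illustrative (real arbiters are hardware circuits, and the function names are made up): the first generator grants the bus in arrival order, the second rotates over processors round-robin and grants only those currently requesting.

```python
# Illustrative sketches of two dynamic arbitration policies.
from collections import deque

def fifo_arbiter(arrivals):
    """FIFO: grant the bus strictly in the order requests arrived."""
    queue = deque(arrivals)
    while queue:
        yield queue.popleft()

def round_robin_arbiter(cpus, requesting):
    """Time-slice style: cycle through CPUs, granting one slot per requestor."""
    while any(requesting.values()):
        for cpu in cpus:
            if requesting[cpu]:
                yield cpu
                requesting[cpu] = False  # request satisfied for this round

print(list(fifo_arbiter(["CPU-3", "CPU-1", "CPU-2"])))      # arrival order
print(list(round_robin_arbiter(["CPU-1", "CPU-2", "CPU-3"],
                               {"CPU-1": True, "CPU-2": False, "CPU-3": True})))
```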
Inter Process Communication
Processes need to communicate with each other in many situations. Inter-Process Communication (IPC) is a mechanism that allows processes to communicate. It helps processes synchronize their activities, share information, and avoid conflicts while accessing shared resources.
Types of Process
+ Independent process: An independent process is not affected by the execution of other processes. Independent processes do not share any data or resources with other processes. No inter-process communication is required in this case.
+ Co-operating process: Interacts with other processes and shares data or resources. A co-operating process can be affected by other executing processes. Inter-process communication (IPC) is a mechanism that allows processes to communicate with each other and synchronize their actions. The communication between these processes can be seen as a method of cooperation between them.
Inter-process communication (IPC) allows different processes running on a computer to share information with each other. IPC allows processes to communicate by using different techniques like sharing memory, sending messages, or using files. It ensures that processes can work together without interfering with each other. Co-operating processes require an Inter-Process Communication (IPC) mechanism that will allow them to exchange data and information.
The two fundamental models of Inter Process Communication are:
+ Shared Memory
+ Message Passing
Figure: The two IPC models: processes communicate and synchronize either through shared memory or by message passing.
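A minimal sketch of both models using Python's multiprocessing module (worker names are ours, for illustration): the first child shares a memory word with its parent, the second sends a message through a queue.

```python
# Two IPC models side by side: shared memory vs. message passing.
from multiprocessing import Process, Queue, Value

def shared_memory_worker(counter):
    with counter.get_lock():         # synchronize access to the shared word
        counter.value += 1

def message_passing_worker(channel):
    channel.put("hello from child")  # communicate by sending a message

if __name__ == "__main__":
    # Shared-memory model: parent and child touch the same memory word.
    counter = Value("i", 0)
    p = Process(target=shared_memory_worker, args=(counter,))
    p.start(); p.join()
    print("shared counter:", counter.value)   # 1

    # Message-passing model: data travels through an explicit channel.
    channel = Queue()
    q = Process(target=message_passing_worker, args=(channel,))
    q.start()
    print("received:", channel.get())
    q.join()
```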
Interprocessor Synchronization:
Interprocessor synchronization in Computer Organization and Architecture (COA) is the set of
techniques and mechanisms that ensure multiple processors or cores in a multiprocessor system
coordinate their actions when accessing shared resources or performing parallel tasks.
Why is Interprocessor Synchronization Needed?
Prevent Conflicts: When two or more processors attempt to access or modify the same data in
shared memory concurrently, conflicts (race conditions, data corruption) can occur.
Maintain Data Integrity: It ensures only one processor accesses the shared resource/critical
section at a time, preventing inconsistent or incorrect results.
Orderly Execution: Synchronization coordinates the execution order among processors, so programs run correctly and efficiently.
Avoid Deadlocks: It helps avoid deadlocks and priority inversion by properly managing resource allocation among processors.
Methods & Mechanisms
1. Mutual Exclusion (Mutex):
+ Only one processor can enter the critical section (a section of code accessing shared resources) at a time.
+ Implemented via mutexes, semaphores, locks, or hardware instructions.
2. Semaphores:
+ Variables (binary or counting) used to signal and control access to resources.
+ Use operations such as WAIT (P) and SIGNAL (V) to manage process access.
3. Message Passing:
+ In distributed systems, processors synchronize by passing messages through established
communication channels.
+ Ensures messages are received and processed in the correct order, and resources are not
accessed until safe.
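A toy illustration of the first two mechanisms, mapping WAIT (P) to acquire() and SIGNAL (V) to release(); Python threads stand in for processors here:

```python
# Binary semaphore protecting a critical section (WAIT/SIGNAL as P/V).
import threading

mutex = threading.Semaphore(1)   # binary semaphore: at most one holder
shared_counter = 0

def processor(n_updates):
    global shared_counter
    for _ in range(n_updates):
        mutex.acquire()          # WAIT (P): block until the resource is free
        shared_counter += 1      # critical section: one thread at a time
        mutex.release()          # SIGNAL (V): let another waiter proceed

threads = [threading.Thread(target=processor, args=(100_000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(shared_counter)            # always 400000 with the semaphore in place
```

Without the acquire/release pair, the four threads would interleave their read-modify-write sequences and the final count would usually fall short, which is exactly the race condition described below.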
Problems Without Synchronization
+ Race Conditions: Unpredictable outcomes from simultaneous memory access.
+ Data Inconsistency/Loss: Multiple writes to the same data without coordination can corrupt
results.
+ Deadlocks: Two or more processors block each other by waiting indefinitely for resources.
Cache coherence
Cache coherence refers to maintaining consistency of data stored in local caches of processors in a
multiprocessor system when they share a common memory space. When multiple processors have
their own cache and keep copies of shared data from main memory, updating or modifying that
data in one cache must be reflected in other caches to prevent incorrect or inconsistent results.
Why is Cache Coherence Needed?
+ Processors in a multiprocessor system frequently access shared data.
+ If one processor updates its cached copy of a shared variable, others might still have outdated values.
+ Without coherence mechanisms, programs can read stale or incorrect data.
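A toy write-invalidate sketch makes the idea concrete (our illustration only; real protocols such as MSI/MESI live in cache hardware): when one CPU writes an address, every other cached copy of it is invalidated, so later reads must re-fetch the fresh value.

```python
# Toy write-invalidate coherence: a write by one CPU removes (invalidates)
# all other cached copies of that address.

class CoherentSystem:
    def __init__(self, n_cpus):
        self.memory = {}                           # shared main memory
        self.caches = [{} for _ in range(n_cpus)]  # one private cache per CPU

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:                      # miss: fetch from memory
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, cpu, addr, value):
        for other, cache in enumerate(self.caches):
            if other != cpu:
                cache.pop(addr, None)              # invalidate stale copies
        self.caches[cpu][addr] = value
        self.memory[addr] = value                  # write-through, for simplicity

system = CoherentSystem(2)
system.read(1, 0x10)          # CPU 1 caches the old value (0)
system.write(0, 0x10, 42)     # CPU 0 writes; CPU 1's copy is invalidated
print(system.read(1, 0x10))   # 42, not the stale 0
```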
Shared Memory Multiprocessors:
1. UMA (Uniform Memory Access)
Description:
UMA stands for Uniform Memory Access. In this architecture, all processors share the main
memory uniformly, meaning every processor has equal access time and speed to any memory
location. Memory access time is independent of which processor is accessing which memory
module—there is no “local” or "remote" memory.
Figure: UMA. Processors 1..n access shared memory modules 1..m through a system interconnect (bus, crossbar, or multistage network).
2. NUMA (Non-Uniform Memory Access)
Description:
NUMA stands for Non-Uniform Memory Access. Here, the system is divided into several nodes.
Each node contains processors and its own local memory. A processor can access its own local
memory much faster than the memory located in another node (remote memory). Thus,
memory access time depends on the physical location of the data relative to the processor.
Figure: NUMA. Each node pairs a processor with its own local memory; nodes communicate through an interconnection network.
3. COMA (Cache-Only Memory Architecture)
Description:
COMA stands for Cache-Only Memory Architecture. In COMA systems, the local memories of
each node are treated as large caches, not as main memory. There is no home node for any
data—memory lines automatically migrate to wherever they are needed most, making the
entire memory space function as a giant cache.
Figure: COMA. Node memories act as large caches, connected by an interconnection network.
Concept of Pipelining
Pipelining is a technique in a computer's CPU where the process of executing instructions is
divided into small steps, and these steps happen at the same time for different instructions. This
helps the CPU work faster by doing many instructions at once in a sequence, just like an assembly
line in a factory.
+ In pipelining, each instruction is broken down into sub-operations, such as fetching, decoding,
executing, and storing. Each sub-operation is performed in a dedicated segment, and as soon as
one stage completes its part for an instruction, it passes it to the next stage and starts on the
next instruction.
+ This overlapping of instruction execution increases the CPU's throughput, meaning more instructions are completed in a given period, similar to how an assembly line increases the number of finished products.
+ Pipelining is fundamental for modern CPUs and comes in types like instruction pipelining (handling the stages of fetching, decoding, executing, etc.) and arithmetic pipelining (dividing arithmetic operations into pipeline stages).
Imagine a factory where a product is made in steps, like assembling a toy. Instead of one person
making the whole toy from start to finish, the work is divided into different steps, and each person
specializes in one step. While one person is working on painting the toy, another person is already
putting together the next toy’s parts. This way, many toys are being worked on at the same time,
just at different stages.
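The standard textbook model makes the benefit concrete: with k one-cycle stages and no stalls, n instructions take k + (n - 1) cycles in a pipeline, versus n * k cycles executed one at a time. A small sketch under those assumptions:

```python
# Ideal pipeline timing: the first instruction fills the k stages, then one
# instruction completes per cycle (assumes one cycle per stage, no stalls).

def cycles(n_instructions, k_stages, pipelined=True):
    if pipelined:
        return k_stages + (n_instructions - 1)  # fill once, then overlap
    return n_instructions * k_stages            # each instruction runs alone

n, k = 100, 5
print("non-pipelined:", cycles(n, k, pipelined=False))  # 500 cycles
print("pipelined:    ", cycles(n, k))                   # 104 cycles
print("speedup: %.2fx" % (cycles(n, k, pipelined=False) / cycles(n, k)))
```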
Types of Pipelining
1. Instruction Pipelining
+ Definition: The execution of multiple instructions is divided into stages (like fetch, decode,
execute, memory access, write back), so several instructions can be processed at different stages
simultaneously.
Use: Improves the throughput of instruction processing in CPUs and is the most widely
implemented pipelining in processors.
Example: While one instruction is being executed, the next one is decoded, and another is
fetched from memory.
2. Arithmetic Pipelining
Definition: Arithmetic operations (such as floating point addition, multiplication, division) are
separated into smaller steps, each performed in a pipeline stage, allowing multiple arithmetic
computations to overlap.
Use: Used in mathematical operation units for high-speed computation (e.g., floating-point operations in scientific calculators or processors).
Example: Breaking down floating-point addition into compare exponents, align mantissas, add/subtract mantissas, and normalize result.
Advantages:
+ Increases the number of instructions completed in less time (better throughput).
+ Keeps all CPU parts busy at once, avoiding idle time.
+ Allows faster clock cycles by breaking tasks into small steps.
+ Makes the CPU run faster without changing single-instruction speed.
+ Easy to scale and improve performance.
Pipeline Hazards
Pipeline hazards are conditions in pipelined processors that prevent the next instruction from
executing in its scheduled clock cycle, causing delays or stalls in the instruction pipeline.
There are three main types of pipeline hazards relevant for Computer Organization and
Architecture:
Structural Hazards
Occur when two or more instructions in the pipeline require the same hardware resource at the same time (e.g., a single memory or ALU unit needed simultaneously). This leads to resource conflicts and forces the pipeline to stall or wait.
Example: One instruction is fetching an instruction from memory while another needs to read or
write data from the same memory.
Data Hazards
Happen when an instruction depends on the result of a previous instruction that has not yet completed its execution. These dependencies cause stalls until the required data is available.
Data hazards have three subtypes:
RAW (Read After Write): An instruction needs to read a location that a previous instruction is writing to (e.g., ADD R1, R2, R3 followed by SUB R4, R1, R5 must wait for R1).
WAR (Write After Read): An instruction writes to a location that a previous instruction still needs to read.
WAW (Write After Write): Multiple instructions write to the same location out of order.
Control Hazards (Branch Hazards)
Result from instructions that alter the program flow (branches, jumps). The pipeline may fetch
wrong instructions while waiting to resolve the branch decision, causing delays or flushing of
incorrect instructions.
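RAW dependence, the most common hazard, can be detected mechanically; the tuple-based instruction format below is made up for illustration:

```python
# Detect a RAW hazard between two adjacent instructions, each modeled as
# (destination_register, source_registers).

def has_raw_hazard(prev_instr, next_instr):
    dest, _ = prev_instr
    _, sources = next_instr
    return dest in sources          # next reads what prev has not yet written

add = ("R1", ("R2", "R3"))          # ADD R1, R2, R3
sub = ("R4", ("R1", "R5"))          # SUB R4, R1, R5  (reads R1)
print(has_raw_hazard(add, sub))     # True: SUB must stall or use forwarding
```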
Vector Processing:
Definition: Vector processing is the technique of executing operations on entire arrays (vectors) of data in a single instruction cycle.
+ Vector Registers:
+ Special registers called vector registers are used to store arrays of data.
+ Each vector register can hold multiple elements.

VECTOR PROCESSING
+ There is a class of computational problems that are beyond the
capabilities of a conventional computer. These problems require a vast
number of computations that will take a conventional computer days or
even weeks to complete.
Vector Processing Applications
+ Problems that can be efficiently formulated in terms of vectors and
matrices
- Long-range weather forecasting
- Petroleum exploration
- Seismic data analysis
- Medical diagnosis
- Aerodynamics and space flight simulations
- Artificial intelligence and expert systems
- Mapping the human genome
- Image processing
Vector Processor (computer)
+ Ability to process vectors and matrices much faster than conventional computers.
In a vector processor, a single instruction can request multiple data operations, which saves time: the instruction is decoded once and then operates on many different data items.
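The contrast with scalar execution can be sketched with NumPy, whose array operations stand in here for hardware vector instructions (array sizes are arbitrary):

```python
# Scalar loop vs. vector-style operation on whole arrays.
import numpy as np

a = np.arange(100_000, dtype=np.float64)
b = np.ones(100_000, dtype=np.float64)

# Scalar style: one add per element, decoded and issued 100,000 times.
c_scalar = np.empty_like(a)
for i in range(len(a)):
    c_scalar[i] = a[i] + b[i]

# Vector style: one "instruction" applied to the entire array at once.
c_vector = a + b

assert np.array_equal(c_scalar, c_vector)
```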
What is Array Processing?
Array Processing is a technique in computer architecture where a single instruction operates on multiple data elements arranged in arrays (like matrices or vectors) simultaneously. It is a form of parallel processing used to improve performance in data-heavy applications such as scientific computing, image processing, and machine learning.
It is based on the SIMD (Single Instruction, Multiple Data) model, where the same operation is applied to multiple data points in parallel.

1. Attached Array Processing
Definition:
In Attached Array Processing, the array processor is used as a coprocessor, or secondary processor, connected to a general-purpose host computer. The main processor handles general tasks, while the attached array processor handles intensive numeric computations.
Key Characteristics:
Array processor acts as a dedicated computation unit.
Connected to the main CPU via a bus or channel.
Processes floating-point operations or matrix-heavy computations.
Often used in scientific computing and engineering simulations.
2. SIMD (Single Instruction, Multiple Data) Array Processing
Definition:
In SIMD array processing, a single instruction is broadcast to all processing elements (PEs), and each PE performs the same operation on different data elements simultaneously.
Key Characteristics:
+ One Control Unit governs all PEs.
+ All PEs execute the same instruction at the same time.
+ Suitable for tasks like image processing, matrix operations, and vector calculations.
+ Fast and efficient for regular data patterns.

RISC (Reduced Instruction Set Computer)
Definition:
RISC processors use a small, highly-optimized set of simple instructions. Each instruction
performs a single task and typically executes in one clock cycle.
Key Features:
Simpler instructions and decoding.
Fixed-length instructions (usually one word).
Large number of general-purpose registers.
Loads and stores are separate instructions.
Simple addressing modes.
Highly pipelined for faster performance.
Lower power consumption.
Advantages:
Fast instruction execution.
Easier to design and optimize (hardware and compiler).
Efficient for portable devices and high-performance systems.
Disadvantages:
More instructions required for complex operations, possibly resulting in bigger code size.
May need more RAM to hold additional instructions.
Examples: ARM, MIPS, SPARC processors.
CISC (Complex Instruction Set Computer)
Definition:
CISC processors have a large and complex instruction set—each instruction can perform
multiple low-level operations (like loading from memory, arithmetic, storing).
Key Features:
+ Complex and variable-length instructions.
+ Fewer general-purpose registers (more operations on memory itself).
+ Complex addressing modes.
+ A single instruction may perform multiple tasks (e.g., loading, adding, storing all at once).
+ Instructions can take several clock cycles to execute.
+ Microprogrammed control logic is commonly used.
+ Advantages:
+ Fewer instructions needed for each task (compact code).
+ Makes efficient use of memory.
+ Established software ecosystem (widely used in desktop PCs, like Intel x86).
+ Disadvantages:
+ Slower execution per instruction due to complexity.
+ More complicated hardware design and decoding.
+ Higher power consumption.
+ Examples: Intel x86, AMD processors.
CISC architecture vs. RISC architecture:
1) CISC gives more importance to hardware; RISC gives more importance to software.
2) CISC uses complex instructions; RISC uses reduced (simple) instructions.
3) CISC accesses memory directly; RISC requires registers (load/store).
4) Coding for a CISC processor is simple; coding for a RISC processor requires more lines.
5) CISC's complex instructions take multiple cycles to execute; RISC's simple instructions take a single cycle.
6) In CISC, the complexity lies in the microprogram; in RISC, it lies in the compiler.
What is a Multicore Processor?
A multicore processor is a single integrated circuit (IC) chip that contains two or more
independent processing units called cores. Each core is capable of reading and executing
program instructions on its own, allowing the processor to handle multiple tasks simultaneously.
For example, a dual-core processor has two cores, a quad-core has four, and so on.
Characteristics of Multicore Processors
+ Multiple Cores on One Chip: Each core is a fully functional processing unit with its
own registers, arithmetic logic unit (ALU), and cache.
+ Parallel Processing: Cores can execute different instructions at the same time, enabling true parallelism and improved performance.
+ Shared/Separate Cache: Cores may have their own (L1) cache memories and also share larger caches (L2, L3) to facilitate faster data access and inter-core communication.
Efficiency: By sharing the same chip and components, multicore processors improve
energy and space efficiency over using multiple single-core chips.
Compatibility: Widely used in desktops, laptops, smartphones, servers, and embedded systems.
Advantages of Multicore Processors
Better Performance: Can perform more operations in parallel, especially with
software optimized for multiple threads.
Improved Multitasking: Several applications or processes can run simultaneously without slowing down the system.
Energy Efficiency: Multicore chips consume less power and generate less heat than a system
with multiple separate processors.
Reliability: If one core fails during an operation, other cores can continue functioning,
adding to system robustness.
Efficient Resource Sharing: Shorter communication paths and shared caches reduce latency and improve speed.
Scalability: Future versions with more cores can be introduced for higher performance without major architectural changes.
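A short sketch of software exploiting multiple cores with a process pool (the worker function and workload are ours, for illustration): CPU-bound tasks are farmed out so that each core works on a different task at the same time.

```python
# Run CPU-bound tasks in parallel, one worker process per core.
from multiprocessing import Pool, cpu_count

def heavy_task(n):
    return sum(i * i for i in range(n))    # stand-in for real CPU-bound work

if __name__ == "__main__":
    jobs = [2_000_000] * 8
    with Pool(processes=cpu_count()) as pool:  # one worker per available core
        results = pool.map(heavy_task, jobs)   # tasks run on separate cores
    print(len(results), "tasks completed on", cpu_count(), "cores")
```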