Unit 5 Part 2
Multiprocessor systems combine two or more CPUs that share memory and I/O equipment to enhance performance, reliability, and scalability. They enable parallel processing, allowing for faster execution of tasks and better multitasking capabilities. Multiprocessors can be categorized based on memory organization (shared vs. distributed) and processor interaction (symmetric vs. asymmetric).
1. Shared Resources:
System Bus:
The primary communication pathway connecting processors, memory, and I/O devices.
Memory:
Access to shared memory locations needs to be arbitrated to prevent data corruption.
Other Peripherals:
Access to shared peripherals (e.g., network interfaces, storage devices) also requires arbitration.
2. Arbitration Mechanisms:
Serial (Daisy Chain) Arbitration:
Requests propagate sequentially through a chain of arbitration circuits. The highest priority device gets the bus first. If it doesn't need
it, the request passes to the next in line.
Parallel Arbitration:
Multiple requests are handled simultaneously using an external priority encoder and decoder. This allows for faster arbitration
compared to the daisy chain method.
Dynamic Arbitration:
Schemes like round-robin or prioritizing the longest-waiting request are used to dynamically adjust priorities and ensure fairness (both selection rules are sketched in code below).
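As a rough illustrative sketch (not from the notes), the selection logic of a fixed-priority (parallel) arbiter and a round-robin arbiter can be expressed in C, treating the four bus request lines as bits of an integer:

```c
#include <stdio.h>

/* Hypothetical sketch: 4 processors raise bus requests as bits of `brq`
   (bit 0 = highest fixed priority). */

/* Fixed-priority (parallel) arbitration: grant the lowest set bit,
   mimicking a priority encoder. Returns -1 if there is no request. */
int fixed_priority_grant(unsigned brq) {
    for (int i = 0; i < 4; i++)
        if (brq & (1u << i))
            return i;
    return -1;
}

/* Round-robin (dynamic) arbitration: start searching one past the
   previous grantee, so every requester is eventually served. */
int round_robin_grant(unsigned brq, int last) {
    for (int k = 1; k <= 4; k++) {
        int i = (last + k) % 4;
        if (brq & (1u << i))
            return i;
    }
    return -1;
}

int main(void) {
    unsigned brq = 0xA;  /* binary 1010: processors 1 and 3 request */
    printf("fixed priority grants: %d\n", fixed_priority_grant(brq)); /* 1 */
    printf("round robin grants:    %d\n", round_robin_grant(brq, 1)); /* 3 */
    return 0;
}
```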
3. Key Concepts:
Bus Request (BRQ):
A signal from a processor indicating it wants to access the shared resource.
Busy Signal:
Indicates that the bus is currently in use.
Priority:
Determines which processor gets access when multiple requests occur simultaneously.
Optimizing Resource Utilization:
Arbitration allows for efficient sharing of resources among multiple processors.
Inter-Process Communication (IPC):
Purpose:
It allows different parts of a system, whether on the same machine or across a network, to cooperate
and share resources.
Methods:
IPC mechanisms include shared memory, message passing (e.g., pipes, message queues), sockets,
and remote procedure calls (RPC).
Example:
In a web server, different processes might handle user requests, database interactions, and file
storage. IPC allows them to communicate and coordinate their tasks.
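As a minimal sketch of one of these mechanisms, here is a pipe in C; the request text and process roles are illustrative, not from the notes:

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

/* Parent sends a message to a child process through a pipe. */
int main(void) {
    int fd[2];                       /* fd[0] = read end, fd[1] = write end */
    char buf[64];

    if (pipe(fd) == -1) return 1;

    if (fork() == 0) {               /* child: reads the request */
        close(fd[1]);
        ssize_t n = read(fd[0], buf, sizeof buf - 1);
        buf[n > 0 ? n : 0] = '\0';
        printf("child received: %s\n", buf);
        return 0;
    }

    close(fd[0]);                    /* parent: writes the request */
    const char *msg = "GET /index.html";
    write(fd[1], msg, strlen(msg));
    close(fd[1]);
    wait(NULL);
    return 0;
}
```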
Synchronization:
Definition:
Synchronization ensures that multiple processes access shared resources in a controlled way,
preventing data corruption or inconsistencies.
Purpose:
It prevents race conditions (where multiple processes try to modify the same data simultaneously) and
ensures data integrity.
Methods:
Synchronization techniques include semaphores, mutexes, monitors, and barriers.
Example:
If multiple processes are writing to the same file, synchronization mechanisms ensure that only one
process writes at a time, preventing garbled or incomplete data.
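A minimal C sketch of this idea using a pthread mutex (threads rather than separate processes, and the file name is illustrative):

```c
#include <pthread.h>
#include <stdio.h>

/* Two threads append lines to the same file; the mutex guarantees each
   line is written in one piece, never interleaved with the other thread's. */
static pthread_mutex_t file_lock = PTHREAD_MUTEX_INITIALIZER;
static FILE *logf;

static void *writer(void *arg) {
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&file_lock);      /* enter critical section */
        fprintf(logf, "thread %s, line %d\n", (char *)arg, i);
        pthread_mutex_unlock(&file_lock);    /* leave critical section */
    }
    return NULL;
}

int main(void) {
    pthread_t a, b;
    logf = fopen("shared.log", "w");
    if (!logf) return 1;
    pthread_create(&a, NULL, writer, "A");
    pthread_create(&b, NULL, writer, "B");
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    fclose(logf);
    return 0;
}
```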
In essence, IPC is about what is being communicated, and synchronization is about how it is
communicated.
Inter-Processor Communication (IPC) refers to the mechanisms that allow multiple processors within a single system to
exchange data and coordinate their operations. This is crucial for enabling parallel processing, where different tasks
can be executed concurrently on separate processors to improve overall system performance.
Key Concepts:
Multiple Processors:
IPC is primarily relevant in systems with multiple processors, such as multi-core CPUs or distributed systems.
Parallel Processing:
The ability to communicate and coordinate allows tasks to be divided and executed in parallel, leading to faster processing times.
Resource Optimization:
IPC allows different processors to share resources and data, leading to better utilization of available hardware.
Improved Scalability:
IPC enables systems to scale more effectively by adding more processors to handle increasing workloads.
Distributed Systems:
In a network of computers, IPC allows processes on different machines to communicate and work together on a larger task. For
example, a cloud computing platform relies on IPC to manage resources and distribute workloads across multiple servers.
Real-time Systems:
In real-time systems, IPC is essential for coordinating sensors, actuators, and control algorithms across multiple processors to
ensure timely responses.
Key Concepts:
Independent vs. Cooperative Processes:
Processes can be independent, meaning they don't share resources and don't need synchronization, or cooperative, where
they share resources and require synchronization to avoid conflicts.
Shared Resources:
Multiprocessor systems often utilize shared memory, where multiple processors can access the same memory
locations. Synchronization is essential to manage concurrent access to these shared regions.
Race Conditions:
Occur when multiple processes access and modify shared data concurrently, potentially leading to unpredictable and
incorrect results. Synchronization mechanisms are designed to prevent race conditions.
Synchronization Mechanisms:
Mutual Exclusion:
Ensures that only one process can access a shared resource (e.g., a critical section) at a time. This is often implemented
using locks (mutexes).
Locks (Mutexes):
Act as gatekeepers, allowing only one process to acquire the lock at a time. Other processes attempting to acquire the lock
will be blocked until the lock is released.
Semaphores:
A more general synchronization mechanism than locks. They can control access to a limited number of resources. A
semaphore maintains a counter, and processes can increment or decrement it. Processes might be blocked if they try to
decrement the counter when it's already zero.
Barriers:
Used to synchronize a group of processes. All processes must reach the barrier before any of them can proceed
further. This ensures that all processes have completed a specific stage of computation before moving on.
Example:
Consider a scenario with a shared buffer used by multiple producer and consumer processes. Producers add
data to the buffer, and consumers remove data from it. Without synchronization, a producer might try to add
data to a full buffer, or a consumer might try to read from an empty buffer. Using locks or semaphores, the
system can ensure that the buffer is accessed in a controlled manner, preventing data loss or corruption.
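A minimal C sketch of this bounded-buffer scheme, using two counting semaphores plus a mutex; the buffer size and item values are illustrative:

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define N 8                                /* buffer capacity (illustrative) */

static int buffer[N], in = 0, out = 0;
static sem_t empty_slots, full_slots;      /* count free / used slots */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

static void *producer(void *arg) {
    for (int item = 0; item < 32; item++) {
        sem_wait(&empty_slots);            /* block if the buffer is full */
        pthread_mutex_lock(&mtx);
        buffer[in] = item; in = (in + 1) % N;
        pthread_mutex_unlock(&mtx);
        sem_post(&full_slots);             /* signal: data available */
    }
    return NULL;
}

static void *consumer(void *arg) {
    for (int i = 0; i < 32; i++) {
        sem_wait(&full_slots);             /* block if the buffer is empty */
        pthread_mutex_lock(&mtx);
        int item = buffer[out]; out = (out + 1) % N;
        pthread_mutex_unlock(&mtx);
        sem_post(&empty_slots);            /* signal: a slot freed */
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void) {
    pthread_t p, c;
    sem_init(&empty_slots, 0, N);          /* all slots initially free */
    sem_init(&full_slots, 0, 0);
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```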
Importance:
Data Consistency:
Synchronization is critical for maintaining data consistency when multiple processes access shared resources.
Avoiding Deadlocks:
Improper synchronization can lead to deadlocks, where processes are blocked indefinitely, waiting for resources held by
other blocked processes. Effective synchronization techniques help prevent deadlocks.
Memory Organization:
Symmetric Shared Memory (SMP): All processors have equal access to memory and resources.
Non-Uniform Memory Access (NUMA): Memory access time varies depending on the location of the
memory relative to the processor.
Memory Bandwidth:
The rate at which data can be transferred between processors and memory.
Interconnection Network:
The topology and performance of the network that connects the processors and memory.
Synchronization Mechanisms:
Techniques used to coordinate access to shared resources and prevent race conditions.
International Journal of Advanced Research in Computer Science, Volume 8, No. 7, July–August 2017
ISSN No. 0976-5697 | DOI: http://dx.doi.org/10.26483/ijarcs.v8i7.4406
Available Online at www.ijarcs.info
STUDY OF MEMORY ORGANIZATION AND MULTIPROCESSOR SYSTEM -
USING THE CONCEPT OF DISTRIBUTED SHARED MEMORY, MEMORY
CONSISTENCY MODEL AND SOFTWARE BASED DSM
Dhara Kumari, MPhil Scholar, Himalayan University, Arunachal Pradesh (India), [email protected]
Dr. Rajni Sharma, Assistant Professor (Computer Science), PT.J.L.N. Govt P.G College, Faridabad (India), [email protected]
Dr. Sarita Kaushik, HOD (Computer Science), DAV College, Faridabad (India), [email protected]
Abstract: In the current trend, performance and efficiency are the big issues for memory organization and multiprocessor systems. A memory organization and multiprocessor system may use multiple modalities covering different types of DSM (software based, hardware based, or a combination of both software and hardware), because IT technology has greatly advanced and a lot of information is shared via the internet. To improve the performance and efficiency of a multiprocessor system and its memory organization, we can use different types of techniques based on the concept and implementation of hardware, software, and hybrid DSM. This paper provides an almost exhaustive survey of the existing problems and solutions in a uniform manner, presenting memory organization, shared memory, distributed memory, distributed shared memory, the memory consistency model, and software based DSM mechanisms, along with issues of importance for various DSM systems and approaches.
Keywords: Performance, Efficiency, Memory, DSM, Shared Memory, Software Based DSM, Multiprocessor System, Memory Consistency Model
This figure shows how the number of levels in the memory hierarchy differs for different architectures and how all goals are achieved at a nominal cost by using multiple levels of memory. The fastest memories are more expensive per bit than the slower memories and thus are usually smaller. The price difference arises because of the difference in capacity among different implementations for the same amount of silicon. But programs typically specified the location to write memory and what data to put there. This location was a physical location on the actual memory hardware. The slow processing of such computers did not allow for the complex memory management systems used today. Also, as most such systems were single-task, sophisticated systems were not required as much. This approach has its pitfalls. If the location specified is incorrect, this will cause the computer to write the data to some other part of the program. The results of an error like this are unpredictable. In some cases, the incorrect data might overwrite memory used by the operating system. Computer crackers can take advantage of this to create viruses and malware. So, to prevent this type of situation we can use the concept of a Shared Memory System.

B. Shared Memory
Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access. To realize a shared memory system, it is necessary to avoid memory access contention, maintain cache coherency [2, 3], and realize synchronization between computing nodes or data to enable collaboration. In a multiprocessor environment, shared memory systems cover a broad spectrum, from systems that maintain consistency entirely in hardware to those that do it entirely in software, and they make a global physical memory equally accessible to all processors. Such a system is also known as a tightly coupled multiprocessor [4]; it enables simple data sharing through a uniform mechanism of reading and writing shared structures in the common memory. Figure 2 shows the general structure of a bus-based shared memory system. This uniform mechanism eases programming and portability. However, shared-memory multiprocessors typically suffer from increased contention and longer latencies in accessing the shared memory, which degrades peak performance and limits scalability compared to distributed systems. Memory system design also tends to be complex. Thus, in 1986, Kai Li proposed a different scheme in his PhD dissertation entitled "Shared Virtual Memory on Loosely Coupled Microprocessors"; it opened up a new area of research that is known as Distributed Shared Memory (DSM) systems [5], which are supported in multiple computer environments.

C. Distributed Shared Memory System
A DSM system logically implements the shared memory model on a physically distributed-memory system (distributed memory refers to a multiprocessor computer system in which each processor has its own private memory). So we can say that DSM is a model of inter-process communication in a distributed system. Distributed shared memory (DSM) is thus a form of memory architecture where the (physically separate) memories can be addressed as one (logically shared) address space. Figure 3 shows the general structure of a distributed shared memory system.
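As a concrete illustration of the shared memory model described above, here is a sketch assuming POSIX shm_open/mmap (the object name is made up, error handling is minimal, and the one-second sleep is a crude stand-in for real synchronization; on older glibc, link with -lrt):

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* One process creates a named shared memory portion; a forked process
   maps the same pages and reads what the parent wrote. */
int main(void) {
    const char *name = "/demo_shm";              /* illustrative name */
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd == -1) return 1;
    if (ftruncate(fd, 4096) == -1) return 1;
    char *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) return 1;

    if (fork() == 0) {                           /* child process */
        sleep(1);                                /* wait for the write */
        printf("child read: %s\n", mem);
        return 0;
    }
    strcpy(mem, "hello via shared memory");      /* parent writes */
    wait(NULL);
    shm_unlink(name);                            /* remove the object */
    return 0;
}
```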
DSM can be implemented in hardware or in software.
According to the hardware implementation, it requires the addition of special network interfaces and cache coherence circuits to the system to make remote memory access look like local memory access. So, hardware DSM is very expensive.
According to the software implementation, a software layer is added between the OS and application layers, and the kernel of the OS may or may not be modified. Software DSM is more widely used, as it is cheaper and easier to implement than hardware DSM.

Design issues of DSM
The goal of distributed shared memory is to present a global view of the entire address space to a program executing on any machine [6]. A DSM manager on a particular machine would capture all the remote data accesses made by any process running on that machine. An implementation of a DSM would involve various choices. Some of them are as below [7]:
DSM algorithm
Implementation level of the DSM mechanism
Semantics for concurrent access
Semantics (replication / partial / full / R/W)
Naming scheme to be used to access remote data
Locations for replication (for optimization)
System consistency model & granularity of data
Whether data is replicated or cached
Remote access by HW or SW
Caching/replication controlled by HW or SW

The value of distributed shared memory depends upon the performance of the memory consistency model. The consistency model is responsible for managing the state of shared data for distributed shared memory systems. Many consistency models have been defined by a wide variety of sources, including the architecture, the system, the application programmer, etc.

D. Memory Consistency Model
Shared-memory systems allow multiple processors to simultaneously read and write the same memory locations, and programmers require a conceptual model for the semantics of memory operations to allow them to correctly use the shared memory. This model is generally referred to as a memory consistency model or memory model. So we can say that the memory consistency model for a shared-memory multiprocessor specifies the behavior of memory with respect to read and write operations from multiple processors. From the system designer's point of view, the model specifies acceptable memory behaviors for the system. Thus, the memory consistency model influences many aspects of system design, including the design of programming languages, compilers, and the underlying hardware.
A memory model can be defined at any interface between the programmer and the system, where the system consists of the base hardware and programmers express their programs in machine-level instructions. There are two types of interface:
At the machine code interface, the memory model specification affects the designer of the machine hardware and the programmer who writes or reasons about machine code.
At the high level language interface, the specification affects the programmers who use the high level language and the designers of both the software that converts high-level language code into machine code and the hardware that executes this code.
Computer researchers have proposed different memory models to enhance distributed shared memory systems (such as the sequential consistency model, processor consistency model, weak consistency model, release consistency model, etc.). These models aim to reduce memory access latency and bandwidth requirements and to simplify programming. They can also provide better performance, at the expense of a higher involvement of the programmer in synchronizing the accesses to shared data.

E. Software Based DSM
A distributed shared memory is a simple yet powerful paradigm for structuring multiprocessor systems. It can be designed using hardware and/or software methodologies based on various considerations of the data being shared in multiprocessor environments, but it is better to design DSM in software, because sharing data becomes a problem which can be more easily tackled in software than in hardware as in multiprocessor systems. The memory organization of a software DSM system determines the way shared virtual memory is organized on top of the isolated memory address spaces. There are various advantages of programming distributed shared memory for a multiprocessor environment, as stated below:
Sharing data becomes a problem which has to be tackled in the software and not in hardware as in multiprocessor systems.
Shared memory programs are usually shorter and easier to understand.
Large or complex data structures may easily be communicated.
Programming with shared memory is a well-understood problem.
Shared memory gives transparent process-to-process communication.
Compact design and easy implementation and expansion.
Software based DSM provides many advantages in the design of multiprocessor systems. A distributed shared memory mechanism allows a user's multiple processors to access shared data efficiently. A DSM with no memory access bottleneck and a large virtual memory space can accommodate more processors. Its programs are portable as they use a common DSM programming interface, but it still has some disadvantages: for example, programmers need to understand consistency models in order to write correct programs. So, this study is very useful for building new shared memory systems against designing constraints and other languages. Software implementations of the DSM concept are usually built as a separate layer on top of the message passing model. According to the implementation level, several types of software-oriented DSM approaches can be recognized, as in Figure 4.
However, there are some weaknesses that limit performance and efficiency during the implementation of a software based distributed shared memory system. These weaknesses are:
There are no high level synchronization primitives provided. In this case, programmers have to use basic synchronization primitives, for example barriers and locks, to solve synchronization problems.
If many writers write to the page and read the page, then current multiple-writer protocols suffer from the high cost of making a stale page current.
Thus, in future work, these two types of weaknesses can be solved by using different types of methodology and implementation that provide a strong guarantee of Performance, Persistence, Interoperability, Security, Resource Management, Scalability, and Fault Tolerance during read and write operations onto the memory organization and multiprocessor system.

V. REFERENCES
[1] Stanek, William R. (2009). Windows Server 2008 Inside Out. O'Reilly Media, Inc. p. 1520. ISBN 978-07356-3806-8. Retrieved 2012-08-20. [...] Windows Server Enterprise supports clustering with up to eight-node clusters and very large memory (VLM) configurations of up to 32 GB on 32-bit systems and 2 TB on 64-bit systems.
[2] H. Amano, Parallel Computer. Shoukoudou, June 1996.
[3] N. Suzuki, S. Shimizu, and N. Yamanouchi, An Implementation of a Shared Memory Multiprocessor. Koronasha, Mar. 1993.
[4] M. J. Flynn, Computer Architecture: Pipelined and Parallel Processor Design, Jones and Bartlett, Boston, 1995.
[5] Kai Li, "Shared Virtual Memory on Loosely Coupled Microprocessors," PhD Thesis, Yale University, September 1986.
[6] Song Li, Yu Lin, and Michael Walker, "Region-based Software Distributed Shared Memory," CS 656 Operating Systems, May 5, 2000.
[7] Ajay Kshemkalyani and Mukesh Singhal, Ch. 12, Distributed Computing: Principles, Algorithms, and Systems, Cambridge University Press, 2008.
[8] S. V. Adve and M. D. Hill. A Unified Formalization of Four Shared-Memory Models. IEEE Trans. on Parallel and Distributed Systems, 4(6):613–624, June 1993.
[9] P. S. Sindhu, J-M. Frailong, and M. Cekleov. Formal Specification of Memory Models. In M. Dubois and S. S. Thakkar, editors, Scalable Shared Memory Multiprocessors, pages 25–41. Kluwer Academic Publishers, 1992.
[10] M. Raynal and A. Schiper. A Suite of Formal Definitions for Consistency Criteria in Shared Memories. In Proc. of the 9th Int'l Conf. on Parallel and Distributed Computing Systems (PDCS'96), pages 125–131, September 1996.
[11] K. Li and P. Hudak. Memory coherence in shared virtual memory systems. ACM Transactions on Computer Systems, 7(4):321–359, November 1989.
[12] K. Li and R. Schaefer. Shiva: An Operating System Transforming a Hypercube into a Shared-Memory Machine. Technical Report CS-TR-217-89, Dept. of Computer Science, Princeton University, April 1989.
[13] J. B. Carter, J. K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related communication in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205–243, August 1995.
[14] J. B. Carter. Design of the Munin distributed shared memory system. Journal of Parallel and Distributed Computing, special issue on distributed shared memory, 1995.
[15] P. Keleher, A. L. Cox, S. Dwarkadas, and W. Zwaenepoel. An Evaluation of Software-Based Release Consistent Protocols. Journal of Parallel and Distributed Computing, 29(2):126–141, September 1995.
[16] W. G. Levelt, M. F. Kaashoek, H. E. Bal, and A. S. Tanenbaum. A Comparison of Two Paradigms for Distributed Shared Memory. Software—Practice and Experience, 22(11):985–1010, November 1992. Also available as Free University of the Netherlands, Computer Science Department technical report IR-221.
[17] X-H. Sun and J. Zhu. Performance Considerations of Shared Virtual Memory Machines. IEEE Trans. on Parallel and Distributed Systems, 6(11):1185–1194, November 1995.
[18] R. G. Minnich and D. V. Pryor. A Radiative Heat Transfer Simulation on a SPARCstation Farm. In Proc. of the First IEEE Int'l Symp. on High Performance Distributed Computing (HPDC-1), pages 124–132, September 1992.
[19] P. Keleher, A. L. Cox, S. Dwarkadas, and W. Zwaenepoel. TreadMarks: Distributed shared memory on standard workstations and operating systems. In the 1994 Winter USENIX Conference, 1994.
Overlapping execution:
Instead of waiting for one instruction to complete all stages before starting the next, each stage works on a
different instruction concurrently.
Analogy:
Imagine an assembly line in a factory. Each station (stage) performs a specific task on a product (instruction),
and multiple products are processed simultaneously as they move down the line.
Benefits of Pipelining:
Increased throughput:
Pipelining significantly increases the number of instructions completed per unit of time, leading to faster
execution of programs.
Data hazards:
Occur when an instruction depends on the result of a previous instruction that is still in the pipeline (e.g., an
instruction needs to read a value that hasn't been written back yet).
Control hazards:
Occur when a branch instruction changes the normal flow of instructions, causing the pipeline to potentially
execute the wrong instructions.
Branch prediction: Predicting the outcome of branch instructions to avoid unnecessary stalls.
Pipelining is a crucial concept in modern processor design, enabling significant performance gains
by allowing parallel execution of instructions.
What is Pipelining?
Pipelining is the process of feeding instructions to the processor through a pipeline. It allows instructions to be stored and executed in an orderly process. It is also known as pipeline processing.
Pipelining is a technique where multiple instructions are overlapped during execution. The pipeline is divided into stages, and these stages are connected with one another to form a pipe-like structure. Instructions enter from one end and exit from the other end.
In a pipeline system, each segment consists of an input register followed by a combinational circuit. The register is used to hold data, and the combinational circuit performs operations on it. The output of the combinational circuit is applied to the input register of the next segment.
A pipeline system is like a modern-day assembly line setup in a factory. For example, in a car manufacturing industry, huge assembly lines are set up, and at each point there are robotic arms to perform a certain task, after which the car moves on to the next arm.
Types of Pipeline
It is divided into 2 categories:
1. Arithmetic Pipeline
2. Instruction Pipeline
Arithmetic Pipeline
Arithmetic pipelines are found in most computers. They are used for floating point operations, multiplication of fixed point numbers, etc. For example, the input to a floating point adder pipeline is:
X = A*2^a
Y = B*2^b
Here A and B are mantissas (the significant digits of the floating point numbers), while a and b are
exponents. The addition is carried out in four segments: (1) compare the exponents, (2) align the
mantissas, (3) add or subtract the mantissas, and (4) normalize the result.
Registers are used for storing the intermediate results between these segments.
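A toy sketch of these four segments in C (sequential here; in real hardware each intermediate result is latched in a register so all four segments work on different operand pairs simultaneously):

```c
#include <stdio.h>

/* Toy binary floating point adder: X = A * 2^a, Y = B * 2^b.
   The four pipeline segments are shown as sequential steps. */
int main(void) {
    double A = 0.75, B = 0.50;      /* mantissas, kept in [0.5, 1) */
    int a = 3, b = 2;               /* so X = 6.0 and Y = 2.0 */

    /* Segment 1: compare the exponents. */
    int diff = a - b;

    /* Segment 2: align the mantissa of the smaller operand (b < a here). */
    while (diff-- > 0) B /= 2.0;

    /* Segment 3: add the mantissas. */
    double C = A + B;               /* 0.75 + 0.25 = 1.0 */
    int c = a;

    /* Segment 4: normalize the result so the mantissa stays below 1. */
    if (C >= 1.0) { C /= 2.0; c++; }

    printf("result: %.4f * 2^%d\n", C, c);   /* 0.5000 * 2^4 = 8.0 */
    return 0;
}
```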
Instruction Pipeline
In this, a stream of instructions can be executed by overlapping the fetch, decode, and execute phases of the instruction cycle. This type of technique is used to increase the throughput of the computer system.
An instruction pipeline reads instructions from memory while previous instructions are being executed in other segments of the pipeline. Thus we can execute multiple instructions simultaneously. The pipeline is more efficient if the instruction cycle is divided into segments of equal duration, as the timing expression below makes precise.
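The standard timing argument, not spelled out in these notes, makes this precise: with k equal segments of clock period t_p and n instructions,

```latex
T_{\text{pipe}} = (k + n - 1)\,t_p, \qquad
T_{\text{serial}} = n\,k\,t_p, \qquad
S = \frac{T_{\text{serial}}}{T_{\text{pipe}}} = \frac{n\,k}{k + n - 1} \xrightarrow{\;n \to \infty\;} k
```

so a k-segment pipeline approaches a k-fold speedup on long instruction streams.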
Pipeline Conflicts
There are some factors that cause the pipeline to deviate from its normal performance. Some of these factors are given below:
1. Timing Variations
All stages cannot take same amount of time. This problem generally occurs in instruction
processing where different instructions have different operand requirements and thus different
processing time.
2. Data Hazards
When several instructions are in partial execution, and if they reference same data then the
problem arises. We must ensure that next instruction does not attempt to access data before the
current instruction, because this will lead to incorrect results.
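An illustrative C fragment with exactly this dependency (a read-after-write hazard once compiled to back-to-back instructions):

```c
#include <stdio.h>

int main(void) {
    int x = 4, y = 5;
    int a = x + y;   /* instruction 1: computes and (later) writes a */
    int b = a * 2;   /* instruction 2: reads a; in a pipeline its
                        operand-fetch stage is reached before
                        instruction 1 has written a back, so the
                        hardware must stall or forward the result */
    printf("%d\n", b);
    return 0;
}
```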
3. Branching
In order to fetch and execute the next instruction, we must know what that instruction is. If the
present instruction is a conditional branch, and its result will lead us to the next instruction, then
the next instruction may not be known until the current one is processed.
4. Interrupts
Interrupts inject unexpected instructions into the instruction stream and disrupt the orderly execution of instructions.
5. Data Dependency
It arises when an instruction depends upon the result of a previous instruction but this result is
not yet available.
Advantages of Pipelining
1. The cycle time of the processor is reduced.
Disadvantages of Pipelining
1. The design of a pipelined processor is complex and costly to manufacture.
Key Concepts:
Vectors:
In vector processing, data is organized into vectors, which are essentially one-dimensional arrays of data items.
SIMD Instructions:
Vector processors use special instructions (SIMD instructions) that can operate on
multiple data elements in parallel.
Parallelism:
The core idea is to exploit parallelism by performing the same operation on multiple
data points at the same time, which significantly speeds up computation.
Vector Registers:
Vector processors typically have specialized registers (vector registers) that can hold
entire vectors, allowing for efficient parallel operations.
How it Works:
1. Data Organization: Data is loaded into vector registers.
2. Instruction Execution: A single instruction (e.g., add, multiply) is executed on all
elements of the vector registers simultaneously.
3. Results Storage: The results of the operations are stored back into vector registers
or memory.
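A C sketch of the contrast (illustrative): the scalar loop below performs one addition per iteration, whereas a vector processor would execute the same operation across entire vector registers in a single instruction; modern compilers auto-vectorize exactly this pattern:

```c
#include <stdio.h>

#define N 8

int main(void) {
    float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[N];

    /* Scalar view: one add per iteration. On a vector processor the
       whole loop is a single vector add, C(1:N) = A(1:N) + B(1:N),
       with a, b, c held in vector registers. */
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    for (int i = 0; i < N; i++)
        printf("%.0f ", c[i]);
    printf("\n");
    return 0;
}
```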
Examples:
Scientific Computing:
Solving large systems of equations, simulating physical phenomena, and performing
complex calculations.
Image Processing:
Applying filters, transformations, and other operations to images represented as
matrices or arrays of pixels.
Machine Learning:
Training deep learning models, where operations are often performed on large matrices
and tensors.
Vector processor classification
Vector processors have rightfully come into prominence in computing architecture design by virtue of how efficiently they handle large datasets. A large portion of this efficiency is due to the way operands are retrieved in the architectural configuration used in the implementation. Vector processors are classified into two primary architectures: memory-to-memory and register-to-register. This classification is important for optimizing performance in scientific computing and other data-intensive applications.
In memory-to-memory architecture, source operands, intermediate results, and final results are
retrieved directly from the main memory. For memory-to-memory vector instructions, it is
necessary to specify the base address, offset, increment, and vector length to facilitate data
transfers between the main memory and pipelines. Notable processors employing memory-to-
memory formats include TI-ASC, CDC STAR-100, and Cyber-205.
In register-to-register architecture, operands and results are retrieved indirectly from the main
memory through the use of a large number of vector or scalar registers. Processors like Cray-1
and Fujitsu VP-200 utilize vector instructions in register-to-register formats.
Hybrid Architecture
A hybrid architecture unites memory-to-memory and register-to-register organizations to gain the benefits of both. It uses flexible operand retrieval methods, which improves performance and efficiency across many computational tasks. This provides a balanced solution that can be adapted to the requirements of the application, with better utilization of available resources.
Key Concepts:
Parallel Processing:
Array processors leverage parallelism by distributing computations across multiple processing elements (PEs), enabling simultaneous
execution of operations on different data elements.
SIMD Architecture:
A common approach is Single Instruction, Multiple Data (SIMD), where a single instruction is executed on multiple data elements
concurrently.
Processing Elements (PEs):
These are individual processing units within the array processor, each capable of performing computations on a portion of the data.
Control Unit (CU):
The CU coordinates the activities of the PEs, issuing instructions and managing data flow.
Types of Array Processors:
Attached Array Processors: These are auxiliary processors connected to a general-purpose host computer, often used to accelerate
specific computational tasks.
SIMD Array Processors: These processors have a more tightly integrated structure, with identical processing elements operating
synchronously under the control of a single CU.
How it works:
1. Data Decomposition:
The input data array is divided into smaller chunks, with each chunk assigned to a separate PE.
2. Instruction Issuance:
The control unit issues a single instruction that is then broadcast to all active PEs.
3. Parallel Execution:
Each PE independently executes the instruction on its assigned data chunk.
4. Result Gathering:
The results from each PE are then gathered and combined to produce the final output.
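An illustrative C sketch of steps 1-4, modeling each PE as the owner of one slice of the data under a single broadcast operation (the PE count and chunking are assumptions made for the sketch):

```c
#include <stdio.h>

#define PES 4                  /* number of processing elements */
#define N   16                 /* total data elements */

int main(void) {
    int a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

    /* 1. Data decomposition: each PE owns one contiguous chunk.
       2. Instruction issuance: the "broadcast instruction" is the
          loop body, identical for every PE.
       3. Parallel execution: on real hardware the per-PE loops run
          simultaneously; here they run sequentially for clarity. */
    int chunk = N / PES;
    for (int pe = 0; pe < PES; pe++)
        for (int i = pe * chunk; i < (pe + 1) * chunk; i++)
            c[i] = a[i] + b[i];          /* Ci = Ai + Bi on this PE */

    /* 4. Result gathering: c[] already holds the combined output. */
    for (int i = 0; i < N; i++) printf("%d ", c[i]);
    printf("\n");
    return 0;
}
```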
Benefits:
Increased Processing Speed:
Parallel execution significantly reduces the overall processing time for computationally intensive tasks.
Improved Efficiency:
Array processors are designed to optimize for repetitive calculations, making them faster than general-purpose processors for these
workloads.
Enhanced Performance:
By leveraging parallelism, array processors can handle large datasets more efficiently, leading to improved system performance.
Examples:
Scientific Simulations:
Simulating weather patterns, molecular dynamics, or financial markets often involves extensive matrix operations, which can be
accelerated by array processors.
Image and Signal Processing:
Applications like medical imaging, speech recognition, and seismic data analysis rely on array processing for tasks such as filtering,
transformations, and feature extraction.
Graphics Processing:
Vector processors, a type of array processor, are crucial for handling the parallel computations required for rendering graphics in video
games and other applications.
Types of Array Processor
An array processor performs computations on large arrays of data. There are two types of array processors: the attached array processor and the SIMD array processor. These are explained below.
Attached Array Processor
In an attached array processor, the host computer is a general-purpose computer, and the attached processor is a back-end machine driven by the host; the attached processor's local memory is interconnected with the host's main memory. The array processor is connected through an I/O controller to the computer, and the computer treats it as an external interface.
Working:
Suppose we are executing a vast number of instructions. In that case it may not be feasible to execute all of them on the host computer alone; it could take days or weeks. An attached array processor is therefore used to enhance the speed and performance of the host computer, as shown in the diagram above. The I/O interface is used to connect the host and the attached processor and to resolve the differences between them. An array processor is essentially a group of processing units used together to perform a computation.
SIMD Array Processor
The processing units are synchronized to perform the same operation under the control of a common control unit, thus providing a single instruction stream, multiple data stream (SIMD) organization. As shown in the figure, a SIMD array processor contains a set of identical processing elements (PEs), each having a local memory M.
Working:
Here the array processor is built into the computer, unlike the attached array processor, where the array processor is attached externally to the host computer. Initially, the master control unit decodes the instruction and generates the control signals, which are passed to all the processing elements (PEs) or ALUs; upon receiving the control signals from the master control unit, all the processing elements know what operations need to be performed. The data on which the operations are to be performed is brought from main memory into the local memory of the respective PEs. The SIMD array processor is normally used to compute vector data, and all PEs execute the same instruction simultaneously on different data.
For example, when the vector instruction Ci = Ai + Bi (an addition operation) needs to be executed, the master control unit generates the control signals and passes them, along with the data, to all processing elements. The PEs then work in parallel on the same instruction with different data; that is why it is called a single instruction, multiple data streams (SIMD) array processor.
Each PE includes -
ALU
Floating point arithmetic unit
Working registers
The master control unit controls the operations in the PEs. The function of the master control unit is to decode the instruction and determine how the instruction is to be executed. If the instruction is a scalar or program control instruction, then it is executed directly within the master control unit.
Main memory is used for storage of the program while each PE uses operands stored in its local memory.
RISC and CISC in Computer Organization
RISC is a way to make the hardware simpler, whereas CISC uses single instructions that each handle multiple operations. In this article, we are going to discuss RISC and CISC in detail, as well as the difference between RISC and CISC. Let's proceed with RISC first.
Characteristics of RISC
Advantages of RISC
Simpler instructions: RISC processors use a smaller set of simple instructions, which makes them easier to
decode and execute quickly. This results in faster processing times.
Faster execution: Because RISC processors have a simpler instruction set, they can execute instructions faster
than CISC processors.
Lower power consumption: RISC processors consume less power than CISC processors, making them ideal
for portable devices.
Disadvantages of RISC
More instructions required: RISC processors require more instructions to perform complex tasks than CISC
processors.
Increased memory usage: RISC processors require more memory to store the additional instructions needed
to perform complex tasks.
Higher cost: Developing and manufacturing RISC processors can be more expensive than CISC processors.
Characteristics of CISC
Advantages of CISC
Reduced code size: CISC processors use complex instructions that can perform multiple operations, reducing
the amount of code needed to perform a task.
More memory efficient: Because CISC instructions are more complex, they require fewer instructions to
perform complex tasks, which can result in more memory-efficient code.
Widely used: CISC processors have been in use for a longer time than RISC processors, so they have a larger
user base and more available software.
Disadvantages of CISC
Slower execution: CISC processors take longer to execute instructions because they have more complex
instructions and need more time to decode them.
More complex design: CISC processors have more complex instruction sets, which makes them more difficult
to design and manufacture.
Higher power consumption: CISC processors consume more power than RISC processors because of their
more complex instruction sets.
CPU Performance
Both approaches try to increase CPU performance:
RISC: reduces the cycles per instruction at the cost of the number of instructions per program.
CISC: attempts to minimize the number of instructions per program, but at the cost of an increase in the number of cycles per instruction.
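The classic decomposition behind this trade-off (the standard formula; the notes name the quantities but do not write it out):

```latex
\text{CPU time} \;=\; \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}}
```

RISC shrinks the middle factor at the cost of the first; CISC shrinks the first at the cost of the middle.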
Earlier, when programming was done using assembly language, a need was felt to make instructions do more tasks, because programming in assembly was tedious and error-prone; out of this, CISC architecture evolved. With the rise of high-level languages, dependency on assembly reduced, and RISC architecture prevailed.
Example: suppose we have to add two numbers held in memory.
CISC approach: There will be a single command or instruction for this, like ADD, which will perform the whole task.
RISC approach: Here the programmer will write a first load command to load the data into registers, then use a suitable operator, and then store the result in the desired location.
So, the add operation is divided into parts, i.e. load, operate, store, due to which RISC programs are longer and require more memory, but they require fewer transistors due to less complex commands. An illustrative contrast appears in the sketch below.
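An illustrative sketch of that decomposition; the instruction mnemonics in the comments are generic, not from any particular ISA:

```c
#include <stdio.h>

int main(void) {
    int a = 2, b = 3, c;

    c = a + b;   /* One C statement. A CISC machine may compile this to a
                    single memory-to-memory instruction:
                        ADD  c, a, b
                    A RISC (load/store) machine needs separate steps:
                        LOAD  r1, a       ; load
                        LOAD  r2, b       ; load
                        ADD   r3, r1, r2  ; operate
                        STORE c, r3       ; store                        */

    printf("%d\n", c);
    return 0;
}
```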
RISC vs CISC

RISC: Uses only a hardwired control unit.
CISC: Uses both hardwired and microprogrammed control units.

RISC: Can perform only register-to-register arithmetic operations.
CISC: Can perform REG-to-REG, REG-to-MEM, or MEM-to-MEM operations.

RISC: An instruction executes in a single clock cycle.
CISC: An instruction takes more than one clock cycle.

RISC: An instruction fits in one word.
CISC: Instructions are larger than the size of one word.

RISC: Simple and limited addressing modes.
CISC: Complex and more numerous addressing modes.

RISC: The number of instructions is smaller compared to CISC.
CISC: The number of instructions is larger compared to RISC.
Multicore processors, with multiple processing units on a single chip, are fundamental to modern computing, offered by both Intel and AMD. These processors enhance performance and multitasking
capabilities by allowing concurrent execution of instructions. Both companies offer a range of multicore
processors for various applications, from personal computers to servers, with different strengths and
specializations.
Intel:
Core i Series:
Intel's Core i3, i5, i7, and i9 processors are widely known for their single-core performance and broad software
compatibility.
Strengths:
Intel processors are often favored for tasks requiring high single-core speed, making them a reliable choice for
general computing and many applications.
Multicore Architecture:
Intel's multicore processors feature multiple execution pipelines and resources within each core, enabling
simultaneous task execution.
AMD:
Ryzen Series:
AMD's Ryzen 3, 5, 7, and 9 processors, along with Threadripper, are known for their strong multi-core performance
and integrated graphics capabilities.
Strengths:
AMD processors are often lauded for their price-to-performance ratio, offering more cores and cache for the same
or lower cost than comparable Intel processors.
Multicore Advantages:
AMD processors are well-suited for tasks involving heavy workloads, such as content creation, video editing, and
gaming, where parallel processing is crucial.
Key Differences:
Performance Focus:
Intel often emphasizes single-core speed, while AMD is known for its multicore performance and competitive
pricing.
Integrated Graphics:
AMD's integrated graphics are often more powerful than Intel's, potentially saving costs on discrete graphics
cards.
Cache:
AMD processors may have more L3 cache, while Intel processors can have more L2 cache.
In essence:
Both Intel and AMD offer excellent multicore processors, but their strengths lie in different areas.
AMD offers more cores, better value, and stronger integrated graphics.
Choosing between the two depends on the specific needs and priorities of the user.
Difference Between Intel and AMD

Intel: Clock speeds reach and have surpassed 5.0 GHz.
AMD: Clock speeds can reach 5.0 GHz, but this results in more heat.