Unit 5 Part 2
Multiprocessor systems combine two or more CPUs that share memory and I/O equipment to enhance performance, reliability, and scalability. They enable parallel processing, allowing for faster execution of tasks and better multitasking capabilities. Multiprocessors can be categorized based on memory organization (shared vs. distributed) and processor interaction (symmetric vs. asymmetric).
1. Shared Resources:
System Bus:
The primary communication pathway connecting processors, memory, and I/O devices.
Memory:
Access to shared memory locations needs to be arbitrated to prevent data corruption.
Other Peripherals:
Access to shared peripherals (e.g., network interfaces, storage devices) also requires arbitration.
2. Arbitration Mechanisms:
Serial (Daisy Chain) Arbitration:
Requests propagate sequentially through a chain of arbitration circuits. The highest priority device gets the bus first. If it doesn't need
it, the request passes to the next in line.
Parallel Arbitration:
Multiple requests are handled simultaneously using an external priority encoder and decoder. This allows for faster arbitration
compared to the daisy chain method.
Dynamic Arbitration:
Schemes like round-robin or prioritizing the longest-waiting request are used to dynamically adjust priorities and ensure fairness (both selection rules are sketched in code below).
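As a rough illustrative sketch (not from the notes), the selection logic of a fixed-priority (parallel) arbiter and a round-robin arbiter can be expressed in C, treating the four bus request lines as bits of an integer:

```c
#include <stdio.h>

/* Hypothetical sketch: 4 processors raise bus requests as bits of `brq`
   (bit 0 = highest fixed priority). */

/* Fixed-priority (parallel) arbitration: grant the lowest set bit,
   mimicking a priority encoder. Returns -1 if there is no request. */
int fixed_priority_grant(unsigned brq) {
    for (int i = 0; i < 4; i++)
        if (brq & (1u << i))
            return i;
    return -1;
}

/* Round-robin (dynamic) arbitration: start searching one past the
   previous grantee, so every requester is eventually served. */
int round_robin_grant(unsigned brq, int last) {
    for (int k = 1; k <= 4; k++) {
        int i = (last + k) % 4;
        if (brq & (1u << i))
            return i;
    }
    return -1;
}

int main(void) {
    unsigned brq = 0xA;  /* binary 1010: processors 1 and 3 request */
    printf("fixed priority grants: %d\n", fixed_priority_grant(brq)); /* 1 */
    printf("round robin grants:    %d\n", round_robin_grant(brq, 1)); /* 3 */
    return 0;
}
```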
3. Key Concepts:
Bus Request (BRQ):
A signal from a processor indicating it wants to access the shared resource.
Busy Signal:
Indicates that the bus is currently in use.
Priority:
Determines which processor gets access when multiple requests occur simultaneously.
Optimizing Resource Utilization:
Arbitration allows for efficient sharing of resources among multiple processors.
Inter-Process Communication (IPC):
Purpose:
It allows different parts of a system, whether on the same machine or across a network, to cooperate
and share resources.
Methods:
IPC mechanisms include shared memory, message passing (e.g., pipes, message queues), sockets,
and remote procedure calls (RPC).
Example:
In a web server, different processes might handle user requests, database interactions, and file
storage. IPC allows them to communicate and coordinate their tasks.
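As a minimal sketch of one of these mechanisms, here is a pipe in C; the request text and process roles are illustrative, not from the notes:

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

/* Parent sends a message to a child process through a pipe. */
int main(void) {
    int fd[2];                       /* fd[0] = read end, fd[1] = write end */
    char buf[64];

    if (pipe(fd) == -1) return 1;

    if (fork() == 0) {               /* child: reads the request */
        close(fd[1]);
        ssize_t n = read(fd[0], buf, sizeof buf - 1);
        buf[n > 0 ? n : 0] = '\0';
        printf("child received: %s\n", buf);
        return 0;
    }

    close(fd[0]);                    /* parent: writes the request */
    const char *msg = "GET /index.html";
    write(fd[1], msg, strlen(msg));
    close(fd[1]);
    wait(NULL);
    return 0;
}
```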
Synchronization:
Definition:
Synchronization ensures that multiple processes access shared resources in a controlled way,
preventing data corruption or inconsistencies.
Purpose:
It prevents race conditions (where multiple processes try to modify the same data simultaneously) and
ensures data integrity.
Methods:
Synchronization techniques include semaphores, mutexes, monitors, and barriers.
Example:
If multiple processes are writing to the same file, synchronization mechanisms ensure that only one
process writes at a time, preventing garbled or incomplete data.
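A minimal C sketch of this idea using a pthread mutex (threads rather than separate processes, and the file name is illustrative):

```c
#include <pthread.h>
#include <stdio.h>

/* Two threads append lines to the same file; the mutex guarantees each
   line is written in one piece, never interleaved with the other thread's. */
static pthread_mutex_t file_lock = PTHREAD_MUTEX_INITIALIZER;
static FILE *logf;

static void *writer(void *arg) {
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&file_lock);      /* enter critical section */
        fprintf(logf, "thread %s, line %d\n", (char *)arg, i);
        pthread_mutex_unlock(&file_lock);    /* leave critical section */
    }
    return NULL;
}

int main(void) {
    pthread_t a, b;
    logf = fopen("shared.log", "w");
    if (!logf) return 1;
    pthread_create(&a, NULL, writer, "A");
    pthread_create(&b, NULL, writer, "B");
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    fclose(logf);
    return 0;
}
```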
In essence, IPC is about what is being communicated, and synchronization is about how it is
communicated.
Inter-Processor Communication (IPC) refers to the mechanisms that allow multiple processors within a single system to
exchange data and coordinate their operations. This is crucial for enabling parallel processing, where different tasks
can be executed concurrently on separate processors to improve overall system performance.
Key Concepts:
Multiple Processors:
IPC is primarily relevant in systems with multiple processors, such as multi-core CPUs or distributed systems.
Parallel Processing:
The ability to communicate and coordinate allows tasks to be divided and executed in parallel, leading to faster processing times.
Resource Optimization:
IPC allows different processors to share resources and data, leading to better utilization of available hardware.
Improved Scalability:
IPC enables systems to scale more effectively by adding more processors to handle increasing workloads.
Distributed Systems:
In a network of computers, IPC allows processes on different machines to communicate and work together on a larger task. For
example, a cloud computing platform relies on IPC to manage resources and distribute workloads across multiple servers.
Real-time Systems:
In real-time systems, IPC is essential for coordinating sensors, actuators, and control algorithms across multiple processors to
ensure timely responses.
Key Concepts:
Independent vs. Cooperative Processes:
Processes can be independent, meaning they don't share resources and don't need synchronization, or cooperative, where
they share resources and require synchronization to avoid conflicts.
Shared Resources:
Multiprocessor systems often utilize shared memory, where multiple processors can access the same memory
locations. Synchronization is essential to manage concurrent access to these shared regions.
Race Conditions:
Occur when multiple processes access and modify shared data concurrently, potentially leading to unpredictable and
incorrect results. Synchronization mechanisms are designed to prevent race conditions.
Synchronization Mechanisms:
Mutual Exclusion:
Ensures that only one process can access a shared resource (e.g., a critical section) at a time. This is often implemented
using locks (mutexes).
Locks (Mutexes):
Act as gatekeepers, allowing only one process to acquire the lock at a time. Other processes attempting to acquire the lock
will be blocked until the lock is released.
Semaphores:
A more general synchronization mechanism than locks. They can control access to a limited number of resources. A
semaphore maintains a counter, and processes can increment or decrement it. Processes might be blocked if they try to
decrement the counter when it's already zero.
Barriers:
Used to synchronize a group of processes. All processes must reach the barrier before any of them can proceed
further. This ensures that all processes have completed a specific stage of computation before moving on.
Example:
Consider a scenario with a shared buffer used by multiple producer and consumer processes. Producers add
data to the buffer, and consumers remove data from it. Without synchronization, a producer might try to add
data to a full buffer, or a consumer might try to read from an empty buffer. Using locks or semaphores, the
system can ensure that the buffer is accessed in a controlled manner, preventing data loss or corruption.
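A minimal C sketch of this bounded-buffer scheme, using two counting semaphores plus a mutex; the buffer size and item values are illustrative:

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define N 8                                /* buffer capacity (illustrative) */

static int buffer[N], in = 0, out = 0;
static sem_t empty_slots, full_slots;      /* count free / used slots */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

static void *producer(void *arg) {
    for (int item = 0; item < 32; item++) {
        sem_wait(&empty_slots);            /* block if the buffer is full */
        pthread_mutex_lock(&mtx);
        buffer[in] = item; in = (in + 1) % N;
        pthread_mutex_unlock(&mtx);
        sem_post(&full_slots);             /* signal: data available */
    }
    return NULL;
}

static void *consumer(void *arg) {
    for (int i = 0; i < 32; i++) {
        sem_wait(&full_slots);             /* block if the buffer is empty */
        pthread_mutex_lock(&mtx);
        int item = buffer[out]; out = (out + 1) % N;
        pthread_mutex_unlock(&mtx);
        sem_post(&empty_slots);            /* signal: a slot freed */
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void) {
    pthread_t p, c;
    sem_init(&empty_slots, 0, N);          /* all slots initially free */
    sem_init(&full_slots, 0, 0);
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```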
Importance:
Data Consistency:
Synchronization is critical for maintaining data consistency when multiple processes access shared resources.
Avoiding Deadlocks:
Improper synchronization can lead to deadlocks, where processes are blocked indefinitely, waiting for resources held by
other blocked processes. Effective synchronization techniques help prevent deadlocks.
Memory Organization:
Symmetric Shared Memory (SMP): All processors have equal access to memory and resources.
Non-Uniform Memory Access (NUMA): Memory access time varies depending on the location of the
memory relative to the processor.
Memory Bandwidth:
The rate at which data can be transferred between processors and memory.
Interconnection Network:
The topology and performance of the network that connects the processors and memory.
Synchronization Mechanisms:
Techniques used to coordinate access to shared resources and prevent race conditions.
International Journal of Advanced Research in Computer Science, Volume 8, No. 7, July–August 2017
ISSN No. 0976-5697 | DOI: http://dx.doi.org/10.26483/ijarcs.v8i7.4406
Available Online at www.ijarcs.info
STUDY OF MEMORY ORGANIZATION AND MULTIPROCESSOR SYSTEM -
USING THE CONCEPT OF DISTRIBUTED SHARED MEMORY, MEMORY
CONSISTENCY MODEL AND SOFTWARE BASED DSM
Dhara Kumari, MPhil Scholar, Himalayan University, Arunachal Pradesh (India), [email protected]
Dr. Rajni Sharma, Assistant Professor (Computer Science), PT.J.L.N. Govt P.G College, Faridabad (India), [email protected]
Dr. Sarita Kaushik, HOD (Computer Science), DAV College, Faridabad (India), [email protected]
Abstract: In the current trend, performance and efficiency are the big issues for memory organization and multiprocessor systems. A memory organization and multiprocessor system may use multiple modalities covering different types of DSM (software based, hardware based, or a combination of both software and hardware), because IT technology has greatly advanced and a lot of information is shared via the internet. To improve the performance and efficiency of a multiprocessor system and its memory organization, we can use different types of techniques based on the concept and implementation of hardware, software, and hybrid DSM. This paper provides an almost exhaustive survey of the existing problems and solutions in a uniform manner, presenting memory organization, shared memory, distributed memory, distributed shared memory, the memory consistency model, and software based DSM mechanisms, along with issues of importance for various DSM systems and approaches.
Keywords: Performance, Efficiency, Memory, DSM, Shared Memory, Software Based DSM, Multiprocessor System, Memory Consistency Model
This figure shows how the number of levels in the memory hierarchy differs for different architectures and how all goals are achieved at a nominal cost by using multiple levels of memory. The fastest memories are more expensive per bit than the slower memories and thus are usually smaller. The price difference arises because of the difference in capacity among different implementations for the same amount of silicon. But programs typically specified the location to write memory and what data to put there. This location was a physical location on the actual memory hardware. The slow processing of such computers did not allow for the complex memory management systems used today. Also, as most such systems were single-task, sophisticated systems were not required as much. This approach has its pitfalls. If the location specified is incorrect, this will cause the computer to write the data to some other part of the program. The results of an error like this are unpredictable. In some cases, the incorrect data might overwrite memory used by the operating system. Computer crackers can take advantage of this to create viruses and malware. So, to prevent this type of situation we can use the concept of a Shared Memory System.

B. Shared Memory
Shared memory is an efficient means of passing data between programs. One program will create a memory portion which other processes (if permitted) can access. To realize a shared memory system, it is necessary to avoid memory access contention, maintain cache coherency [2, 3], and realize synchronization between computing nodes or data to enable collaboration. In a multiprocessor environment, shared memory systems cover a broad spectrum, from systems that maintain consistency entirely in hardware to those that do it entirely in software, and they make a global physical memory equally accessible to all processors. Such a system is also known as a tightly coupled multiprocessor [4]; it enables simple data sharing through a uniform mechanism of reading and writing shared structures in the common memory. Figure 2 shows the general structure of a bus-based shared memory system. This uniform mechanism eases programming and portability. However, shared-memory multiprocessors typically suffer from increased contention and longer latencies in accessing the shared memory, which degrades peak performance and limits scalability compared to distributed systems. Memory system design also tends to be complex. Thus, in 1986, Kai Li proposed a different scheme in his PhD dissertation entitled "Shared Virtual Memory on Loosely Coupled Microprocessors"; it opened up a new area of research that is known as Distributed Shared Memory (DSM) systems [5], which are supported in multiple computer environments.

C. Distributed Shared Memory System
A DSM system logically implements the shared memory model on a physically distributed-memory system (distributed memory refers to a multiprocessor computer system in which each processor has its own private memory). So we can say that DSM is a model of inter-process communication in a distributed system. Distributed shared memory (DSM) is thus a form of memory architecture where the (physically separate) memories can be addressed as one (logically shared) address space. Figure 3 shows the general structure of a distributed shared memory system.
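As a concrete illustration of the shared memory model described above, here is a sketch assuming POSIX shm_open/mmap (the object name is made up, error handling is minimal, and the one-second sleep is a crude stand-in for real synchronization; on older glibc, link with -lrt):

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* One process creates a named shared memory portion; a forked process
   maps the same pages and reads what the parent wrote. */
int main(void) {
    const char *name = "/demo_shm";              /* illustrative name */
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd == -1) return 1;
    if (ftruncate(fd, 4096) == -1) return 1;
    char *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) return 1;

    if (fork() == 0) {                           /* child process */
        sleep(1);                                /* wait for the write */
        printf("child read: %s\n", mem);
        return 0;
    }
    strcpy(mem, "hello via shared memory");      /* parent writes */
    wait(NULL);
    shm_unlink(name);                            /* remove the object */
    return 0;
}
```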
DSM can be implemented in hardware or in software.
According to the hardware implementation, it requires the addition of special network interfaces and cache coherence circuits to the system to make remote memory access look like local memory access. So, hardware DSM is very expensive.
According to the software implementation, a software layer is added between the OS and application layers, and the kernel of the OS may or may not be modified. Software DSM is more widely used, as it is cheaper and easier to implement than hardware DSM.

Design issues of DSM
The goal of distributed shared memory is to present a global view of the entire address space to a program executing on any machine [6]. A DSM manager on a particular machine would capture all the remote data accesses made by any process running on that machine. An implementation of a DSM would involve various choices. Some of them are as below [7]:
DSM algorithm
Implementation level of the DSM mechanism
Semantics for concurrent access
Semantics (replication / partial / full / R/W)
Naming scheme to be used to access remote data
Locations for replication (for optimization)
System consistency model & granularity of data
Whether data is replicated or cached
Remote access by HW or SW
Caching/replication controlled by HW or SW

The value of distributed shared memory depends upon the performance of the memory consistency model. The consistency model is responsible for managing the state of shared data for distributed shared memory systems. Many consistency models have been defined by a wide variety of sources, including the architecture, the system, the application programmer, etc.

D. Memory Consistency Model
Shared-memory systems allow multiple processors to simultaneously read and write the same memory locations, and programmers require a conceptual model for the semantics of memory operations to allow them to correctly use the shared memory. This model is generally referred to as a memory consistency model or memory model. So we can say that the memory consistency model for a shared-memory multiprocessor specifies the behavior of memory with respect to read and write operations from multiple processors. From the system designer's point of view, the model specifies acceptable memory behaviors for the system. Thus, the memory consistency model influences many aspects of system design, including the design of programming languages, compilers, and the underlying hardware.
A memory model can be defined at any interface between the programmer and the system, where the system consists of the base hardware and programmers express their programs in machine-level instructions. There are two types of interface:
At the machine code interface, the memory model specification affects the designer of the machine hardware and the programmer who writes or reasons about machine code.
At the high level language interface, the specification affects the programmers who use the high level language and the designers of both the software that converts high-level language code into machine code and the hardware that executes this code.
Computer researchers have proposed different memory models to enhance distributed shared memory systems (such as the sequential consistency model, processor consistency model, weak consistency model, release consistency model, etc.). These models aim to reduce memory access latency and bandwidth requirements and to simplify programming. They can also provide better performance, at the expense of a higher involvement of the programmer in synchronizing the accesses to shared data.

E. Software Based DSM
A distributed shared memory is a simple yet powerful paradigm for structuring multiprocessor systems. It can be designed using hardware and/or software methodologies based on various considerations of the data being shared in multiprocessor environments, but it is better to design DSM in software, because sharing data becomes a problem which can be more easily tackled in software than in hardware as in multiprocessor systems. The memory organization of a software DSM system determines the way shared virtual memory is organized on top of the isolated memory address spaces. There are various advantages of programming distributed shared memory for a multiprocessor environment, as stated below:
Sharing data becomes a problem which has to be tackled in the software and not in hardware as in multiprocessor systems.
Shared memory programs are usually shorter and easier to understand.
Large or complex data structures may easily be communicated.
Programming with shared memory is a well-understood problem.
Shared memory gives transparent process-to-process communication.
Compact design and easy implementation and expansion.
Software based DSM provides many advantages in the design of multiprocessor systems. A distributed shared memory mechanism allows a user's multiple processors to access shared data efficiently. A DSM with no memory access bottleneck and a large virtual memory space can accommodate more processors. Its programs are portable as they use a common DSM programming interface, but it still has some disadvantages: for example, programmers need to understand consistency models in order to write correct programs. So, this study is very useful for building new shared memory systems against designing constraints and other languages. Software implementations of the DSM concept are usually built as a separate layer on top of the message passing model. According to the implementation level, several types of software-oriented DSM approaches can be recognized, as in Figure 4.
However, there are some weaknesses that limit performance and efficiency during the implementation of a software based distributed shared memory system. These weaknesses are:
There are no high level synchronization primitives provided. In this case, programmers have to use basic synchronization primitives, for example barriers and locks, to solve synchronization problems.
If many writers write to the page and read the page, then current multiple-writer protocols suffer from the high cost of making a stale page current.
Thus, in future work, these two types of weaknesses can be solved by using different types of methodology and implementation that provide a strong guarantee of Performance, Persistence, Interoperability, Security, Resource Management, Scalability, and Fault Tolerance during read and write operations onto the memory organization and multiprocessor system.

V. REFERENCES
[1] Stanek, William R. (2009). Windows Server 2008 Inside Out. O'Reilly Media, Inc. p. 1520. ISBN 978-07356-3806-8. Retrieved 2012-08-20. [...] Windows Server Enterprise supports clustering with up to eight-node clusters and very large memory (VLM) configurations of up to 32 GB on 32-bit systems and 2 TB on 64-bit systems.
[2] H. Amano, Parallel Computer. Shoukoudou, June 1996.
[3] N. Suzuki, S. Shimizu, and N. Yamanouchi, An Implementation of a Shared Memory Multiprocessor. Koronasha, Mar. 1993.
[4] M. J. Flynn, Computer Architecture: Pipelined and Parallel Processor Design, Jones and Bartlett, Boston, 1995.
[5] Kai Li, "Shared Virtual Memory on Loosely Coupled Microprocessors," PhD Thesis, Yale University, September 1986.
[6] Song Li, Yu Lin, and Michael Walker, "Region-based Software Distributed Shared Memory," CS 656 Operating Systems, May 5, 2000.
[7] Ajay Kshemkalyani and Mukesh Singhal, Ch. 12, Distributed Computing: Principles, Algorithms, and Systems, Cambridge University Press, 2008.
[8] S. V. Adve and M. D. Hill. A Unified Formalization of Four Shared-Memory Models. IEEE Trans. on Parallel and Distributed Systems, 4(6):613–624, June 1993.
[9] P. S. Sindhu, J-M. Frailong, and M. Cekleov. Formal Specification of Memory Models. In M. Dubois and S. S. Thakkar, editors, Scalable Shared Memory Multiprocessors, pages 25–41. Kluwer Academic Publishers, 1992.
[10] M. Raynal and A. Schiper. A Suite of Formal Definitions for Consistency Criteria in Shared Memories. In Proc. of the 9th Int'l Conf. on Parallel and Distributed Computing Systems (PDCS'96), pages 125–131, September 1996.
[11] K. Li and P. Hudak. Memory coherence in shared virtual memory systems. ACM Transactions on Computer Systems, 7(4):321–359, November 1989.
[12] K. Li and R. Schaefer. Shiva: An Operating System Transforming a Hypercube into a Shared-Memory Machine. Technical Report CS-TR-217-89, Dept. of Computer Science, Princeton University, April 1989.
[13] J. B. Carter, J. K. Bennett, and W. Zwaenepoel. Techniques for reducing consistency-related communication in distributed shared memory systems. ACM Transactions on Computer Systems, 13(3):205–243, August 1995.
[14] J. B. Carter. Design of the Munin distributed shared memory system. Journal of Parallel and Distributed Computing, special issue on distributed shared memory, 1995.
[15] P. Keleher, A. L. Cox, S. Dwarkadas, and W. Zwaenepoel. An Evaluation of Software-Based Release Consistent Protocols. Journal of Parallel and Distributed Computing, 29(2):126–141, September 1995.
[16] W. G. Levelt, M. F. Kaashoek, H. E. Bal, and A. S. Tanenbaum. A Comparison of Two Paradigms for Distributed Shared Memory. Software—Practice and Experience, 22(11):985–1010, November 1992. Also available as Free University of the Netherlands, Computer Science Department technical report IR-221.
[17] X-H. Sun and J. Zhu. Performance Considerations of Shared Virtual Memory Machines. IEEE Trans. on Parallel and Distributed Systems, 6(11):1185–1194, November 1995.
[18] R. G. Minnich and D. V. Pryor. A Radiative Heat Transfer Simulation on a SPARCstation Farm. In Proc. of the First IEEE Int'l Symp. on High Performance Distributed Computing (HPDC-1), pages 124–132, September 1992.
[19] P. Keleher, A. L. Cox, S. Dwarkadas, and W. Zwaenepoel. TreadMarks: Distributed shared memory on standard workstations and operating systems. In the 1994 Winter USENIX Conference, 1994.
Overlapping execution:
Instead of waiting for one instruction to complete all stages before starting the next, each stage works on a
different instruction concurrently.
Analogy:
Imagine an assembly line in a factory. Each station (stage) performs a specific task on a product (instruction),
and multiple products are processed simultaneously as they move down the line.
Benefits of Pipelining:
Increased throughput:
Pipelining significantly increases the number of instructions completed per unit of time, leading to faster
execution of programs.
Data hazards:
Occur when an instruction depends on the result of a previous instruction that is still in the pipeline (e.g., an
instruction needs to read a value that hasn't been written back yet).
Control hazards:
Occur when a branch instruction changes the normal flow of instructions, causing the pipeline to potentially
execute the wrong instructions.
Branch prediction: Predicting the outcome of branch instructions to avoid unnecessary stalls.
Pipelining is a crucial concept in modern processor design, enabling significant performance gains
by allowing parallel execution of instructions.
What is Pipelining?
Pipelining is the process of feeding instructions to the processor through a pipeline. It allows instructions to be stored and executed in an orderly process. It is also known as pipeline processing.
Pipelining is a technique where multiple instructions are overlapped during execution. The pipeline is divided into stages, and these stages are connected with one another to form a pipe-like structure. Instructions enter from one end and exit from the other end.
In a pipeline system, each segment consists of an input register followed by a combinational circuit. The register is used to hold data, and the combinational circuit performs operations on it. The output of the combinational circuit is applied to the input register of the next segment.
A pipeline system is like a modern-day assembly line setup in a factory. For example, in a car manufacturing industry, huge assembly lines are set up, and at each point there are robotic arms to perform a certain task, after which the car moves on to the next arm.
Types of Pipeline
It is divided into 2 categories:
1. Arithmetic Pipeline
2. Instruction Pipeline
Arithmetic Pipeline
Arithmetic pipelines are found in most computers. They are used for floating point operations, multiplication of fixed point numbers, etc. For example, the input to a floating point adder pipeline is:
X = A*2^a
Y = B*2^b
Here A and B are mantissas (the significant digits of the floating point numbers), while a and b are
exponents. The addition is carried out in four segments: (1) compare the exponents, (2) align the
mantissas, (3) add or subtract the mantissas, and (4) normalize the result.
Registers are used for storing the intermediate results between these segments.
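A toy sketch of these four segments in C (sequential here; in real hardware each intermediate result is latched in a register so all four segments work on different operand pairs simultaneously):

```c
#include <stdio.h>

/* Toy binary floating point adder: X = A * 2^a, Y = B * 2^b.
   The four pipeline segments are shown as sequential steps. */
int main(void) {
    double A = 0.75, B = 0.50;      /* mantissas, kept in [0.5, 1) */
    int a = 3, b = 2;               /* so X = 6.0 and Y = 2.0 */

    /* Segment 1: compare the exponents. */
    int diff = a - b;

    /* Segment 2: align the mantissa of the smaller operand (b < a here). */
    while (diff-- > 0) B /= 2.0;

    /* Segment 3: add the mantissas. */
    double C = A + B;               /* 0.75 + 0.25 = 1.0 */
    int c = a;

    /* Segment 4: normalize the result so the mantissa stays below 1. */
    if (C >= 1.0) { C /= 2.0; c++; }

    printf("result: %.4f * 2^%d\n", C, c);   /* 0.5000 * 2^4 = 8.0 */
    return 0;
}
```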
Instruction Pipeline
In this, a stream of instructions can be executed by overlapping the fetch, decode, and execute phases of the instruction cycle. This type of technique is used to increase the throughput of the computer system.
An instruction pipeline reads instructions from memory while previous instructions are being executed in other segments of the pipeline. Thus we can execute multiple instructions simultaneously. The pipeline is more efficient if the instruction cycle is divided into segments of equal duration, as the timing expression below makes precise.
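The standard timing argument, not spelled out in these notes, makes this precise: with k equal segments of clock period t_p and n instructions,

```latex
T_{\text{pipe}} = (k + n - 1)\,t_p, \qquad
T_{\text{serial}} = n\,k\,t_p, \qquad
S = \frac{T_{\text{serial}}}{T_{\text{pipe}}} = \frac{n\,k}{k + n - 1} \xrightarrow{\;n \to \infty\;} k
```

so a k-segment pipeline approaches a k-fold speedup on long instruction streams.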
Pipeline Conflicts
There are some factors that cause the pipeline to deviate from its normal performance. Some of these factors are given below:
1. Timing Variations
All stages cannot take same amount of time. This problem generally occurs in instruction
processing where different instructions have different operand requirements and thus different
processing time.
2. Data Hazards
When several instructions are in partial execution, and if they reference same data then the
problem arises. We must ensure that next instruction does not attempt to access data before the
current instruction, because this will lead to incorrect results.
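An illustrative C fragment with exactly this dependency (a read-after-write hazard once compiled to back-to-back instructions):

```c
#include <stdio.h>

int main(void) {
    int x = 4, y = 5;
    int a = x + y;   /* instruction 1: computes and (later) writes a */
    int b = a * 2;   /* instruction 2: reads a; in a pipeline its
                        operand-fetch stage is reached before
                        instruction 1 has written a back, so the
                        hardware must stall or forward the result */
    printf("%d\n", b);
    return 0;
}
```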
3. Branching
In order to fetch and execute the next instruction, we must know what that instruction is. If the
present instruction is a conditional branch, and its result will lead us to the next instruction, then
the next instruction may not be known until the current one is processed.
4. Interrupts
Interrupts inject unexpected instructions into the instruction stream and disrupt the orderly execution of instructions.
5. Data Dependency
It arises when an instruction depends upon the result of a previous instruction but this result is
not yet available.
Advantages of Pipelining
1. The cycle time of the processor is reduced.
Disadvantages of Pipelining
1. The design of a pipelined processor is complex and costly to manufacture.
Key Concepts:
Vectors:
In vector processing, data is organized into vectors, which are essentially one-dimensional arrays of data items.
SIMD Instructions:
Vector processors use special instructions (SIMD instructions) that can operate on
multiple data elements in parallel.
Parallelism:
The core idea is to exploit parallelism by performing the same operation on multiple
data points at the same time, which significantly speeds up computation.
Vector Registers:
Vector processors typically have specialized registers (vector registers) that can hold
entire vectors, allowing for efficient parallel operations.
How it Works:
1. Data Organization: Data is loaded into vector registers.
2. Instruction Execution: A single instruction (e.g., add, multiply) is executed on all
elements of the vector registers simultaneously.
3. Results Storage: The results of the operations are stored back into vector registers
or memory.
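A C sketch of the contrast (illustrative): the scalar loop below performs one addition per iteration, whereas a vector processor would execute the same operation across entire vector registers in a single instruction; modern compilers auto-vectorize exactly this pattern:

```c
#include <stdio.h>

#define N 8

int main(void) {
    float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[N];

    /* Scalar view: one add per iteration. On a vector processor the
       whole loop is a single vector add, C(1:N) = A(1:N) + B(1:N),
       with a, b, c held in vector registers. */
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    for (int i = 0; i < N; i++)
        printf("%.0f ", c[i]);
    printf("\n");
    return 0;
}
```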
Examples:
Scientific Computing:
Solving large systems of equations, simulating physical phenomena, and performing
complex calculations.
Image Processing:
Applying filters, transformations, and other operations to images represented as
matrices or arrays of pixels.
Machine Learning:
Training deep learning models, where operations are often performed on large matrices
and tensors.
Vector processor classification
Vector processors have rightfully come into prominence in computing architecture design by virtue of how efficiently they handle large datasets. A large portion of this efficiency is due to the way operands are retrieved in the architectural configuration used in the implementation. Vector processors are classified into two primary architectures: memory-to-memory and register-to-register. This classification is important for optimizing performance in scientific computing and other data-intensive applications.
In memory-to-memory architecture, source operands, intermediate results, and final results are
retrieved directly from the main memory. For memory-to-memory vector instructions, it is
necessary to specify the base address, offset, increment, and vector length to facilitate data
transfers between the main memory and pipelines. Notable processors employing memory-to-
memory formats include TI-ASC, CDC STAR-100, and Cyber-205.
In register-to-register architecture, operands and results are retrieved indirectly from the main
memory through the use of a large number of vector or scalar registers. Processors like Cray-1
and Fujitsu VP-200 utilize vector instructions in register-to-register formats.
Hybrid Architecture
A hybrid architecture unites memory-to-memory and register-to-register organizations to gain the benefits of both. It uses flexible operand retrieval methods, which improves performance and efficiency across many computational tasks. This provides a balanced solution that can be adapted to the requirements of the application, with better utilization of available resources.
Key Concepts:
Parallel Processing:
Array processors leverage parallelism by distributing computations across multiple processing elements (PEs), enabling simultaneous
execution of operations on different data elements.
SIMD Architecture:
A common approach is Single Instruction, Multiple Data (SIMD), where a single instruction is executed on multiple data elements
concurrently.
Processing Elements (PEs):
These are individual processing units within the array processor, each capable of performing computations on a portion of the data.
Control Unit (CU):
The CU coordinates the activities of the PEs, issuing instructions and managing data flow.
Types of Array Processors:
Attached Array Processors: These are auxiliary processors connected to a general-purpose host computer, often used to accelerate
specific computational tasks.
SIMD Array Processors: These processors have a more tightly integrated structure, with identical processing elements operating
synchronously under the control of a single CU.
How it works:
1. Data Decomposition:
The input data array is divided into smaller chunks, with each chunk assigned to a separate PE.
2. Instruction Issuance:
The control unit issues a single instruction that is then broadcast to all active PEs.
3. Parallel Execution:
Each PE independently executes the instruction on its assigned data chunk.
4. Result Gathering:
The results from each PE are then gathered and combined to produce the final output.
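An illustrative C sketch of steps 1-4, modeling each PE as the owner of one slice of the data under a single broadcast operation (the PE count and chunking are assumptions made for the sketch):

```c
#include <stdio.h>

#define PES 4                  /* number of processing elements */
#define N   16                 /* total data elements */

int main(void) {
    int a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

    /* 1. Data decomposition: each PE owns one contiguous chunk.
       2. Instruction issuance: the "broadcast instruction" is the
          loop body, identical for every PE.
       3. Parallel execution: on real hardware the per-PE loops run
          simultaneously; here they run sequentially for clarity. */
    int chunk = N / PES;
    for (int pe = 0; pe < PES; pe++)
        for (int i = pe * chunk; i < (pe + 1) * chunk; i++)
            c[i] = a[i] + b[i];          /* Ci = Ai + Bi on this PE */

    /* 4. Result gathering: c[] already holds the combined output. */
    for (int i = 0; i < N; i++) printf("%d ", c[i]);
    printf("\n");
    return 0;
}
```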
Benefits:
Increased Processing Speed:
Parallel execution significantly reduces the overall processing time for computationally intensive tasks.
Improved Efficiency:
Array processors are designed to optimize for repetitive calculations, making them faster than general-purpose processors for these
workloads.
Enhanced Performance:
By leveraging parallelism, array processors can handle large datasets more efficiently, leading to improved system performance.
Examples:
Scientific Simulations:
Simulating weather patterns, molecular dynamics, or financial markets often involves extensive matrix operations, which can be
accelerated by array processors.
Image and Signal Processing:
Applications like medical imaging, speech recognition, and seismic data analysis rely on array processing for tasks such as filtering,
transformations, and feature extraction.
Graphics Processing:
Vector processors, a type of array processor, are crucial for handling the parallel computations required for rendering graphics in video
games and other applications.
Types of Array Processor
An array processor performs computations on large arrays of data. There are two types of array processors: the attached array processor and the SIMD array processor. These are explained below.
Attached Array Processor
In an attached array processor, the host computer is a general-purpose computer, and the attached processor is a back-end machine driven by the host; the attached processor's local memory is interconnected with the host's main memory. The array processor is connected through an I/O controller to the computer, and the computer treats it as an external interface.
Working:
Suppose we are executing a vast number of instructions. In that case it may not be feasible to execute all of them on the host computer alone; it could take days or weeks. An attached array processor is therefore used to enhance the speed and performance of the host computer, as shown in the diagram above. The I/O interface is used to connect the host and the attached processor and to resolve the differences between them. An array processor is essentially a group of processing units used together to perform a computation.
SIMD Array Processor
The processing units are synchronized to perform the same operation under the control of a common control unit, thus providing a single instruction stream, multiple data stream (SIMD) organization. As shown in the figure, a SIMD array processor contains a set of identical processing elements (PEs), each having a local memory M.
Working:
Here the array processor is built into the computer, unlike the attached array processor, where the array processor is attached externally to the host computer. Initially, the master control unit decodes the instruction and generates the control signals, which are passed to all the processing elements (PEs) or ALUs; upon receiving the control signals from the master control unit, all the processing elements know what operations need to be performed. The data on which the operations are to be performed is brought from main memory into the local memory of the respective PEs. The SIMD array processor is normally used to compute vector data, and all PEs execute the same instruction simultaneously on different data.
For example, when the vector instruction Ci = Ai + Bi (an addition operation) needs to be executed, the master control unit generates the control signals and passes them, along with the data, to all processing elements. The PEs then work in parallel on the same instruction with different data; that is why it is called a single instruction, multiple data streams (SIMD) array processor.
Each PE includes -
ALU
Floating point arithmetic unit
Working registers
The master control unit controls the operations in the PEs. The function of the master control unit is to decode the instruction and determine how the instruction is to be executed. If the instruction is a scalar or program control instruction, then it is executed directly within the master control unit.
Main memory is used for storage of the program while each PE uses operands stored in its local memory.
RISC and CISC in Computer Organization
RISC is a way to make the hardware simpler, whereas CISC uses single instructions that each handle multiple operations. In this article, we are going to discuss RISC and CISC in detail, as well as the difference between RISC and CISC. Let's proceed with RISC first.
Characteristics of RISC
Advantages of RISC
Simpler instructions: RISC processors use a smaller set of simple instructions, which makes them easier to
decode and execute quickly. This results in faster processing times.
Faster execution: Because RISC processors have a simpler instruction set, they can execute instructions faster
than CISC processors.
Lower power consumption: RISC processors consume less power than CISC processors, making them ideal
for portable devices.
Disadvantages of RISC
More instructions required: RISC processors require more instructions to perform complex tasks than CISC
processors.
Increased memory usage: RISC processors require more memory to store the additional instructions needed
to perform complex tasks.
Higher cost: Developing and manufacturing RISC processors can be more expensive than CISC processors.
Characteristics of CISC
Advantages of CISC
Reduced code size: CISC processors use complex instructions that can perform multiple operations, reducing
the amount of code needed to perform a task.
More memory efficient: Because CISC instructions are more complex, they require fewer instructions to
perform complex tasks, which can result in more memory-efficient code.
Widely used: CISC processors have been in use for a longer time than RISC processors, so they have a larger
user base and more available software.
Disadvantages of CISC
Slower execution: CISC processors take longer to execute instructions because they have more complex
instructions and need more time to decode them.
More complex design: CISC processors have more complex instruction sets, which makes them more difficult
to design and manufacture.
Higher power consumption: CISC processors consume more power than RISC processors because of their
more complex instruction sets.
CPU Performance
Both approaches try to increase CPU performance:
RISC: reduces the cycles per instruction at the cost of the number of instructions per program.
CISC: attempts to minimize the number of instructions per program, but at the cost of an increase in the number of cycles per instruction.
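The classic decomposition behind this trade-off (the standard formula; the notes name the quantities but do not write it out):

```latex
\text{CPU time} \;=\; \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}}
```

RISC shrinks the middle factor at the cost of the first; CISC shrinks the first at the cost of the middle.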
Earlier, when programming was done using assembly language, a need was felt to make instructions do more tasks, because programming in assembly was tedious and error-prone; out of this, CISC architecture evolved. With the rise of high-level languages, dependency on assembly reduced, and RISC architecture prevailed.
Example: suppose we have to add two numbers held in memory.
CISC approach: There will be a single command or instruction for this, like ADD, which will perform the whole task.
RISC approach: Here the programmer will write a first load command to load the data into registers, then use a suitable operator, and then store the result in the desired location.
So, the add operation is divided into parts, i.e. load, operate, store, due to which RISC programs are longer and require more memory, but they require fewer transistors due to less complex commands. An illustrative contrast appears in the sketch below.
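An illustrative sketch of that decomposition; the instruction mnemonics in the comments are generic, not from any particular ISA:

```c
#include <stdio.h>

int main(void) {
    int a = 2, b = 3, c;

    c = a + b;   /* One C statement. A CISC machine may compile this to a
                    single memory-to-memory instruction:
                        ADD  c, a, b
                    A RISC (load/store) machine needs separate steps:
                        LOAD  r1, a       ; load
                        LOAD  r2, b       ; load
                        ADD   r3, r1, r2  ; operate
                        STORE c, r3       ; store                        */

    printf("%d\n", c);
    return 0;
}
```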
RISC vs CISC

RISC: Uses only a hardwired control unit.
CISC: Uses both hardwired and microprogrammed control units.

RISC: Can perform only register-to-register arithmetic operations.
CISC: Can perform REG-to-REG, REG-to-MEM, or MEM-to-MEM operations.

RISC: An instruction executes in a single clock cycle.
CISC: An instruction takes more than one clock cycle.

RISC: An instruction fits in one word.
CISC: Instructions are larger than the size of one word.

RISC: Simple and limited addressing modes.
CISC: Complex and more numerous addressing modes.

RISC: The number of instructions is smaller compared to CISC.
CISC: The number of instructions is larger compared to RISC.
Multicore processors, with multiple processing units on a single chip, are fundamental to modern computing, offered by both Intel and AMD. These processors enhance performance and multitasking
capabilities by allowing concurrent execution of instructions. Both companies offer a range of multicore
processors for various applications, from personal computers to servers, with different strengths and
specializations.
Intel:
Core i Series:
Intel's Core i3, i5, i7, and i9 processors are widely known for their single-core performance and broad software
compatibility.
Strengths:
Intel processors are often favored for tasks requiring high single-core speed, making them a reliable choice for
general computing and many applications.
Multicore Architecture:
Intel's multicore processors feature multiple execution pipelines and resources within each core, enabling
simultaneous task execution.
AMD:
Ryzen Series:
AMD's Ryzen 3, 5, 7, and 9 processors, along with Threadripper, are known for their strong multi-core performance
and integrated graphics capabilities.
Strengths:
AMD processors are often lauded for their price-to-performance ratio, offering more cores and cache for the same
or lower cost than comparable Intel processors.
Multicore Advantages:
AMD processors are well-suited for tasks involving heavy workloads, such as content creation, video editing, and
gaming, where parallel processing is crucial.
Key Differences:
Performance Focus:
Intel often emphasizes single-core speed, while AMD is known for its multicore performance and competitive
pricing.
Integrated Graphics:
AMD's integrated graphics are often more powerful than Intel's, potentially saving costs on discrete graphics
cards.
Cache:
AMD processors may have more L3 cache, while Intel processors can have more L2 cache.
In essence:
Both Intel and AMD offer excellent multicore processors, but their strengths lie in different areas.
AMD offers more cores, better value, and stronger integrated graphics.
Choosing between the two depends on the specific needs and priorities of the user.
Difference Between Intel and AMD

Intel: Clock speeds reach and have surpassed 5.0 GHz.
AMD: Clock speeds can reach 5.0 GHz, but this results in more heat.