Introduction to Multiprocessor Systems:
Multiprocessor systems are designed to solve complex problems by
utilizing multiple processors that work cooperatively. This section
discusses functional structures of several multiprocessor systems
such as the Cm*, Cyber-170, Honeywell 60/66, and PDP-10
multiprocessor configurations.
Key Characteristics of Multiprocessors:
1.Multiple Processors in One Computer:
○ A multiprocessor system integrates several processors
within a single computing system.
○ Each processor may be capable of performing different
tasks or working together on the same task, enhancing
processing power and efficiency.
2.Communication and Cooperation:
○ Processors communicate to coordinate their actions. This
communication can happen through two primary methods:
■ Message Passing: Processors send messages to one
another to share data or instructions.
■ Shared Memory: Multiple processors access and share
common memory, allowing them to work cooperatively
by reading/writing data stored in a common memory
space.
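The two cooperation styles above can be sketched in a few lines. This is an illustrative sketch only, using Python threads as stand-in "processors" (real multiprocessors implement both mechanisms in hardware); all names here are invented for the example.

```python
import threading
import queue

shared = {"total": 0}          # shared memory: both workers read/write a common location
lock = threading.Lock()        # guards the shared location
mailbox = queue.Queue()        # message passing: data is sent explicitly

def worker_shared(n):
    with lock:                 # cooperate by writing to common memory
        shared["total"] += n

def worker_message(n):
    mailbox.put(n)             # cooperate by sending a message

threads = [threading.Thread(target=worker_shared, args=(i,)) for i in (1, 2, 3)]
threads += [threading.Thread(target=worker_message, args=(i,)) for i in (4, 5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

received = mailbox.get() + mailbox.get()   # the receiver drains its mailbox
print(shared["total"], received)           # 6 9
```

The shared-memory workers never exchange messages; the message-passing workers never touch `shared`. Each style trades hardware cost against programming convenience, as the sections below elaborate.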
Multiprocessor vs. Multicomputer Systems:
While both multiprocessor and multicomputer systems aim to support
concurrent operations, they differ in the extent of resource sharing
and cooperation:
1.Multiprocessor Systems:
○ A multiprocessor system features multiple processors
controlled by a single operating system.
○ The operating system facilitates the interaction between
processors at different levels (process, data set, and
data element levels).
2.Multicomputer Systems:
○ A multicomputer system consists of multiple autonomous
computers. These computers may or may not communicate
with each other.
○ Each computer typically has its own operating system, and
the communication between computers is often less
integrated than in a multiprocessor system.
○ Example: IBM Attached Support Processor System.
Examples of Multiprocessor Systems:
● Denelcor's HEP system: This is an example of a multiprocessor
system controlled by a single operating system that
facilitates cooperation among processors.
Architectural Models of Multiprocessors:
There are two key types of multiprocessor architectures:
1.Tightly Coupled Multiprocessors:
○ In tightly coupled multiprocessors, processors
communicate through a shared main memory.
○ The data transfer rate between processors is limited by
the memory bandwidth.
○ Each processor may have a local memory or cache to speed
up data access.
○ Connectivity: There is complete connectivity between
processors and memory, achieved via:
■ Interconnection Network: Links the processors and
memory.
■ Multi-Ported Memory: Memory with multiple ports to
allow simultaneous access by processors.
○ Limiting Factors:
■ Memory Contention: When multiple processors attempt
to access the same memory location, performance
degrades.
■ Processor-Memory Interconnection Network: The design
and capacity of the network may limit scalability.
2.Loosely Coupled Multiprocessors:
○ Loosely coupled systems are typically composed of
autonomous processors, each with its own memory.
○ These systems may communicate through message passing
rather than shared memory.
○ They are more independent than tightly coupled systems,
allowing for distributed processing.
Performance Considerations and Challenges:
1.Memory Contention:
○ As multiple processors access the same memory, contention
occurs when two or more processors attempt to read or
write to the same memory location simultaneously.
○ This can lead to significant performance degradation.
○ Solution: Interleaving memory in a manner that reduces
contention. This can be achieved by distributing memory
access across different memory modules.
2.Processor-Memory Interconnection Network:
○ The design and capacity of the interconnection network
between processors and memory are critical.
○ A high-speed interconnection network is needed to prevent
bottlenecks that can occur when processors need to
frequently access shared memory.
Interleaved Main Memory for Multiprocessors:
● Interleaving: This technique involves distributing memory
addresses across multiple memory units (modules) to reduce
conflicts.
● Degree of Interleaving: The greater the interleaving, the
fewer the chances of multiple processors contending for the
same memory module at the same time.
● Memory Module Assignment: Careful data assignments to memory
modules can also help reduce conflicts and improve
performance.
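The interleaving idea can be made concrete: with low-order interleaving across M modules, the module index is the address modulo M, so consecutive addresses land in different modules and sequential access streams from several processors rarely collide. A minimal sketch (the module count and addresses are illustrative):

```python
M = 4  # degree of interleaving: number of memory modules

def module_of(addr):
    # Low-order interleaving: module index = addr mod M,
    # word offset within that module = addr div M.
    return addr % M, addr // M

# Eight consecutive addresses spread evenly across the four modules.
modules = [module_of(a)[0] for a in range(8)]
print(modules)  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Raising M (the degree of interleaving) spreads a sequential stream over more modules, which is exactly why contention drops.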
Summary of Functional Characteristics:
● Multiprocessor systems use multiple processors working in
tandem to solve a problem.
● These systems can be tightly or loosely coupled depending on
the degree of resource sharing.
● Tightly coupled systems benefit from shared memory, but face
challenges with contention and network bottlenecks.
● Loosely coupled systems tend to be more independent,
communicating through message passing and having their own
local memories.
Detailed Notes on Loosely Coupled Multiprocessors (LCS)
1. Introduction to Loosely Coupled Systems:
● Loosely Coupled Multiprocessors (LCS):
○ In LCS, each processor is paired with its local memory
and input-output devices.
○ Each processor works independently and has a large local
memory where it stores data and instructions for
execution.
○ Communication between processors occurs through
message-passing, not through shared memory like in
tightly coupled systems.
○ Because processors do not share memory directly, they are
considered part of a distributed system.
○ Coupling Degree: Interprocessor coupling in LCS is loose
compared with tightly coupled systems (TCS), where
processors interact directly through shared memory.
2. Structure of a Loosely Coupled Multiprocessor:
● Computer Module:
○ A computer module consists of:
■ Processor
■ Local memory
■ Local I/O devices
■ Interface to other modules
● Message-Transfer System (MTS):
○ Processes running on different computer modules
communicate by exchanging messages.
○ The message-transfer system (MTS) is responsible for the
communication between modules.
○ Interface Components:
■ A Channel and Arbiter Switch (CAS) handles the
arbitration when two or more modules request the MTS
at the same time.
■ Arbiter chooses which request to prioritize based on
a service discipline, delaying the others until the
selected one is completed.
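The arbiter's behavior can be sketched as a pure function: given simultaneous requests, grant one according to a service discipline and delay the rest. The fixed-priority rule below (lowest module number wins) is just one possible discipline, chosen for illustration; the text does not specify which discipline the CAS uses.

```python
def arbitrate(requests):
    """requests: set of module ids currently requesting the MTS.
    Returns (granted, delayed) under a fixed-priority discipline."""
    granted = min(requests)               # lowest-numbered module wins
    delayed = sorted(requests - {granted})  # the rest wait for the next round
    return granted, delayed

print(arbitrate({3, 1, 2}))  # (1, [2, 3])
```

When the granted transfer completes, the delayed set is resubmitted and arbitration repeats.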
3. Communication System in Loosely Coupled Multiprocessors:
● Message-Transfer System (MTS) Structure:
○ Time-Shared Bus: A simple MTS could be a time-shared bus
(e.g., in PDP-11), where processors send messages in a
shared environment.
○ Shared Memory System: A more advanced MTS could use a
shared memory system, with the memory either distributed
across the modules or centralized with multiple ports.
● Interconnection Network:
○ The interconnection network connects processors and
memory, ensuring that messages are transferred between
processors.
○ A multiported memory system uses distributed arbitration
logic to control access to memory, avoiding memory
conflicts.
4. Performance Considerations:
● Message Transfer Performance:
○ The performance of the MTS is critical in LCS. If a
single time-shared bus is used, the performance can be
affected by:
■ Message arrival rate: How fast messages are arriving
at the bus.
■ Message length: The size of each message being
transferred.
■ Bus capacity: The speed of the bus in transferring
data.
○ Bus contention: As more computer modules are added,
contention (conflicts) for access to the bus increases,
slowing down performance.
● Shared Memory in MTS:
○ If a shared memory system is used in the MTS, memory
conflicts can arise when multiple processors attempt to
access the same memory locations at the same time.
○ This can be mitigated by processor-memory
interconnection networks that manage memory access, but
it remains a limiting factor for system performance.
5. Communication Between Tasks:
● Local Memory Communication:
○ Tasks that are running on the same processor communicate
using the local memory of that processor.
● Inter-Processor Communication:
○ Tasks on different processors communicate through
communication ports located in the communication memory.
○ Every processor has an input port for receiving messages
from other processors.
○ Communication between tasks on different processors
involves two steps:
1.Message to Input Port: The message is first sent to
the destination processor’s input port.
2.Transfer to Task: From the input port, the message
is transferred to the task's local memory.
6. Message-Transfer Process:
● The process of communication between tasks can be illustrated
as:
○ Intra-Processor Communication (Within same processor):
■ Task A sends a message to Task B within the same
processor through the local memory.
○ Inter-Processor Communication (Between different
processors):
■ Step 1: Task A (on Processor P1) sends a message to
the input port of Processor P2.
■ Step 2: The message is then transferred from the
input port of Processor P2 to the local memory of
the destination task.
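The two-step transfer above can be sketched as follows. This is a schematic model, not any real system's interface: the class, method names, and message contents are all invented for illustration.

```python
class Processor:
    def __init__(self, name):
        self.name = name
        self.input_port = []      # communication memory: the processor's input port
        self.local = {}           # local memory, keyed by task name

    def receive(self, task, msg):
        # Step 1: the message is first sent to this processor's input port.
        self.input_port.append((task, msg))

    def deliver(self):
        # Step 2: messages move from the input port into each
        # destination task's local memory.
        while self.input_port:
            task, msg = self.input_port.pop(0)
            self.local.setdefault(task, []).append(msg)

p2 = Processor("P2")
p2.receive("taskB", "hello from taskA@P1")  # step 1: port on P2
p2.deliver()                                # step 2: port -> taskB's local memory
print(p2.local["taskB"])  # ['hello from taskA@P1']
```

Intra-processor communication skips both steps: the sender writes directly into the local memory it already shares with the receiver.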
7. Simplified Summary:
● Loosely Coupled Multiprocessors (LCS) are systems where
processors have their own local memory and communicate through
message-passing.
● The communication is managed by a message-transfer system
(MTS), which may be a simple time-shared bus or a more complex
shared memory system.
● The performance of the system is limited by the message
arrival rate, bus capacity, and memory conflicts.
● Tasks on the same processor communicate through local memory,
while tasks on different processors use a communication memory
with input ports for message transfer.
This structure allows loosely coupled systems to be more scalable
but comes with trade-offs in terms of performance and communication
efficiency compared to tightly coupled systems.
The Cm* architecture is a hierarchical multiprocessor system designed
at Carnegie Mellon University (CMU); each computer module includes a
local switch (Slocal) that manages intercommunication between
processors, memory, and I/O devices. Here's a detailed breakdown of
the Cm* architecture, covering all key points.
1. Overview of Cm* Architecture
● Modules & Switches: The architecture consists of multiple
computer modules (Cm), each with a local switch called Slocal.
● Function of Slocal: Slocal intercepts processor requests,
routing them to local memory and I/O devices. It also
processes requests from other computer modules to access its
memory and I/O resources.
● Address Translation: Address translation is carried out using
the X-bit in the LSI-11 processor status word (PSW) and the
four high-order bits of the processor's address. A mapping
table determines whether an address reference is local or
non-local.
2. Address Translation Process
● Mapping Table: The mapping table uses the processor's address
bits and PSW to determine the location of the memory address
(local or non-local).
● Non-local References: When a non-local reference occurs, a
virtual address is formed by concatenating the non-local
address field from the mapping table with the source
processor's identification.
● Map Bus: The map bus facilitates communication for address
translation, allowing modules to share a single Kmap processor
for address mapping.
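The translation mechanism above can be sketched as a table lookup. The table contents, field widths, and encodings below are illustrative only; they do not reproduce the real Cm* mapping-table format, only the idea of combining the PSW's X bit with the four high-order address bits.

```python
def translate(addr16, x_bit, map_table, source_id):
    # Index the mapping table with the X bit and the four
    # high-order bits of the 16-bit processor address.
    index = (x_bit << 4) | (addr16 >> 12)
    entry = map_table[index]
    if entry["local"]:
        return ("local", addr16)  # Slocal routes this to local memory
    # Non-local: form a virtual address by concatenating the table's
    # non-local address field with the source processor's id.
    return ("nonlocal", (entry["field"] << 8) | source_id)

# All 32 windows local, except one non-local window (illustrative values).
table = {i: {"local": True, "field": 0} for i in range(32)}
table[0x1F] = {"local": False, "field": 0xAB}

result = translate(0xF123, 1, table, source_id=3)
print(result)  # ('nonlocal', 43779) — field 0xAB concatenated with id 3
```

Local references never leave the module; non-local ones are handed to the Kmap over the map bus using the virtual address formed here.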
3. Cluster and Hierarchical Structure
● Clusters: The system is divided into clusters, each containing
multiple computer modules, a Kmap, and the map bus.
● Inter-Cluster Communication: Clusters are interconnected using
inter-cluster buses. These buses allow communication between
clusters, but not all clusters need direct interconnection.
This reduces the complexity of the network.
● Cluster Communication: Communication within a cluster occurs
via the map bus, while inter-cluster communication happens via
the inter-cluster buses.
4. The Role of the Kmap Processor
● Purpose: The Kmap processor is responsible for address
mapping, communication, and synchronization in the Cm*
architecture.
● Processor Components: The Kmap consists of three main
processors:
○ Kbus: Bus controller managing map bus requests.
○ Linc: Manages communication between Kmaps in different
clusters.
○ Pmap: Performs address translation and processing of
requests.
● Functionality: The Kmap handles multiple concurrent requests,
supporting up to 8 partitions (contexts) for concurrent
execution.
5. Pmap and Context Switching
● Context Management: The Pmap supports multiple contexts. Each
context represents a single transaction and can switch to
another context if it needs to wait for a message. This
multiprogramming feature allows the system to handle several
requests simultaneously.
● Request Processing: When a computer module (Cm) sends a
service request (for non-local memory access), the Kbus
allocates a Pmap context to process the request. The Pmap
translates the virtual address to a physical address and
initiates the memory access.
● Context Switching: After a memory access request is initiated,
the Pmap switches to another context, allowing other
transactions to proceed concurrently.
6. Intracluster Memory Access
● Process: Intracluster memory access involves several steps:
1.The processor of the master Cm sends a non-local memory
access request to the Kbus.
2.The Kbus allocates a context in the Pmap and performs
address translation.
3.The Pmap initiates memory access by sending the physical
address to the destination Cm via the map bus.
4.Once the destination Cm completes the access, it signals
a return request to the Kmap.
5.The Kmap retrieves the data and sends it back to the
master Cm.
6.The processor in the master Cm continues execution after
receiving the result.
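The six steps above can be traced with a small sequential sketch. This deliberately ignores the concurrency that makes the real Kmap interesting (the Pmap would switch contexts between steps 3 and 4); every name and address here is illustrative.

```python
def intracluster_read(master, addr, kbus_trace, pmap_translate, dest_memory):
    kbus_trace.append(("request", master, addr))   # 1. master Cm -> Kbus
    paddr = pmap_translate(addr)                   # 2. Pmap context allocated;
                                                   #    virtual -> physical address
    value = dest_memory[paddr]                     # 3-4. destination Cm performs
                                                   #      the access via the map bus
    kbus_trace.append(("return", master, value))   # 5. Kmap sends the data back
    return value                                   # 6. master resumes execution

memory = {0x200: 99}                # destination Cm's memory (illustrative)
trace = []
value = intracluster_read("Cm1", 0x100, trace, lambda v: v + 0x100, memory)
print(value, len(trace))  # 99 2
```

In the real system, step 5's "return request" is what reactivates the waiting Pmap context, so many such transactions can be in flight at once.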
7. Intercluster Communication
● Packet-Switched Buses: The inter-cluster buses are also
packet-switched. They allow communication between Kmaps in
different clusters, facilitating remote procedure calls.
● Linc Processor: Each Linc processor in the Kmap manages
communication between clusters, handling both incoming and
outgoing inter-cluster messages.
● Types of Intercluster Messages: There are two types of
inter-cluster messages:
○ Forward Messages: Used to invoke a new context at the
destination Kmap.
○ Return Messages: Sent back to reactivate a context at the
destination Kmap after the operation is complete.
8. Handling Multicluster Operations
● Multicluster Operations: When a processor sends a non-local
memory reference across clusters, the process involves
multiple Kmaps and contexts. The source Kmap (master Kmap)
triggers the context in its cluster, while the destination
Kmap (slave Kmap) activates its own context to process the
request.
● Forward and Return Messages: The master Kmap prepares a
forward message and sends it to the slave Kmap, which then
processes the request and sends a return message back to
reactivate the waiting context.
9. System Reliability and Performance
● Fault Tolerance: The system's hierarchical structure enhances
reliability, as failure in one inter-cluster bus can still
allow other clusters to operate.
● Performance: Memory reference times vary based on locality:
○ Local references: Approx. 3 µs.
○ Intracluster references: Approx. 9 µs.
○ Inter-cluster references: Approx. 26 µs.
● System Bottleneck: The map bus can become a bottleneck, as
only one transaction can take place at a time.
10. Scalability
● Extended Hierarchy: The hierarchical system can be extended to
an n-level hierarchy. For example, in a binary tree structure,
each node could represent a cluster, and the system could
scale to a larger number of clusters with multiple
communication layers.
11. Kmap Technology and Performance
● Speed vs Processor: The Kmap is much faster than the LSI-11
processor used in each Cm, which has a relatively low MIPS
rating (~0.1 MIPS). If faster processors were used in the
modules, however, the Kmap's speed could become a limiting
factor in large systems.
● Multiprocessing: The Kmap is designed to handle multiple
requests simultaneously (up to eight concurrent contexts),
making it well-suited for parallel algorithms with high
locality. However, if processes frequently cross inter-cluster
buses, performance may degrade.
12. Potential Limitations
● Network Complexity: As the system scales, the complexity of
the inter-cluster network may increase, requiring more
sophisticated routing and communication protocols.
● Inter-Cluster Overhead: If locality is poor and requests cross
inter-cluster buses often, the performance may degrade due to
higher latency in inter-cluster communication.
13. Key Points Summary
● The Cm* architecture is a hierarchical multiprocessor system
designed to manage memory and I/O access efficiently.
● Slocal switches manage memory access within modules, and Kmap
processors handle address translation and inter-cluster
communication.
● The system uses context switching and packet-switched buses
for handling memory access requests and inter-cluster
communication.
● The architecture supports fault tolerance, scalability, and
parallel algorithms but may experience performance bottlenecks
if requests cross inter-cluster buses frequently.
This hierarchical multiprocessor system is well-suited for
environments requiring parallel processing and high levels of memory
access coordination. It balances speed, reliability, and scalability
while maintaining fault tolerance and optimizing system performance.
Tightly Coupled Multiprocessors (TCS) - Detailed Notes
Overview
Tightly Coupled Multiprocessors (TCS) are used in scenarios where
low response times and high-speed processing are essential. These
systems address the limitations of loosely coupled systems, where
the throughput might not meet the requirements of time-sensitive
applications.
Components of Tightly Coupled Systems
A typical TCS consists of:
1.Processors (p): These are the computational units.
2.Memory Modules (m): A set of memory modules that store data.
3.I/O Channels (d): Input/Output channels for communication with
peripheral devices.
4.Interconnection Networks: These networks link processors,
memory modules, and I/O channels.
The key interconnection networks in a TCS:
● Processor-Memory Interconnection Network (PMIN): A switch
system that connects processors to memory modules.
● I/O-Processor Interconnection Network (IOPIN): Facilitates
communication between processors and I/O channels.
● Interrupt-Signal Interconnection Network (ISIN): Enables
interrupt communication among processors, used for
synchronization or error detection.
Architecture Details
1.PMIN (Processor-Memory Interconnection Network):
○ Typically a crossbar switch. Its hardware complexity
grows roughly as p × m crosspoints, each about (n + k)
bits wide, where:
■ p = number of processors
■ m = number of memory modules
■ n = width of the address within a memory module
■ k = width of the data path
○ Crossbar complexity can be high, especially for large p
and m values.
○ Alternatives like multistage networks may reduce the
complexity.
2.Memory Modules and Conflicts:
○ A memory module can only handle one processor request per
cycle, which could lead to conflicts if multiple
processors request the same memory.
○ Conflict resolution is managed by PMIN, potentially using
broadcasting for data distribution across multiple memory
modules.
○ Unmapped Local Memory (ULM): Each processor can have its
own local memory reserved for the kernel or operating
system, helping to reduce contention in the PMIN.
○ In systems like the C.mmp multiprocessor, the use of ULM
helps in saving state data for tasks and processes,
reducing PMIN traffic.
3.Cache and Performance Optimization:
○ To reduce traffic in the PMIN and lower the instruction
cycle time, private caches are employed for each
processor.
○ Caches help in reducing crossbar contention and memory
access conflicts.
○ Cache Coherence: The use of private caches introduces the
problem of inconsistent copies of data. Solutions to this
problem are discussed in the context of cache coherence
protocols.
4.Interrupt and Synchronization:
○ The ISIN network allows processors to send interrupts to
one another, facilitating synchronization and error
management.
○ Example: A failing processor can broadcast an alarm to
functioning processors.
5.I/O Communication:
○ The IOPIN allows processors to communicate with
peripheral devices through I/O channels.
○ The I/O system may involve multiple processors connected
to shared I/O devices.
Processor Types: Homogeneous vs. Heterogeneous
● Homogeneous (Symmetric) Systems: All processors are
functionally identical. This simplifies programming and
scheduling, since any process can run on any processor.
○ Example: The IBM 3081K uses identical processors.
● Heterogeneous (Asymmetric) Systems: Processors may differ in
performance, I/O access, or reliability. This asymmetry can
complicate error recovery and load balancing, but may be
necessary for optimizing certain applications.
○ Example: The IBM 3084 AP and C.mmp use asymmetric
configurations.
I/O System Asymmetry
● Symmetric I/O Network: Every device is equally accessible from
every processor. This offers flexibility but at a high cost.
● Asymmetric I/O Network: Devices connected to one processor are
not accessible by others. This can reduce the cost but
increases the risk of failure if the processor or I/O device
fails.
○ In asymmetric I/O, if a CPU fails, the I/O devices
attached to it are no longer accessible by other
processors, which could cause data transfer delays or
failures.
Fault Tolerance and Redundancy
● Redundant I/O Connections: To overcome the failure problem in
asymmetric systems, some designs incorporate redundant I/O
paths. However, this requires additional arbitration logic and
careful reliability analysis.
● Redundant System Designs: Such systems may use multiple paths
for I/O access to ensure reliability, even if the primary CPU
fails. Examples include systems like the Honeywell 60/66.
Example Architectures of Tightly Coupled Multiprocessors
1.Cyber-170 Architecture:
○ Composed of two subsystems: Central Processing Subsystem
(CPS) and Peripheral Processing Subsystem (PPS).
○ Both subsystems share access to Central Memory (CM) via a
central memory controller acting as an intelligent switch
(combining PMIN, IOPIN, and ISIN).
○ The system has a two-level memory hierarchy with a
primary CM and secondary ECM (extended core memory).
2.Honeywell 60/66 Architecture:
○ The system has redundancy with multiple paths to memory
and I/O devices.
○ The system controller manages memory conflicts and routes
interrupts and communications among system components.
○ Redundancy increases availability and fault tolerance by
allowing all I/O devices to remain accessible even if one
component fails.
3.PDP-10 Architecture:
○ Asymmetric Master-Slave Configuration: One processor
(master) controls peripheral devices, while the other
processor (slave) cannot initiate peripheral operations.
○ Symmetric Configuration: Both processors are connected to
a set of shared peripherals, but each data channel is
linked to only one processor.
Cost and Trade-offs
● The key trade-off in tightly coupled multiprocessors lies
between performance and cost:
○ Symmetric systems offer greater flexibility and
availability, but are expensive.
○ Asymmetric systems are cheaper but risk lower
availability if a critical processor or I/O device fails.
Key Takeaways
1.Tightly Coupled Multiprocessors are best suited for
high-speed, real-time applications where response time is
critical.
2.Key architectural elements include private caches,
interconnection networks, and conflict resolution mechanisms
like local memory and redundant I/O paths.
3.Fault tolerance is a crucial consideration, requiring systems
to support redundancy and error recovery.
4.The complexity of the architecture increases with the number
of processors and memory modules, and the choice between
symmetric and asymmetric systems influences both performance
and cost.
Processor Characteristics for Multiprocessing
Multiprocessors are often constructed using processors that were not
originally designed for multiprocessing. Examples of such systems
include the C.mmp system (using DEC's PDP-11 processors) and the
Cm* multiprocessor (using LSI-11 microprocessors). These systems
often utilize off-the-shelf components to reduce development time.
However, using these components can introduce undesirable features
in the system. To build an effective processor for a multiprocessing
environment, the following architectural features are crucial:
1. Process Recoverability
● Key Concept: Processors in a multiprocessor system must be
able to recover processes if a processor fails.
● Design Requirement: When a processor fails, another processor
should be able to retrieve the interrupted process state and
continue execution.
● Challenge: Many processors store process state in internal
registers that aren't accessible externally, making recovery
difficult.
● Solution: A register file shared by all processors could help
in gracefully degrading operations.
● Benefit: This allows for better reliability since it’s easier
to transfer the process state from one processor to another.
● Implementation Consideration: Modern processors can separate
general-purpose registers from the processor itself with
minimal speed loss.
2. Efficient Context Switching
● Key Concept: Efficient context switching is essential for
multiprocessor systems to switch between different processes
rapidly.
● Context Switching Process: Context switching involves saving
the state of the current process and restoring the state of a
new process. This operation requires queueing and stack
operations to store and manage process states.
● Instruction Example: For efficient context switching,
processors can implement a special instruction such as the
Central Exchange Jump (CEJ) in the Cyber-170 processor. This
instruction saves the state of the current process and loads
the state of another process from a central memory area known
as the exchange package.
● Example: The S-1 multiprocessor system uses 16 register sets
per processor to facilitate quick task switching by changing
the current process register to point to a different register
set.
● Design Goal: To minimize the overhead of context switching,
stack instructions should be fast and efficient for saving and
restoring the processor's state.
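The exchange-package idea can be sketched as a single swap: save the running process's registers into its package in memory, then load the new process's package. Register names, addresses, and the data layout below are all invented for illustration; the real CEJ does this atomically in hardware.

```python
def exchange_jump(cpu, memory, new_pkg):
    # Save the current process's state into its exchange package...
    memory[cpu["pkg"]] = dict(cpu["regs"])
    # ...and load the new process's state from its package, in one operation.
    cpu["regs"] = dict(memory[new_pkg])
    cpu["pkg"] = new_pkg   # remember where to save state at the next switch

# Two exchange packages in "central memory" (contents illustrative).
memory = {0x10: {}, 0x20: {"pc": 500, "a0": 7}}
cpu = {"regs": {"pc": 123, "a0": 1}, "pkg": 0x10}

exchange_jump(cpu, memory, 0x20)
print(cpu["regs"]["pc"], memory[0x10]["pc"])  # 500 123
```

The S-1's register-set scheme achieves the same effect without touching memory at all: switching just repoints the current-process register at a different one of the 16 on-chip register sets.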
3. Large Virtual and Physical Address Spaces
● Key Concept: A multiprocessor must support large physical and
virtual address spaces to handle large data sets and provide
flexibility in process execution.
● Physical Address Space: In multiprocessor systems, the
physical address space should be large enough to store a
significant amount of data.
● Virtual Address Space: A large virtual address space is
crucial for flexibility, and it should be segmented to ensure
modular sharing and memory protection for better reliability
and safety.
○ Example: The processor in the S-1 multiprocessor system
supports 2GB of virtual memory and 4 gigawords of
physical memory, with each word being 36 bits wide.
4. Effective Synchronization Primitives
● Key Concept: Synchronization primitives are crucial to ensure
mutual exclusion in a multiprocessor system, which means
ensuring that only one process can access shared data at a
time.
● Synchronization Mechanism: Mutual exclusion often involves
mechanisms such as semaphores and atomic read-modify-write
memory cycles.
○ Semaphore: A semaphore controls access to shared
resources by managing a queue of suspended processes
waiting for the semaphore's value to change.
○ Example Instructions: The test-and-set and
compare-and-swap instructions are commonly used to
implement synchronization primitives like semaphores.
● Goal: The processor must support hardware-level atomic
operations to efficiently manage mutual exclusion and process
synchronization.
5. Interprocessor Communication Mechanism
● Key Concept: In a multiprocessor system, processors need an
efficient way to communicate with each other to exchange data
and synchronize their actions.
● Hardware Mechanism: The interprocessor communication mechanism
should be hardware-based for efficiency. This mechanism helps
to synchronize processors and notify them of faults or status
changes in other processors.
● Polling Limitation: In a software-only communication system,
processors might have to poll memory locations for messages,
which could cause delays in communication, especially in large
systems.
○ Example of Hardware Mechanisms: Systems like the IBM
370/168 MP and Cray X-MP employ hardware interprocessor
communication to avoid the inefficiencies of polling.
○ Arbitration: In multiprocessors, multiple processors
might request the same interprocessor communication path
simultaneously. This requires arbitration to manage these
requests and prevent conflicts.
6. Instruction Set
● Key Concept: The processor's instruction set must be robust
enough to support efficient concurrency, particularly at the
procedure level, and facilitate the manipulation of complex
data structures.
● Required Features:
○ Procedure linkage (for function calls)
○ Looping constructs
○ Parameter manipulation
○ Multidimensional index computation
○ Range checking of addresses (for memory protection)
● Parallel Execution: The instruction set should allow for
creating and terminating parallel execution paths, enabling
efficient parallel processing.
● Real-Time Features: Hardware counters and real-time clocks are
important for generating unique process IDs, managing
timeouts, and detecting errors via a watchdog timer.
7. Error Detection and Recovery
● Key Concept: In multiprocessor systems, error detection
mechanisms are essential for system stability.
● Watchdog Timer: The processor can include watchdog timers to
monitor key system resources. If a timer isn't reset within a
specific time, it raises an error condition, signaling
potential problems.
● Error Handling: The system can use error detection to trigger
recovery or diagnostic procedures if a fault occurs. The
ability for processors to monitor each other increases the
system’s resilience.
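The watchdog mechanism can be sketched with a restartable timer: as long as the monitored resource keeps resetting the timer, nothing happens; if a reset fails to arrive within the timeout, the error action fires. The class and the timeout values are illustrative, not from the text.

```python
import threading

class Watchdog:
    def __init__(self, timeout, on_error):
        self.timeout = timeout      # seconds the resource has to check in
        self.on_error = on_error    # error condition raised on expiry
        self.timer = None

    def reset(self):
        # A healthy resource calls this periodically, restarting the clock.
        if self.timer:
            self.timer.cancel()
        self.timer = threading.Timer(self.timeout, self.on_error)
        self.timer.start()

    def stop(self):
        if self.timer:
            self.timer.cancel()

errors = []
wd = Watchdog(0.05, lambda: errors.append("resource hung"))
wd.reset()             # resource checks in: timer restarted in time
wd.reset()
wd.timer.join(2.0)     # now nobody resets it, so the timeout fires
wd.stop()
print(errors)  # ['resource hung']
```

In a multiprocessor, the error action would typically be an interrupt over the ISIN so that surviving processors can start recovery or diagnostics.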
Conclusion
Processors designed for multiprocessing environments must have
features that promote reliability, efficiency, and scalability. Key
features include:
● Process recoverability for handling processor failures,
● Efficient context switching to enable rapid task switching,
● Support for large address spaces for flexibility in data
handling,
● Synchronization primitives for managing concurrency,
● Effective interprocessor communication for efficient
coordination,
● A rich instruction set to support concurrency and parallelism,
and
● Error detection mechanisms for improved system stability.
These features are crucial for creating a high-performance
multiprocessor system.