Chapter 8
PARALLEL PROCESSING
Multiple Processor Organization
Single instruction, single data (SISD) stream
Single processor executes a single instruction stream to operate on data stored in a single memory
Uniprocessors fall into this category
Single instruction, multiple data (SIMD) stream
A single machine instruction controls the simultaneous execution of a number of processing elements on a lockstep basis
Vector and array processors fall into this category
Multiple instruction, single data (MISD) stream
A sequence of data is transmitted to a set of processors, each of which executes a different instruction sequence
Not commercially implemented
Multiple instruction, multiple data (MIMD) stream
A set of processors simultaneously execute different instruction sequences on different data sets
SMPs, clusters, and NUMA systems fit this category
Figure 8.1
Figure 8.2
Symmetric Multiprocessor (SMP)
A stand-alone computer with the following characteristics:
• Two or more similar processors of comparable capacity
• Processors share the same memory and I/O facilities and are connected by a bus or other internal connection; memory access time is approximately the same for each processor
• All processors share access to I/O devices, either through the same channels or through different channels giving paths to the same devices
• All processors can perform the same functions (hence “symmetric”)
• System controlled by an integrated operating system that provides interaction between processors and their programs at the job, task, file, and data element levels
Multiprogramming
and
Multiprocessing
Figure 8.3
Figure 8.4
Symmetric Multiprocessor
Organization
Figure 8.5
The bus organization has several
attractive features:
Simplicity
Simplest approach to multiprocessor organization
Flexibility
Generally easy to expand the system by attaching more
processors to the bus
Reliability
The bus is essentially a passive medium and the failure of any
attached device should not cause failure of the whole system
Disadvantages of the bus organization:
Main drawback is performance
All memory references pass through the common bus
Performance is limited by bus cycle time
Each processor should have cache memory
Reduces the number of bus accesses
Leads to problems with cache coherence
If a word is altered in one cache it could conceivably invalidate a
word in another cache
To prevent this the other processors must be alerted that an
update has taken place
Typically addressed in hardware rather than the operating system
Multiprocessor Operating System
Design Considerations
Simultaneous concurrent processes
OS routines need to be reentrant to allow several processors to execute the same OS code simultaneously
OS tables and management structures must be managed properly to avoid deadlock or invalid operations
Scheduling
Any processor may perform scheduling so conflicts must be avoided
Scheduler must assign ready processes to available processors
Synchronization
With multiple active processes having potential access to shared address spaces or I/O resources, care must be
taken to provide effective synchronization
Synchronization is a facility that enforces mutual exclusion and event ordering
Memory management
In addition to dealing with all of the issues found on uniprocessor machines, the OS needs to exploit the available
hardware parallelism to achieve the best performance
Paging mechanisms on different processors must be coordinated to enforce consistency when several processors
share a page or segment and to decide on page replacement
Reliability and fault tolerance
OS should provide graceful degradation in the face of processor failure
Scheduler and other portions of the operating system must recognize the loss of a processor and restructure
accordingly
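The synchronization point above — a facility that enforces mutual exclusion — can be sketched in a few lines. This is a minimal illustration using Python's standard `threading.Lock`, not anything from the text itself: two threads update a shared counter, and the lock keeps each increment atomic.

```python
import threading

# Minimal mutual-exclusion sketch: a lock guards a shared counter
# updated concurrently by two threads.
counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:              # only one thread in the critical section at a time
            counter += 1

t1 = threading.Thread(target=increment, args=(100_000,))
t2 = threading.Thread(target=increment, args=(100_000,))
t1.start(); t2.start()
t1.join(); t2.join()
print(counter)  # → 200000
```

Without the lock, the read-modify-write on `counter` could interleave between threads and lose updates — exactly the invalid operation the OS tables discussion warns about.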
Cache Coherence
Software Solutions
Attempt to avoid the need for additional hardware circuitry
and logic by relying on the compiler and operating system to
deal with the problem
Attractive because the overhead of detecting potential
problems is transferred from run time to compile time, and
the design complexity is transferred from hardware to
software
However, compile-time software approaches generally must make
conservative decisions, leading to inefficient cache utilization
Cache Coherence
Hardware-Based Solutions
Generally referred to as cache coherence protocols
These solutions provide dynamic recognition at run time of
potential inconsistency conditions
Because the problem is only dealt with when it actually arises
there is more effective use of caches, leading to improved
performance over a software approach
Approaches are transparent to the programmer and the
compiler, reducing the software development burden
Can be divided into two categories:
Directory protocols
Snoopy protocols
Directory Protocols
Collect and maintain information about copies of data in caches
Directory stored in main memory
Requests are checked against the directory and appropriate transfers are performed
Effective in large-scale systems with complex interconnection schemes
Creates a central bottleneck
Snoopy Protocols
Distribute the responsibility for maintaining cache coherence
among all of the cache controllers in a multiprocessor
A cache must recognize when a line that it holds is shared with other
caches
When updates are performed on a shared cache line, it must be
announced to other caches by a broadcast mechanism
Each cache controller is able to “snoop” on the network to observe
these broadcast notifications and react accordingly
Suited to bus-based multiprocessors because the shared bus
provides a simple means for broadcasting and snooping
Care must be taken that the increased bus traffic required for
broadcasting and snooping does not cancel out the gains from the
use of local caches
Two basic approaches have been explored:
Write invalidate
Write update (or write broadcast)
Write Invalidate
Multiple readers, but only one writer at a time
When a write is required, all other caches of the line are
invalidated
Writing processor then has exclusive (cheap) access until
line is required by another processor
Most widely used in commercial multiprocessor systems
such as the Pentium 4 and PowerPC
State of every line is marked as modified, exclusive, shared
or invalid
For this reason the write-invalidate protocol is called MESI
Write Update
Can be multiple readers and writers
When a processor wishes to update a shared line the word to
be updated is distributed to all others and caches containing
that line can update it
Some systems use an adaptive mixture of both write-
invalidate and write-update mechanisms
MESI Protocol
To provide cache consistency on an SMP the data cache
supports a protocol known as MESI:
Modified
The line in the cache has been modified and is available only in
this cache
Exclusive
The line in the cache is the same as that in main memory and is
not present in any other cache
Shared
The line in the cache is the same as that in main memory and may
be present in another cache
Invalid
The line in the cache does not contain valid data
Table 8.1
MESI Cache Line States
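As an illustrative toy model (names and structure are this sketch's own, not from the text), the four MESI states and the write-invalidate transitions can be simulated for a single cache line shared between two caches:

```python
# Toy MESI sketch for one cache line.
# States: 'M' modified, 'E' exclusive, 'S' shared, 'I' invalid.

class Cache:
    def __init__(self):
        self.state = 'I'

def read(reader, others):
    """Local read: on a miss, any other holder drops to Shared;
    the reader loads Exclusive if it is the only holder, else Shared."""
    if reader.state == 'I':
        holders = [c for c in others if c.state != 'I']
        for c in holders:
            c.state = 'S'           # snooping caches observe the read
        reader.state = 'S' if holders else 'E'

def write(writer, others):
    """Local write: broadcast invalidates every other copy (write invalidate)."""
    for c in others:
        c.state = 'I'
    writer.state = 'M'              # line now dirty, held only here

a, b = Cache(), Cache()
read(a, [b])    # a misses, no other holder → a Exclusive
read(b, [a])    # b misses, a holds the line → both Shared
write(a, [b])   # a writes → a Modified, b Invalid
print(a.state, b.state)  # → M I
```

A real protocol also handles write-back of a Modified line when another processor reads it; this sketch only shows the state bookkeeping that Figure 8.6 diagrams.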
MESI State Transition Diagram
Figure 8.6
Multithreading and Chip
Multiprocessors
Processor performance can be measured by the rate at which it
executes instructions
MIPS rate = f * IPC
f = processor clock frequency, in MHz
IPC = average instructions per cycle
Performance can be increased by raising the clock frequency and
by increasing the number of instructions that complete during a cycle
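The MIPS relationship above can be computed directly. A minimal sketch (the function name is illustrative):

```python
def mips_rate(f_mhz: float, ipc: float) -> float:
    """MIPS rate = f * IPC, with f the processor clock frequency in MHz
    and IPC the average number of instructions completed per cycle."""
    return f_mhz * ipc

# A 2000 MHz (2 GHz) processor averaging 1.5 instructions per cycle:
print(mips_rate(2000, 1.5))  # → 3000.0
```

The two levers in the formula are exactly the two avenues named above: clock frequency (f) and instruction-level parallelism (IPC).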
Multithreading
Allows for a high degree of instruction-level parallelism without
increasing circuit complexity or power consumption
Instruction stream is divided into several smaller streams, known as
threads, that can be executed in parallel
Definitions of Threads and Processes
Thread in multithreaded processors may or may not be the same as the concept of software threads in a multiprogrammed operating system
Thread is concerned with scheduling and execution, whereas a process is concerned with both scheduling/execution and resource ownership
Process:
• An instance of a program running on a computer
• Two key characteristics: resource ownership and scheduling/execution
Process switch:
• Operation that switches the processor from one process to another by saving all the process control data, registers, and other information for the first and replacing them with the process information for the second
Thread:
• Dispatchable unit of work within a process
• Includes processor context (which includes the program counter and stack pointer) and data area for stack
• Executes sequentially and is interruptible so that the processor can turn to another thread
Thread switch:
• The act of switching processor control between threads within the same process
• Typically less costly than a process switch
Implicit and Explicit
Multithreading
All commercial processors and most
experimental ones use explicit multithreading
Concurrently execute instructions from different
explicit threads
Interleave instructions from different threads on
shared pipelines or parallel execution on parallel
pipelines
Implicit multithreading is concurrent execution of multiple
threads extracted from a single sequential program
Implicit threads defined statically by compiler or
dynamically by hardware
Approaches to Explicit
Multithreading
Interleaved (fine-grained)
Processor deals with two or more thread contexts at a time
Switching thread at each clock cycle
If a thread is blocked it is skipped
Blocked (coarse-grained)
Thread executed until an event causes delay
Effective on an in-order processor
Avoids pipeline stall
Simultaneous (SMT)
Instructions are simultaneously issued from multiple threads to the execution units of a superscalar processor
Chip multiprocessing
Processor is replicated on a single chip
Each processor handles separate threads
Advantage is that the available logic area on a chip is used effectively
Approaches to Executing Multiple Threads
Figure 8.7
Example Systems
Pentium 4
More recent models of the Pentium 4 use a multithreading technique that Intel refers to as hyperthreading
Approach is to use SMT with support for two threads
Thus the single multithreaded processor is logically two processors
IBM Power5
Chip used in high-end PowerPC products
Combines chip multiprocessing with SMT
Has two separate processors, each of which is a multithreaded processor capable of supporting two threads concurrently using SMT
Designers found that having two two-way SMT processors on a single chip provided superior performance to a single four-way SMT processor
Power5 Instruction Data Flow
Figure 8.8
Clusters
Alternative to SMP as an approach to providing
high performance and high availability
Particularly attractive for server applications
Defined as:
A group of interconnected whole computers working
together as a unified computing resource that can
create the illusion of being one machine
(The term whole computer means a system that can run
on its own, apart from the cluster)
Each computer in a cluster is called a node
Benefits:
Absolute scalability
Incremental scalability
High availability
Superior price/performance
Cluster
Configurations
Figure 8.9
Table 8.2
Clustering Methods: Benefits and Limitations
Operating System Design Issues
How failures are managed depends on the clustering method used
Two approaches:
Highly available clusters
Fault tolerant clusters
Failover
The function of switching applications and data resources over from a failed system
to an alternative system in the cluster
Failback
Restoration of applications and data resources to the original system once it
has been fixed
Load balancing
Incremental scalability
Automatically include new computers in scheduling
Middleware needs to recognize that processes may switch between machines
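The failover and failback functions described above can be sketched as a small reassignment routine. This is a hypothetical illustration only — node names, the health list, and the `failover` helper are all this sketch's inventions, standing in for real cluster middleware:

```python
# Hypothetical failover sketch: a watchdog moves each application off
# any node reported unhealthy and onto an idle healthy node.
def failover(assignment, healthy):
    """assignment maps app -> node; healthy lists nodes still alive."""
    standby = [n for n in healthy if n not in assignment.values()]
    for app, node in assignment.items():
        if node not in healthy and standby:
            assignment[app] = standby.pop(0)   # switch app to alternative node
    return assignment

apps = {"db": "node1", "web": "node2"}
print(failover(apps, healthy=["node2", "node3"]))
# → {'db': 'node3', 'web': 'node2'}
```

Failback is the same operation run in reverse once the repaired node reappears in the healthy list; real middleware must additionally migrate application state and data resources, which this sketch omits.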
Parallelizing Computation
Effective use of a cluster requires executing software from a single application in parallel
Three approaches are:
Parallelizing compiler
• Determines at compile time which parts of an application can be executed in parallel
• These are then split off to be assigned to different computers in the cluster
Parallelized application
• Application written from the outset to run on a cluster and uses message passing to move data between cluster nodes
Parametric computing
• Can be used if the essence of the application is an algorithm or program that must be executed a large number of times, each time with a different set of starting conditions or parameters
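Parametric computing, the last approach above, can be sketched with a pool of local worker processes standing in for cluster nodes (the `simulate` kernel and its parameters are placeholders, not from the text):

```python
from multiprocessing import Pool

# Parametric-computing sketch: the same kernel is executed many times,
# once per parameter set, with runs farmed out across workers
# (local processes here, cluster nodes in practice).
def simulate(params):
    start, rate = params
    return start * rate            # placeholder for a real model run

if __name__ == "__main__":
    parameter_sets = [(x, 1.5) for x in range(8)]
    with Pool(4) as pool:
        results = pool.map(simulate, parameter_sets)
    print(results)  # → [0.0, 1.5, 3.0, 4.5, 6.0, 7.5, 9.0, 10.5]
```

Because each run is independent, this style of workload needs no message passing between runs — which is what makes it the easiest of the three approaches to spread across a cluster.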
Cluster Computer Architecture
Figure 8.10
Example
100-Gbps
Ethernet
Configuration
for Massive
Blade Server
Site
Figure 8.11
Clusters Compared to SMP
Both provide a configuration with multiple processors to support high-demand applications
Both solutions are available commercially
SMP
• Easier to manage and configure
• Much closer to the original single-processor model for which nearly all applications are written
• Less physical space and lower power consumption
• Well established and stable
Clustering
• Far superior in terms of incremental and absolute scalability
• Superior in terms of availability
• All components of the system can readily be made highly redundant
Nonuniform Memory Access
(NUMA)
Alternative to SMP and clustering
Uniform memory access (UMA)
All processors have access to all parts of main memory using loads and stores
Access time to all regions of memory is the same
Access time to memory for different processors is the same
Nonuniform memory access (NUMA)
All processors have access to all parts of main memory using loads and stores
Access time of processor differs depending on which region of main memory
is being accessed
Different processors access different regions of memory at different speeds
Cache-coherent NUMA (CC-NUMA)
A NUMA system in which cache coherence is maintained among the caches of
the various processors
Motivation
SMP has a practical limit to the number of processors that can be used
• Bus traffic limits this to between 16 and 64 processors
In clusters each node has its own private main memory
• Applications do not see a large global memory
• Coherency is maintained by software rather than hardware
NUMA retains the SMP flavor while giving large-scale multiprocessing
Objective with NUMA is to maintain a transparent system-wide memory while permitting multiple multiprocessor nodes, each with its own bus or internal interconnect system
CC-NUMA Organization
Figure 8.12
NUMA Pros and Cons
Pros
• Main advantage of a CC-NUMA system is that it can deliver effective performance at higher levels of parallelism than SMP, without requiring major software changes
• Bus traffic on any individual node is limited to a demand that the bus can handle
Cons
• If many of the memory accesses are to remote nodes, performance begins to break down
• Does not transparently look like an SMP; software changes will be required to move an operating system and applications from an SMP to a CC-NUMA system
• Concern with availability
Vector Computation
There is a need for computers to solve mathematical problems of
physical processes in disciplines such as aerodynamics, seismology,
meteorology, and atomic, nuclear, and plasma physics
Need for high precision and a program that repetitively performs
floating point arithmetic calculations on large arrays of numbers
Most of these problems fall into the category known as continuous-field
simulation
Supercomputers were developed to handle these types of problems
However they have limited use and a limited market because of their price tag
There is a constant demand to increase performance
Array processor
Designed to address the need for vector computation
Configured as peripheral devices by both mainframe and minicomputer users
to run the vectorized portions of programs
Vector Addition Example
Figure 8.13
Matrix Multiplication
(C = A * B)
Figure 8.14
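As a plain scalar sketch of the C = A * B computation, the triple loop below shows the structure a vector facility exploits: the inner product loop is the repetitive floating-point work that can be pipelined or issued as vector operations, one result element per (i, j) pair. (The code is this sketch's own, not taken from the figure.)

```python
# Scalar matrix multiplication C = A * B; the innermost loop is the
# vectorizable kernel of repeated multiply-accumulate operations.
def matmul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):          # vectorizable inner loop
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(matmul(A, B))  # → [[19.0, 22.0], [43.0, 50.0]]
```

A vector processor replaces the inner loop with pipelined vector multiply and add instructions; an array processor assigns the independent (i, j) elements to parallel processing elements.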
Approaches to
Vector
Computation
Figure 8.15
Pipelined Processing
of Floating-Point
Operations
Figure 8.16
A Taxonomy of
Computer Organizations
Figure 8.17
IBM 3090 with
Vector Facility
Figure 8.18
Alternative
Programs
for Vector
Calculation
Figure 8.19
Registers for the IBM
3090 Vector Facility
Figure 8.20
Table 8.3
IBM 3090 Vector Facility:
Arithmetic and Logical Instructions
Summary
Chapter 8: Parallel Processing
Multiple processor organizations
• Types of parallel processor systems
• Parallel organizations
Symmetric multiprocessors
• Organization
• Multiprocessor operating system design considerations
Cache coherence and the MESI protocol
• Software solutions
• Hardware solutions
• The MESI protocol
Multithreading and chip multiprocessors
• Implicit and explicit multithreading
• Approaches to explicit multithreading
• Example systems
Clusters
• Cluster configurations
• Operating system design issues
• Cluster computer architecture
• Blade servers
• Clusters compared to SMP
Nonuniform memory access
• Motivation
• Organization
• NUMA pros and cons
Vector computation
• Approaches to vector computation
• IBM 3090 vector facility