CS416 – Parallel and Distributed Computing
Lecture # 03
Spring 2021
FAST – NUCES, Faisalabad Campus
Agenda
A Quick Review
Multi-processor vs Multi-computer
Flynn’s Taxonomy
SISD
MISD
SIMD
MIMD
Physical Organization of Parallel Platforms
PRAM
Routing Techniques and Costs
Interconnections for Parallel Platforms
CS416 - Spring 2021
Review of the Previous Lecture
Amdahl’s Law of Parallel Speedup
Purpose, derivation, and examples
Karp-Flatt Metric
Finding the sequential fraction of a given parallel setup
Types of Parallelism
Data-parallelism
Same operation on different data elements
Functional-parallelism
Different independent tasks with different operations on
different data elements can be parallelized
Pipelining
Overlapping the execution stages of multiple instructions to achieve parallelism
Karp-Flatt Metric
Karp-Flatt Metric (Review)
The metric is used to calculate the serial fraction of a given parallel configuration, i.e., if a parallel program exhibits a speedup S while using p processing units, then the serial fraction e is given by:

e = (1/S − 1/p) / (1 − 1/p)
Example task: Suppose a parallel program achieves a speedup of 1.25x on 5 processors; determine the sequential fraction of the program.
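The example task above can be worked through directly; a small sketch (the function name is mine, not from the slides):

```python
# Karp-Flatt metric: e = (1/S - 1/p) / (1 - 1/p)
def karp_flatt(speedup, p):
    """Experimentally determined serial fraction e for speedup S on p processors."""
    return (1 / speedup - 1 / p) / (1 - 1 / p)

# Example task: S = 1.25 on p = 5 processors
e = karp_flatt(1.25, 5)
# e = (0.8 - 0.2) / (1 - 0.2) ~= 0.75, i.e. about 75% of the program is serial
```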
Quick Review of the Previous Lecture
Multiprocessor
Centralized multiprocessor
Distributed multiprocessor
Shared address space (NUMA) vs shared memory (UMA)
Multicomputer
Asymmetrical
Symmetrical
Cluster vs Network of Workstations
Multi-Processor
vs
Multi-Computer
Multi-Processor
Multiple CPUs with a shared memory
The same address on two different CPUs refers to the
same memory location.
Generally, there are two categories:
1. Centralized Multi-processors
2. Distributed Multi-processors
Multi-Processor
i. Centralized Multi-processor
Additional CPUs are attached
to the system bus, and all the
processors share the same
primary memory
All the memory is at one place
and has the same access time
from every processor
Also known as a UMA (Uniform Memory Access) multi-processor or SMP (Symmetric Multi-Processor)
Multi-Processor
ii. Distributed Multi-processor
Distributed collection of
memories forms one logical
address space
Again, the same address on
different processors refers to
the same physical memory
location.
Also known as non-uniform
memory access (NUMA)
architecture
Memory access time varies
significantly depending on the
physical location of the
referenced address
Cache consistency issues [assigned reading]
Multi-Computer
Distributed-memory, multi-CPU computer
Unlike NUMA architecture, a multicomputer has
disjoint local address spaces
The same address on different processors refers to
two different physical memory locations.
Each processor has direct access to its local memory only.
Processors interact with each other through
Message-Passing
Multi-Computer
Asymmetric Multi-Computers
A front-end computer that
interacts with users and I/O
devices
The back-end processors are dedicated to “number crunching”
The front-end computer executes a full multiprogrammed OS and provides all the functions needed for program development
The back-end ones are reserved for executing parallel programs
Multi-Computer
Symmetric Multi-Computers
Every computer executes the same OS
Users may log into any of the
computers
This enables multiple users to
concurrently login, edit and
compile their programs.
All the nodes can participate
in execution-invocation of a
parallel program
Network of Workstations
vs
Cluster
Cluster
Usually a co-located collection of low-cost computers and switches, dedicated to running parallel jobs.
All computers run the same version of the operating system.
Some of the computers may not have interfaces for users to log in.
Commodity clusters use high-speed networks for communication, such as Fast Ethernet @ 100 Mbps, Gigabit Ethernet @ 1000 Mbps, and Myrinet @ 1920 Mbps.

Network of Workstations
A dispersed collection of computers. Individual workstations may have different operating systems and executable programs.
Users have the power to log in and power off their workstations.
Ethernet speed for such a network is usually slower, typically in the range of 10 Mbps.
Architectural Classification of Systems
Flynn’s Taxonomy
Widely used architectural classification scheme
Classifies architectures into four types
The classification is based on how data and
instructions flow through the cores.
Flynn’s Taxonomy
SISD (Single Instruction Single Data)
Refers to a traditional computer: a serial architecture
This architecture includes single-core computers
A single instruction stream is in execution at a given time
Similarly, only one data stream is active at any time
Flynn’s Taxonomy
MISD (Multiple Instructions Single Data)
Multiple instruction streams and a single data stream
A pipeline of multiple
independently executing
functional units
Each operating on a single
stream of data and forwarding
results from one to the next
Rarely used in practice
E.g., systolic arrays: networks of primitive processing elements that pump data through them.
Flynn’s Taxonomy
SIMD (Single Instruction Multiple Data)
Refers to parallel architecture with
multiple cores
All the cores execute the same instruction stream at any time, but the data stream is different for each.
Well-suited for scientific computations requiring large matrix-vector operations
Vector computers (e.g., the Cray vector processing machines) and Intel’s MMX instruction-set extension fall under this category.
Used with array operations, image
processing and graphics
Flynn’s Taxonomy
MIMD (Multiple Instructions Multiple Data)
Multiple instruction streams and
multiple data streams
Different CPUs can simultaneously
execute different instruction
streams manipulating different
data
Most of the contemporary parallel
architectures fall under this
category e.g., Multiprocessor and
multicomputer architectures
Many MIMD architectures also include SIMD execution units.
Flynn’s Taxonomy
A typical SIMD architecture (a) and a typical MIMD architecture (b).
SIMD-MIMD Comparison
SIMD computers require less hardware than MIMD
computers (single control unit).
However, since SIMD processors are specially
designed, they tend to be expensive and have long
design cycles.
Not all applications are naturally suited to SIMD
processors.
In contrast, platforms supporting the SPMD (Single Program, Multiple Data) paradigm can be built from inexpensive off-the-shelf components with relatively little effort in a short amount of time.
SPMD is a close variant of MIMD.
Physical Organization of Parallel Platforms
Architecture of an Ideal Parallel Computer
Parallel Random Access Machine (PRAM)
An extension of the ideal sequential model, the random access machine (RAM)
PRAMs consist of p processors
A global memory
Unbounded size
Uniformly accessible to all processors, with the same address space
Processors share a common clock but may execute
different instructions in each cycle.
PRAMs can be further classified based on their simultaneous memory access mechanisms.
Architecture of an Ideal Parallel Computer
Parallel Random Access Machine (PRAM)
PRAMs can be divided into four subclasses.
1. Exclusive-read, exclusive-write (EREW) PRAM
No concurrent read/write operations allowed
Weakest PRAM model, provides minimum memory access
concurrency
2. Concurrent-read, exclusive-write (CREW) PRAM
Multiple read accesses are allowed; write accesses to a memory location are serialized
3. Exclusive-read, concurrent-write (ERCW) PRAM
4. Concurrent-read, concurrent-write (CRCW) PRAM
Most powerful PRAM model
Architecture of an Ideal Parallel Computer
Parallel Random Access Machine (PRAM)
Exclusive reads do not create any semantic inconsistencies
But what about concurrent writes?
We need an arbitration (mediation) mechanism to resolve concurrent write accesses
Architecture of an Ideal Parallel Computer
Parallel Random Access Machine (PRAM)
Commonly used arbitration protocols:
Common: write only if all values are identical
Arbitrary: write the data from a randomly selected
processor and ignore the rest.
Priority: follow a predetermined priority order.
Processor with highest priority succeeds and the rest
fail.
Sum: write the sum of the data items in all the write requests. The model can be extended to any associative operator defined on the data being written.
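The four protocols above can be sketched as a tiny simulation; an illustrative sketch only, with names of my own choosing (not part of any PRAM standard):

```python
def resolve_concurrent_write(requests, protocol):
    """requests: list of (processor_id, value) pairs targeting one memory cell.

    Returns the value that ends up stored, or None if the write fails.
    """
    values = [v for _, v in requests]
    if protocol == "common":
        # Succeeds only if all processors attempt to write the same value.
        return values[0] if len(set(values)) == 1 else None
    if protocol == "arbitrary":
        # Any one request wins; here we simply pick the first one.
        return values[0]
    if protocol == "priority":
        # Predetermined priority order: lowest processor id wins here.
        return min(requests)[1]
    if protocol == "sum":
        # Combine all requests with an associative operator (here, +).
        return sum(values)
    raise ValueError(f"unknown protocol: {protocol}")
```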
Architecture of an Ideal Parallel Computer
Physical Complexity of an Ideal Parallel Computer
Consider a realization of an EREW PRAM
Processors and memories are connected via
switches.
Since these switches must operate in O(1) time at
the level of words, for a system of p processors and
m words, the switch complexity is O(mp).
Clearly, for meaningful values of p and m, a true
PRAM is not realizable.
Communication Costs in Parallel Machines
Communication Costs in Parallel Machines
Along with idling, communication is a major
overhead in parallel programs.
The communication cost is usually dependent on a
number of features including the following:
Programming model for communication
Network topology
Data handling and routing
Associated network protocols
Usually, distributed systems suffer from major
communication overheads.
Message Passing Costs in Parallel Computers
The total time to transfer a message over a network comprises the following:
Startup time (ts): Time spent at sending and receiving nodes
(preparing the message[adding headers, trailers, and parity
information ] , executing the routing algorithm, establishing
interface between node and router, etc.).
Per-hop time (th): This time is a function of the number of hops and includes factors such as switch latencies, network delays, etc.
Also known as node latency.
Per-word transfer time (tw): This time includes all overheads
that are determined by the length of the message. This
includes bandwidth of links, and buffering overheads, etc.
Message Passing Costs in Parallel Computers
Store-and-Forward Routing
A message traversing multiple hops is completely
received at an intermediate hop before being
forwarded to the next hop.
The total communication cost for a message of size m words to traverse l communication links is
t_comm = ts + (m·tw + th)·l
In most platforms, th is small and the above expression can be approximated by
t_comm = ts + m·tw·l
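The store-and-forward cost model, t_comm = ts + (m·tw + th)·l, can be written out as a small sketch (the parameter values below are illustrative, not from the slides):

```python
def store_and_forward_time(ts, th, tw, m, l):
    """Exact model: the whole message is retransmitted on each of the l links."""
    return ts + (m * tw + th) * l

def store_and_forward_approx(ts, tw, m, l):
    """Approximation when th is negligible: t_comm ~= ts + m*tw*l."""
    return ts + m * tw * l

# Illustrative values: ts = 100, th = 1, tw = 2 (time units), m = 1000 words, l = 4 links
exact = store_and_forward_time(100, 1, 2, 1000, 4)   # 100 + (2000 + 1) * 4 = 8104
approx = store_and_forward_approx(100, 2, 1000, 4)   # 100 + 2000 * 4 = 8100
```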
Message Passing Costs in Parallel Computers
Packet Routing
Store-and-forward makes poor use of
communication resources.
Packet routing breaks messages into packets and
pipelines them through the network.
Since packets may take different paths, each
packet must carry routing information, error
checking, sequencing, and other related header
information.
The total communication time for packet routing is approximated by:
t_comm = ts + th·l + tw·m
Here the factor tw also accounts for overheads in packet headers.
Message Passing Costs in Parallel Computers
Cut-Through Routing
Takes the concept of packet routing to an extreme
by further dividing messages into basic units called
flits or flow control digits.
Since flits are typically small, the header information
must be minimized.
This is done by forcing all flits to take the same path,
in sequence.
A tracer message first programs all intermediate
routers. All flits then take the same route.
Error checks are performed on the entire message,
as opposed to flits.
No sequence numbers are needed.
Message Passing Costs in Parallel Computers
Cut-Through Routing
The total communication time for cut-through routing is approximated by:
t_comm = ts + l·th + m·tw
This is identical in form to packet routing; however, tw here is typically much smaller than the tw in packet routing
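To see why cut-through routing helps, the two models can be compared side by side (a sketch with made-up parameter values):

```python
def store_and_forward_time(ts, th, tw, m, l):
    # The whole message pays the m*tw cost on every one of the l links.
    return ts + (m * tw + th) * l

def cut_through_time(ts, th, tw, m, l):
    # Only the header pays the per-hop cost; the message body is pipelined.
    return ts + l * th + m * tw

# Illustrative values: ts = 100, th = 1, tw = 2, m = 1000 words, l = 10 links
sf = store_and_forward_time(100, 1, 2, 1000, 10)  # 100 + 2001 * 10 = 20110
ct = cut_through_time(100, 1, 2, 1000, 10)        # 100 + 10 + 2000 = 2110
# Cut-through avoids multiplying the m*tw term by the number of links.
```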
Message Passing Costs in Parallel Computers
Figure: (a) a message passing through a store-and-forward communication network; (b) and (c) extending the concept to cut-through routing.
Message Passing Costs in Parallel Computers
Simplified Cost Model for Communicating Messages
The cost of communicating a message between two nodes l hops away using cut-through routing is given by
t_comm = ts + l·th + tw·m
In this expression, th is typically much smaller than ts and tw. For this reason, the second term on the RHS does not dominate, particularly when m is large.
We can therefore approximate the cost of message transfer by
t_comm = ts + tw·m
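The quality of this simplified model can be checked numerically; a sketch with illustrative parameter values of my own choosing:

```python
def cut_through_time(ts, th, tw, m, l):
    # Full cut-through model: t_comm = ts + l*th + tw*m
    return ts + l * th + tw * m

def simplified_time(ts, tw, m):
    # Drops the l*th term, which is dwarfed by ts and tw*m when m is large.
    return ts + tw * m

# The relative error of the simplified model shrinks as the message grows:
for m in (10, 100, 10_000):
    exact = cut_through_time(100, 1, 2, m, 8)
    approx = simplified_time(100, 2, m)
    print(m, (exact - approx) / exact)
```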
Message Passing Costs in Parallel Computers
Simplified Cost Model for Communicating Messages
It is important to note that the original expression for communication time is valid only for uncongested networks.
Different communication patterns congest different
networks to varying extents.
It is important to understand and account for this in
the communication time accordingly.
Questions
References
1. Flynn, M., “Some Computer Organizations and Their Effectiveness,” IEEE Transactions on Computers, Vol. C-21,
No. 9, September 1972.
2. Kumar, V., Grama, A., Gupta, A., & Karypis, G. (1994). Introduction to parallel computing (Vol. 110). Redwood City,
CA: Benjamin/Cummings.
3. Quinn, M. J. (2003). Parallel Programming in C with MPI and OpenMP. McGraw-Hill.