Parallel computer architecture: classification
Basic Computer Architecture
• Von Neumann Architecture
– Uses the stored-program concept
– Memory stores both program instructions and data
– CPU fetches instructions and data from memory
– decodes the instructions
– and executes them sequentially
Hardware Parallelism
• Computing: execute instructions that operate on data.
[Diagram: a computer operating on an instruction stream and a data stream]
• Flynn’s taxonomy (Michael Flynn, 1967) classifies computer
architectures by the number of concurrent instruction streams and
data streams they support.
Flynn’s taxonomy
• Single Instruction Single Data (SISD)
– Traditional sequential computing systems
• Single Instruction Multiple Data (SIMD)
• Multiple Instructions Multiple Data (MIMD)
• Multiple Instructions Single Data (MISD)
Computer Architectures
[Diagram: computer architectures classified as SISD, SIMD, MIMD, or MISD]
SISD
• At any one time, a single instruction operates on a single data
item
• Traditional sequential architecture
SIMD
• At any one time, a single instruction operates on many data items
– Data-parallel architecture (i.e., exploits data-level parallelism)
– Vector architectures have similar characteristics, but achieve the parallelism
with pipelining.
– Performs the same operation on multiple data items
– Ex.: adjusting the audio of a digital video…
• Array processors, GPUs
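A minimal C sketch of the SIMD idea (function and array names are illustrative): one operation, an add, is applied element-wise to whole arrays, and a vectorizing compiler can map the loop onto SIMD instructions that process several elements at once.

```c
#include <stddef.h>

/* Same operation applied to many data elements: C[i] = A[i] + B[i].
 * A vectorizing compiler can map this loop onto SIMD instructions that
 * process several elements per instruction. Names are illustrative. */
void vector_add(const float *A, const float *B, float *C, size_t n)
{
    for (size_t i = 0; i < n; i++)
        C[i] = A[i] + B[i];
}
```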
Array processor (SIMD)
[Diagram: a single control unit (IP, MAR, OP/ADDR/MDR, decoder) broadcasts one
instruction to many ALUs, each operating in lockstep on its own operands
(A1 B1 C1, A2 B2 C2, ..., AN BN CN) from memory]
MIMD
• Multiple instruction streams operating on multiple data
streams
– Classical distributed-memory or SMP architectures
– Have a number of processors that function asynchronously and
independently
– Ex: Intel Xeon Phi
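A minimal sketch of the MIMD idea using POSIX threads (illustrative, not tied to any particular machine): each thread executes its own instruction stream asynchronously on its own data.

```c
#include <pthread.h>
#include <stdio.h>

/* Each thread is an independent, asynchronous instruction stream
 * operating on its own data, as in the MIMD model. */
static void *worker(void *arg)
{
    int id = *(int *)arg;
    printf("thread %d: running its own instruction stream\n", id);
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    int ids[2] = {0, 1};
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```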
MISD machine
• Not commonly seen.
• Systolic array is one example of an MISD architecture.
Flynn’s taxonomy summary
• SISD: traditional sequential architecture
• SIMD: processor arrays, vector processor
– Parallel computing on a budget – reduced control unit cost
– Many early supercomputers
• MIMD: most general purpose parallel
computer today
– Clusters, MPP, data centers
• MISD: not a general purpose architecture.
Flynn’s classification on today’s
architectures
• Multicore processors
• Superscalar: pipelined execution + multiple issue
• GPU: CUDA architecture
• IBM BlueGene
Modern classification
(Sima, Fountain, Kacsuk)
• Classify based on how parallelism is achieved
– by operating on multiple data: data parallelism
– by performing many functions in parallel: function
parallelism
• Called control parallelism or task parallelism, depending on the level of the
functional parallelism.
Parallel architectures
[Diagram: parallel architectures split into data-parallel architectures and
function-parallel architectures]
Data parallel architectures
• Vector processors, SIMD (array processors), systolic arrays.
Vector processor (pipelining)
[Diagram: a single control unit (IP, MAR, OP/ADDR/MDR, decoder) streams vector
operands A and B from memory through one pipelined ALU to produce C]
Data parallel architecture: Array
processor
[Diagram: same array-processor organization as above: one control unit
broadcasting to multiple ALUs, each with its own operands
(A1 B1 C1, A2 B2 C2, ..., AN BN CN)]
Control parallel architectures
Function-parallel architectures
– Instruction-level parallel architectures (ILPs)
• Pipelined processors
• VLIW processors
• Superscalar processors
– Thread-level parallel architectures
– Process-level parallel architectures (MIMDs)
• Shared-memory MIMD
• Distributed-memory MIMD
Performance of parallel architectures
• Common metrics
– MIPS: million instructions per second
• MIPS = instruction count / (execution time × 10^6)
– MFLOPS: million floating-point operations per second
• MFLOPS = FP operations in program / (execution time × 10^6)
• Which is the better metric?
• The FLOP count is more closely tied to the time a numerical task takes
– The number of FLOPs per program is determined by the problem size (e.g., the matrix size)
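A small sketch of both formulas, using hypothetical counts and run time:

```c
#include <stdio.h>

/* MIPS and MFLOPS from measured counts and run time (hypothetical values). */
int main(void)
{
    double instr_count = 5.0e9;   /* instructions executed        */
    double fp_ops      = 2.0e9;   /* floating-point operations    */
    double exec_time   = 4.0;     /* execution time in seconds    */

    double mips   = instr_count / (exec_time * 1.0e6);
    double mflops = fp_ops      / (exec_time * 1.0e6);

    printf("MIPS   = %.1f\n", mips);    /* 1250.0 */
    printf("MFLOPS = %.1f\n", mflops);  /*  500.0 */
    return 0;
}
```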
Performance of parallel architectures
• FLOPS units (floating-point operations per second)
• Computer performance

Name        Abbr.   FLOPS
kiloFLOPS   kFLOPS  10^3
megaFLOPS   MFLOPS  10^6
gigaFLOPS   GFLOPS  10^9
teraFLOPS   TFLOPS  10^12
petaFLOPS   PFLOPS  10^15
exaFLOPS    EFLOPS  10^18
zettaFLOPS  ZFLOPS  10^21
yottaFLOPS  YFLOPS  10^24
Peak and sustained performance
• Peak performance
– Measured in MFLOPS
– Highest possible MFLOPS when the system does
nothing but numerical computation
– Rough hardware measure
– Gives little indication of how the system will perform in
practice.
Peak and sustained performance
• Sustained performance
– The MFLOPS rate that a program achieves over the entire run.
• Measuring sustained performance
– Using benchmarks
• Peak MFLOPS is usually much larger than sustained MFLOPS
– Efficiency rate = sustained MFLOPS / peak MFLOPS
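For illustration, with made-up numbers, the efficiency rate is just the ratio of the two:

```c
#include <stdio.h>

/* Efficiency rate = sustained MFLOPS / peak MFLOPS (illustrative numbers). */
int main(void)
{
    double peak_mflops      = 1000.0;   /* hardware peak                */
    double sustained_mflops = 250.0;    /* measured over the entire run */
    printf("efficiency = %.0f%%\n", 100.0 * sustained_mflops / peak_mflops);
    return 0;
}
```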
Measuring the performance of parallel
computers
• Benchmarks: programs that are used to
measure the performance.
– LINPACK benchmark: a measure of a system’s
floating point computing power
• Solving a dense N by N system of linear equations Ax=b
• Used to rank supercomputers in the TOP500 list.
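A sketch of how a LINPACK-style figure is obtained: the operation count for the dense solve is taken as roughly 2/3·N^3 + 2·N^2 floating-point operations and divided by the measured run time (the values below are hypothetical).

```c
#include <stdio.h>

/* GFLOPS for a dense Ax=b solve of size N, using the standard LINPACK
 * operation count 2/3*N^3 + 2*N^2. The run time here is a placeholder. */
int main(void)
{
    double N    = 10000.0;
    double time = 35.0;   /* measured solve time in seconds (hypothetical) */
    double flop = (2.0 / 3.0) * N * N * N + 2.0 * N * N;
    printf("%.1f GFLOPS\n", flop / time / 1.0e9);
    return 0;
}
```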
Other common benchmarks
• Micro-benchmark suites
– Numerical computing
• LAPACK
• ScaLAPACK
– Memory bandwidth
• STREAM
• Kernel benchmarks
– NPB (NAS parallel benchmark)
– PARKBENCH
– SPEC
– Splash
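As an illustration of what a memory-bandwidth benchmark such as STREAM measures, its "triad" kernel boils down to a loop like the sketch below (this is not the official benchmark code); sustained bandwidth is the bytes moved divided by the loop's run time.

```c
#include <stddef.h>

/* STREAM-style "triad" kernel: a[i] = b[i] + scalar * c[i].
 * Each iteration moves 3 doubles (2 loads + 1 store), so the measured
 * bandwidth is roughly 24*n bytes divided by the loop's run time. */
void triad(double *a, const double *b, const double *c,
           double scalar, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}
```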
Memory architectures
• Shared Memory
• Distributed Memory
• Hybrid Distributed-Shared Memory
Shared Memory
• Shared memory parallel computers vary widely, but generally have in
common the ability for all processors to access all memory as global
address space.
• Multiple processors can operate independently but share the same
memory resources.
• Changes in a memory location made by one processor are visible to
all other processors.
• Shared memory machines can be divided into two main classes based
upon memory access times: UMA and NUMA.
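A minimal OpenMP sketch of the shared-memory model (compile with OpenMP support, e.g. -fopenmp): all threads address the same array in one global address space, and the runtime divides the loop iterations among them.

```c
#include <stdio.h>

#define N 1000000

/* Shared memory: every thread reads and writes the same globally
 * addressable array; OpenMP splits the loop iterations among threads. */
int main(void)
{
    static double a[N];

    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;

    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}
```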
Shared Memory: Pro and Con
• Advantages
– Global address space provides a user-friendly programming perspective to
memory
– Data sharing between tasks is both fast and uniform due to the proximity of
memory to CPUs
• Disadvantages:
– Primary disadvantage is the lack of scalability between memory and CPUs.
Adding more CPUs can geometrically increase traffic on the shared memory-
CPU path and, for cache-coherent systems, geometrically increase traffic
associated with cache/memory management.
– Programmer responsibility for synchronization constructs that ensure "correct"
access of global memory.
– Expense: it becomes increasingly difficult and expensive to design and produce
shared memory machines with ever increasing numbers of processors.
Distributed Memory
• Like shared memory systems, distributed memory systems vary widely but share a
common characteristic. Distributed memory systems require a communication network
to connect inter-processor memory.
• Processors have their own local memory. Memory addresses in one processor do not
map to another processor, so there is no concept of global address space across all
processors.
• Because each processor has its own local memory, it operates independently. Changes it
makes to its local memory have no effect on the memory of other processors. Hence,
the concept of cache coherency does not apply.
• When a processor needs access to data in another processor, it is usually the task of the
programmer to explicitly define how and when data is communicated. Synchronization
between tasks is likewise the programmer's responsibility.
• The network "fabric" used for data transfer varies widely, though it can be as simple
as Ethernet.
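A minimal MPI sketch of the distributed-memory model (run with at least two processes): each process owns a private variable, and the only way to move its value to another process is an explicit message.

```c
#include <mpi.h>
#include <stdio.h>

/* Distributed memory: each rank owns a private value; moving it to
 * another rank requires an explicit message over the network. */
int main(int argc, char **argv)
{
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                                   /* local to rank 0 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}
```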
Distributed Memory: Pro and Con
• Advantages
– Memory is scalable with number of processors. Increase the number of
processors and the size of memory increases proportionately.
– Each processor can rapidly access its own memory without interference and
without the overhead incurred with trying to maintain cache coherency.
– Cost effectiveness: can use commodity, off-the-shelf processors and
networking.
• Disadvantages
– The programmer is responsible for many of the details associated with data
communication between processors.
– It may be difficult to map existing data structures, based on global memory, to
this memory organization.
– Non-uniform memory access (NUMA) times
Hybrid Distributed-Shared Memory
Summarizing a few of the key characteristics of shared and
distributed memory machines
Comparison of Shared and Distributed Memory Architectures

Architecture: CC-UMA | CC-NUMA | Distributed

Examples
– CC-UMA: SMPs, Sun Vexx, DEC/Compaq, SGI Challenge, IBM POWER3
– CC-NUMA: Bull NovaScale, SGI Origin, Sequent, HP Exemplar, DEC/Compaq, IBM POWER4 (MCM)
– Distributed: Cray T3E, Maspar, IBM SP2, IBM BlueGene

Communications
– CC-UMA: MPI, Threads, OpenMP, shmem
– CC-NUMA: MPI, Threads, OpenMP, shmem
– Distributed: MPI

Scalability
– CC-UMA: to 10s of processors
– CC-NUMA: to 100s of processors
– Distributed: to 1000s of processors

Drawbacks
– CC-UMA: Memory-CPU bandwidth
– CC-NUMA: Memory-CPU bandwidth, non-uniform access times
– Distributed: System administration; programming is hard to develop and maintain

Software Availability
– CC-UMA: many 1000s of ISVs
– CC-NUMA: many 1000s of ISVs
– Distributed: 100s of ISVs
Hybrid Distributed-Shared Memory
• The largest and fastest computers in the world today employ both shared and
distributed memory architectures.
• The shared memory component is usually a cache coherent SMP machine.
Processors on a given SMP can address that machine's memory as global.
• The distributed memory component is the networking of multiple SMPs.
SMPs know only about their own memory - not the memory on another SMP.
Therefore, network communications are required to move data from one SMP
to another.
• Current trends seem to indicate that this type of memory architecture will
continue to prevail and increase at the high end of computing for the
foreseeable future.
• Advantages and Disadvantages: whatever is common to both shared and
distributed memory architectures.
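A minimal sketch of the hybrid model, assuming an MPI + OpenMP toolchain: MPI ranks (typically one per SMP node) communicate by message passing, while OpenMP threads inside each rank share that node's memory.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

/* Hybrid distributed-shared memory: MPI between nodes, OpenMP within a node. */
int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Threads within this rank share the node's memory; ranks do not. */
    #pragma omp parallel
    printf("rank %d, thread %d of %d\n",
           rank, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}
```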
Parallel Computers/Vector
Computers
• MIMD>SIMD>MISD
• Types
– Shared Memory Multiprocessors
– Message Passing Multicomputers
• Differences
– Memory Sharing
– IPC
Vector Processors (Array Processors)
• SIMD
– Large Vector Input Processing
• Features
– Multiple Vector Pipelines
– Concurrently used under Firmware/Hardware
control
• Types
– Memory-to-Memory Architecture
– Register-to-Register Architecture
Parallel Computer Architecture
System Attributes
• Performance of a Computer System
– Machine Capability
• Better Hardware Technology
• Innovative Architectural features
• Efficient Resource Management
– Program Behaviour
• Application and Runtime
• Algorithm Design and Data Structures
• Programming Language
• Compiler Technology
System Attributes to Performance
• Program Performance Metric
– Turn Around Time
– CPU Time
– Clock Rate and CPI
• Cycle Time
• Clock Rate
• Instruction Count
• CPI – Cycles Per Instruction
• Instruction Cycle
– IF, ID, OF, OD, EX
• Memory Cycle
– Time required to complete one memory reference
– Typically k times the processor cycle time
– The value of k depends on the memory technology, cache memory
speed, and the CPU-memory interconnection mechanism
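These factors combine in the usual relation CPU time T = Ic × CPI × cycle time = Ic × CPI / clock rate; a small sketch with hypothetical numbers:

```c
#include <stdio.h>

/* CPU time T = Ic * CPI * tau = Ic * CPI / f  (hypothetical numbers). */
int main(void)
{
    double Ic         = 2.0e9;   /* instruction count              */
    double CPI        = 1.5;     /* average cycles per instruction */
    double clock_rate = 3.0e9;   /* 3 GHz, cycle time tau = 1/f    */

    double T    = Ic * CPI / clock_rate;   /* seconds               */
    double mips = Ic / (T * 1.0e6);        /* = f / (CPI * 10^6)    */

    printf("CPU time = %.3f s, MIPS = %.1f\n", T, mips);
    return 0;
}
```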
System Attributes
• The 5 Performance factors are influenced by 4
system attributes
– Instruction Set Architecture
– Compiler Technology
– CPU Implementation and Control
– Cache & Memory Hierarchy
System Attributes
• FLOPS
• Throughput
Implicit vs. Explicit Parallelism
Multiprocessors & Multicomputers
• Shared-Memory Multiprocessors
– UMA Model
– NUMA Model
– COMA Model
– CC-NUMA
• Distributed Memory Multicomputers
– NORMA
UMA Model
• Tightly Coupled Systems
• Suitable for Time Sharing applications
• Symmetric MP vs Asymmetric MP
• MP vs AP
Summary
• Flynn’s classification
– SISD, SIMD, MIMD, MISD
• Modern classification
– Data parallelism
– function parallelism
• Instruction level, thread level, and process level
• Performance
– MIPS, MFLOPS
– Peak performance and sustained performance
References
• K. Hwang, "Advanced Computer Architecture: Parallelism, Scalability,
Programmability", McGraw-Hill, 1993.
• D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design
Space Approach", Addison-Wesley, 1997.