CS516: Parallelization of Programs
Overview of Parallel Architectures
Vishwesh Jatala
Assistant Professor
Department of CSE
Indian Institute of Technology Bhilai
[email protected]
2023-24 W
Recap: Why Parallel Architectures?
• Moore’s Law: The number of transistors on an IC doubles about every two years
Recap: Moore’s Law Effect
Processor Architecture Roadmap
Course Outline
■ Introduction
■ Overview of Parallel Architectures
■ Performance
■ Parallel Programming
• GPUs and CUDA programming
■ Case studies
■ Extracting Parallelism from Sequential Programs Automatically
Flynn’s Taxonomy
• Flynn’s classification of computer architecture
SISD: Single Instruction, Single Data
• The von Neumann architecture
• Implements a universal Turing machine
• Conforms to serial algorithmic analysis
From http://arstechnica.com/paedia/c/cpu/part-1/cpu1-1.html
SIMD: Single Instruction, Multiple Data
• Single control stream
• All processors operating in lock step
• Fine-grained parallelism
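As a minimal CPU-side sketch (not from the slides; the function name and the use of OpenMP are illustrative assumptions), a vectorized loop captures the SIMD idea: one instruction stream applied in lock step to several data elements per vector instruction.

// Sketch: one instruction stream, many data elements.
// `#pragma omp simd` asks the compiler to vectorize the loop so that one
// vector instruction multiplies several adjacent elements in lock step
// (compile with, e.g., -fopenmp-simd).
#include <cstddef>

void elementwise_mul(float* c, const float* a, const float* b, std::size_t n) {
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i)
        c[i] = a[i] * b[i];   // same operation, different data per vector lane
}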
SIMD: Single Instruction, Multiple Data
• Example: GPUs
From http://arstechnica.com/paedia/c/cpu/part-1/cpu1-1.html
MIMD: Multiple Instruction, Multiple Data
• Most machines in use today are MIMD
• Multi-core, SMP, Clusters, NUMA machines, etc.
Rest of today’s lecture…
• Flynn’s classification of computer architecture
Flynn’s Taxonomy
• Flynn’s classification of computer architecture
MIMD: Shared Memory Multiprocessors
• Tightly coupled multiprocessors
• Shared global memory address space
• Traditional multiprocessing: symmetric multiprocessing (SMP)
• Existing multi-core processors, multithreaded processors
• Programming model similar to uniprocessors (i.e., multitasking uniprocessor) except
• Operations on shared data require synchronization
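A minimal CPU-side sketch (an assumed example, not from the slides) of the shared-memory model: threads of one process read and write the same address space, so updates to shared data need synchronization (here a C++ std::atomic; a mutex would also work).

// Several threads increment a counter that lives in the shared address space.
// Without the atomic (or a mutex) the concurrent increments would race and
// lose updates.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    std::atomic<long> counter{0};          // shared data, synchronized access
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)
        workers.emplace_back([&counter] {
            for (int i = 0; i < 1'000'000; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& w : workers) w.join();
    std::printf("counter = %ld\n", counter.load());   // expected: 4000000
    return 0;
}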
Interconnection Schemes for SMP
SMP Architectures
UMA: Uniform Memory Access
• All processors have the same uncontended latency to memory
• Symmetric multiprocessing (SMP) ~ UMA with bus interconnect
UMA: Uniform Memory Access
+ Data placement unimportant/less important (easier to optimize code and make use of available memory space)
- Scaling the system increases all latencies
- Contention could restrict bandwidth and increase latency
How to Scale Shared Memory Machines?
• Two general approaches
• Maintain UMA
• Provide a scalable interconnect to memory
• Scaling the system increases memory latency
• Interconnect complete processors with local memory
• NUMA (Non-uniform memory access)
• Local memory faster than remote memory
• Still needs a scalable interconnect for accessing remote memory
NUMA: Non-Uniform Memory Access
• Shared memory is split into local and remote memory
+ Low latency to local memory
- Much higher latency to remote memories
+ Bandwidth to local memory may be higher
- Performance very sensitive to data placement
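A hedged sketch of NUMA-aware data placement, assuming Linux-style first-touch page placement and OpenMP (neither is specified on the slide): initializing data with the same parallel loop schedule that later computes on it keeps most accesses local.

// First touch places a page in the memory of the NUMA node whose thread
// touches it first, so initialize with the same static schedule used later.
#include <cstddef>

double* numa_friendly_alloc(std::size_t n) {
    double* a = new double[n];           // memory reserved, pages not yet touched
    #pragma omp parallel for schedule(static)
    for (std::size_t i = 0; i < n; ++i)
        a[i] = 0.0;                      // first touch: page placed near this thread
    return a;
}

void compute(double* a, std::size_t n) {
    #pragma omp parallel for schedule(static)   // same schedule => mostly local accesses
    for (std::size_t i = 0; i < n; ++i)
        a[i] = a[i] * 2.0 + 1.0;
}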
MIMD: Message Passing Architectures
• Loosely coupled multiprocessors
• No shared global memory address space
• Multicomputer network
• Network-based multiprocessors
• Usually programmed via message passing
• Explicit calls (send, receive) for communication
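A minimal message-passing sketch using MPI (assumed here as the message-passing library; it is not named on the slide): rank 0 sends a buffer to rank 1 with explicit send/receive calls, since there is no shared address space.

// Typically compiled with mpicxx and run with mpirun -np 2 (assumed setup).
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    std::vector<double> buf(8, 0.0);
    const int tag = 0;

    if (rank == 0) {
        for (std::size_t i = 0; i < buf.size(); ++i) buf[i] = double(i);
        MPI_Send(buf.data(), (int)buf.size(), MPI_DOUBLE, /*dest=*/1, tag, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf.data(), (int)buf.size(), MPI_DOUBLE, /*source=*/0, tag,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::printf("rank 1 received %zu values\n", buf.size());
    }

    MPI_Finalize();
    return 0;
}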
MIMD: Message Passing Architectures
Historical Evolution: 1960s & 70s
• Early MPs
• Mainframes
• Small number of processors
• crossbar interconnect
• UMA
Historical Evolution: 1980s
• Bus-Based MPs
• enabler: processor-on-a-board
• economical scaling
• precursor of today’s SMPs
• UMA
Historical Evolution: Late 80s, mid 90s
• Large Scale MPs (Massively Parallel Processors)
• multi-dimensional interconnects
• each node a computer (proc + cache + memory)
• NUMA
• still used for “supercomputing”
Flynn’s Taxonomy
• Flynn’s classification of computer architecture
SIMD: Single Instruction, Multiple Data
• Example: GPUs
From http://arstechnica.com/paedia/c/cpu/part-1/cpu1-1.html
Data Parallel Programming Model
• Programming Model
• Operations are performed on each element of a large (regular) data structure (array, vector, matrix)
• Simple example (A, B and C are vectors)
C = (A * B)
• The operations can be executed in sequential or parallel steps
• Language supports array assignment
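As an illustrative sketch of the array-assignment style (C++ std::valarray is an assumed stand-in for a data-parallel array language), the whole-array expression C = A * B is written as a single assignment, and the library or compiler decides how to execute it:

// Elementwise multiply expressed without an explicit loop
// (the three valarrays are assumed to have the same length).
#include <valarray>

void vec_mul(std::valarray<float>& C,
             const std::valarray<float>& A,
             const std::valarray<float>& B) {
    C = A * B;   // array assignment: one statement for the whole vector
}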
On Sequential Hardware
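A minimal sketch (function name assumed) of C = A * B on sequential hardware: a single control stream processes one element per loop iteration.

// Sequential execution: one instruction stream, one element at a time.
#include <cstddef>

void vec_mul_seq(float* C, const float* A, const float* B, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        C[i] = A[i] * B[i];
}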
On Data Parallel Hardware
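A minimal CUDA sketch (an assumed example; CUDA programming is covered later in the course) of the same C = A * B on data-parallel hardware: every GPU thread applies the same instruction to its own element. Device allocation and copies are omitted; dA, dB, dC are assumed to be device pointers.

// Data-parallel execution: one thread per element, all threads running the
// same kernel code on different data.
#include <cuda_runtime.h>

__global__ void vec_mul(float* C, const float* A, const float* B, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        C[i] = A[i] * B[i];
}

// Minimal host-side launch, assuming dA, dB, dC already hold device copies
// of the vectors (cudaMalloc/cudaMemcpy omitted for brevity).
void launch_vec_mul(float* dC, const float* dA, const float* dB, int n) {
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    vec_mul<<<blocks, threads>>>(dC, dA, dB, n);
    cudaDeviceSynchronize();
}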
Data Parallel Architectures
• Early architectures directly mirrored programming model
• Single control processor (broadcasts each instruction to an array/grid of processing elements)
• Examples: Connection Machine, MPP (Massively Parallel Processor)
Data Parallel Architectures
• Later data parallel architectures
• Higher integration → SIMD units on chip along with caches
• More generic → multiple cooperating multiprocessors (GPUs)
• Specialized hardware support for global synchronization
SIMD: Graphics Processing Units
• Early GPU designs
• Specialized for graphics processing only
• Exhibited SIMD execution
• Limited programmability
• Example: NVIDIA GeForce 256
• In 2007, fully programmable GPUs arrived
• CUDA released
Single-core CPU vs Multi-core vs GPU
Single-core CPU vs Multi-core vs GPU
NVIDIA V100 GPU
https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
Specifications
CPUs vs GPUs
Chip-to-chip comparison of peak memory bandwidth in GB/s and peak double-precision gigaflops for GPUs and CPUs since 2008.
https://www.nextplatform.com/2019/07/10/a-decade-of-accelerated-computing-augurs-well-for-gpus
GPU Applications
Specifications
Multi-GPU Systems
https://www.azken.com/images/dgx1_images/dgx1-system-architecture-whitepaper1.pdf
Summary
• Parallel architectures are inevitable
• Different parallel architectures have evolved
• Flynn’s taxonomy:
• SISD
• MISD
• MIMD
• SIMD
References
• David Culler, Jaswinder Pal Singh, and Anoop Gupta. 1998. Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
• https://safari.ethz.ch/architecture/fall2020/doku.php?id=schedule
• https://www.cse.iitd.ac.in/~soham/COL380/page.html
• https://s3.wp.wsu.edu/uploads/sites/1122/2017/05/6-9-2017-slides-vFinal.pptx
• https://ebhor.com/full-form-of-cpu/
• Miscellaneous resources on the internet
Thank You