PARALLEL PROGRAMMING
Course Code: BDS701

Course objectives:

• Explore the need for parallel programming
• Explain how to parallelize programs on MIMD systems
• Demonstrate how to apply the MPI library to parallelize suitable programs
• Demonstrate how to apply OpenMP pragmas and directives to parallelize suitable programs
• Demonstrate how to design CUDA programs


Module 1

Introduction to parallel programming, Parallel hardware and parallel software – Classifications of parallel computers, SIMD systems, MIMD systems, Interconnection networks, Cache coherence, Shared-memory vs. distributed-memory, Coordinating the processes/threads, Shared-memory, Distributed-memory.
Introduction to parallel programming
• Parallel programming is the execution of multiple instructions simultaneously to solve a problem faster.
• It increases computational speed and efficiency by utilizing multiple processing units.
• Important in modern computing due to multi-core processors and high-performance computing needs.
Parallel Hardware and Parallel Software
1. Parallel hardware refers to computer systems with multiple processing units that work together to perform tasks simultaneously.
 It enables faster execution of complex computations by distributing the workload across processors.

2. Parallel software is designed to execute multiple tasks simultaneously using multiple processors or cores.
 It complements parallel hardware to achieve high performance, scalability, and efficient resource use.
 Requires special programming models, libraries, and design techniques, as the sketch below illustrates.
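As a minimal sketch of such a programming model, the following program uses OpenMP (one of the libraries named in the course objectives) to launch a team of threads, typically one per core, each printing its own ID. The thread count and output order are runtime-dependent.

#include <stdio.h>
#include <omp.h>

int main(void) {
    /* Each thread executes this block independently and in parallel. */
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        printf("hello from thread %d of %d\n", id, omp_get_num_threads());
    }
    return 0;
}

Compile with an OpenMP-capable compiler, e.g., gcc -fopenmp.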


Classification of Parallel Computers
Parallel computers can be classified in two simple ways:

1. Flynn’s Taxonomy
- Classifies systems based on instruction and data streams.
- Two common types:
• SIMD: Single instruction, multiple data.
• MIMD: Multiple instructions, multiple data.
- Helps understand how tasks are handled in parallel.

2. Memory Access Model
- Classifies systems based on how memory is accessed by processors.
- Two main types:
• Shared Memory: All cores use the same memory.
• Distributed Memory: Each core has its own memory.
- Communication between cores differs in each type.
SIMD SYSTEMS
Single Instruction, Multiple Data (SIMD) systems are a type of parallel system.

1. SIMD systems apply the same instruction simultaneously to multiple data streams.

2. Conceptually, a SIMD system has one control unit and multiple datapaths.

3. The control unit broadcasts an instruction to all datapaths, each of which either executes
the instruction on its data or remains idle.

4. For example, in vector addition, SIMD can add elements of two arrays, x and y, element-wise in parallel. Consider the loop:

   for (i = 0; i < n; i++)
      x[i] += y[i];

5. If the SIMD system has n datapaths, each datapath i can load x[i] and y[i], perform the addition x[i] += y[i], and store the result back in x[i].

6. If the system has m datapaths where m < n, the additions are executed in blocks of m elements at a time. For example, if m = 4 and n = 15, the system processes elements in groups: 0–3, 4–7, 8–11, and 12–14.

7. In the last group (elements 12–14), only three datapaths are used, so one datapath remains idle. The requirement that all datapaths execute the same instruction or stay idle can reduce SIMD performance. For instance, if we want to add only when y[i] is positive:

   for (i = 0; i < n; i++)
      if (y[i] > 0.0) x[i] += y[i];

some datapaths may be idle depending on the condition, leading to inefficiency. The sketch below illustrates the block-by-block execution from item 6.
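A minimal sketch of that block-by-block execution in plain C. The block width M stands in for the number of hardware datapaths; treating it as a compile-time constant the programmer controls is an assumption of this illustration (on real SIMD hardware the lane count is fixed by the machine).

#include <stdio.h>

#define M 4  /* assumed number of SIMD datapaths (lanes) */

/* Strip-mined vector addition: process n elements in blocks of M,
   mirroring how a SIMD unit with M datapaths steps through x and y. */
void vec_add(double *x, const double *y, int n) {
    int i = 0;
    for (; i + M <= n; i += M)        /* full blocks: all M lanes active   */
        for (int j = 0; j < M; j++)   /* lane j handles element i + j      */
            x[i + j] += y[i + j];
    for (; i < n; i++)                /* remainder block (e.g., 12-14 when */
        x[i] += y[i];                 /* n = 15): some lanes would be idle */
}

int main(void) {
    double x[15] = {0}, y[15];
    for (int i = 0; i < 15; i++) y[i] = i;
    vec_add(x, y, 15);
    printf("x[14] = %g\n", x[14]);    /* prints 14 */
    return 0;
}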
MIMD SYSTEMS

• MIMD (Multiple Instruction, Multiple Data) systems run multiple instruction streams on multiple data streams.
• Each processor/core operates independently with its own control unit and datapath.
• Processors are asynchronous and operate at their own pace.
• Useful for task parallelism and complex computing systems.


Types of MIMD Systems
1. Shared-memory systems:
- Processors access a common memory.
- Implicit communication using shared data.

2. Distributed-memory systems:
- Each processor has private memory.
- Communication via explicit message-passing functions.
SHARED-MEMORY SYSTEMS

 The most widely available shared-memory systems use one or more multicore processors.
 There is one large memory unit that all CPUs can access.
 Processors are connected to memory via an interconnect (such as a bus or network switch).
 All processors share the same address space, meaning any CPU can access any memory location directly.
Uniform Memory Access (UMA)
In systems where all cores access memory with equal latency, the memory access time remains uniform regardless of which core accesses which memory location.

Non-Uniform Memory Access (NUMA)
When each core has faster access to its own local memory block and slower access to other cores' memory, the system is referred to as a Non-Uniform Memory Access (NUMA) system.
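A minimal sketch of programming a shared-memory system with OpenMP: every thread reads and writes the same array because all threads share one address space, and the reduction clause coordinates their partial sums. The array size is an arbitrary choice for the illustration.

#include <stdio.h>
#include <omp.h>

#define N 1024  /* arbitrary problem size for the illustration */

int main(void) {
    static double a[N];
    double sum = 0.0;

    /* All threads access the single shared array a directly:
       no explicit communication is needed. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 1.0;

    /* The reduction clause safely combines each thread's partial sum. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %g (expected %d)\n", sum, N);
    return 0;
}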
DISTRIBUTED-MEMORY SYSTEMS

• Each CPU has its own private memory.
• Processors cannot access each other's memory directly.
• To share data, processors send messages through an interconnect (network), as the MPI sketch after this list shows.
• Most common type today: clusters.
• Built from multiple standard computers connected via networks (e.g., Ethernet).
• Each cluster node is usually a shared-memory system with multicore processors.
• These are called hybrid systems (shared memory within nodes, distributed memory between nodes).
• Grids connect computers over large distances (geographically dispersed).
• Grids can use different hardware across nodes and act as one distributed-memory system.
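A minimal message-passing sketch using MPI (the library named in the course objectives). Because each process has private memory, process 0 must explicitly send a value to process 1; the payload and tag are arbitrary choices, and the program assumes it is launched with at least two processes (e.g., mpiexec -n 2).

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each process has private memory, so data must be sent explicitly. */
    if (rank == 0) {
        value = 42;  /* arbitrary payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("process 1 received %d from process 0\n", value);
    }

    MPI_Finalize();
    return 0;
}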
INTERCONNECTION NETWORKS

• An interconnect links processors and memory.
• The speed of the interconnect greatly affects system performance.
• A slow interconnect can bottleneck parallel programs.
• Shared- and distributed-memory systems use different interconnect types.
Shared-Memory Interconnects

• Earlier systems used a bus to connect processors and memory.
• A bus is simple and cost-effective, but all devices share the same lines.
• More processors → more contention → performance drops.
• Modern systems use switched interconnects instead of buses.

Switched Interconnects (Crossbar)

• Use switches for efficient, organized communication.
• A crossbar connects processors and memory modules via bidirectional links.
• Switches can be configured for different data paths.
• If there are at least as many memory modules as processors, conflicts occur only when two processors access the same module (illustrated in the sketch after this list).
• Allows simultaneous communication between multiple devices.
• Faster than buses, but more expensive due to the cost of switches and links.
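A toy sketch of that conflict rule: with one memory request per processor, a crossbar access pattern conflicts exactly when two processors target the same module. The processor count and request arrays are invented for the example.

#include <stdio.h>
#include <stdbool.h>

#define P 4  /* processors */
#define M 4  /* memory modules (M >= P) */

/* Returns true if two processors request the same memory module. */
bool has_conflict(const int req[P]) {
    bool used[M] = { false };
    for (int i = 0; i < P; i++) {
        if (used[req[i]]) return true;
        used[req[i]] = true;
    }
    return false;
}

int main(void) {
    int ok[P]    = {0, 1, 2, 3};  /* distinct modules: all proceed at once */
    int clash[P] = {0, 1, 1, 3};  /* P1 and P2 both want module 1 */
    printf("ok: %d, clash: %d\n", has_conflict(ok), has_conflict(clash));
    return 0;
}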
Figure – Shared-Memory Interconnects:
(a) A crossbar switch connecting four processors (Pi) and four memory modules (Mj).
(b) Configuration of internal switches in a crossbar.
(c) Simultaneous memory accesses by the processors.
