Parallel and Distributed
Programming
Dr. Muhammad Naveed Akhtar
Lecture – 03
Parallel Software
Roadmap
• Parallel software
• Input and output
• Performance
• Parallel program design
• Writing and running parallel programs
• Assumptions
Parallel Software
The burden is on software
• Hardware and compilers can keep up with the pace needed.
• From now on…
  • In shared memory programs:
    • Start a single process and fork threads.
    • Threads carry out tasks.
  • In distributed memory programs:
    • Start multiple processes.
    • Processes carry out tasks.
• An SPMD (single program, multiple data) program consists of a single executable that can behave as if it were multiple different programs through the use of conditional branches (see the sketch below):

if (I'm thread/process i)
    do this;
else
    do that;
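A minimal shared-memory sketch of the SPMD idea, assuming POSIX threads (my own illustration, not the lecture's code): one executable forks several threads, and each thread branches on its rank, so the single program behaves like different programs.

/* SPMD sketch: one process forks THREAD_COUNT threads; each thread
   branches on its rank.  Compile with e.g.  gcc spmd.c -lpthread      */
#include <pthread.h>
#include <stdio.h>

#define THREAD_COUNT 4

void* Thread_work(void* rank) {
    long my_rank = (long) rank;
    if (my_rank == 0)
        printf("Thread %ld > doing \"this\"\n", my_rank);   /* do this; */
    else
        printf("Thread %ld > doing \"that\"\n", my_rank);   /* do that; */
    return NULL;
}

int main(void) {
    pthread_t threads[THREAD_COUNT];
    for (long t = 0; t < THREAD_COUNT; t++)
        pthread_create(&threads[t], NULL, Thread_work, (void*) t);
    for (long t = 0; t < THREAD_COUNT; t++)
        pthread_join(threads[t], NULL);
    return 0;
}

The output order may differ from run to run, which already hints at the nondeterminism discussed later.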
Writing Parallel Programs
• Divide the work among the processes/threads so that:
• each process/thread gets roughly the same amount of work
• communication is minimized.
• Arrange for the processes/threads to synchronize.
• Arrange for communication among processes/threads.
double x[n], y[n];
…
for (i = 0; i < n; i++)
x[i] += y[i];
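A hedged sketch of one common way to divide the loop's iterations so that each thread gets roughly the same amount of work (block partitioning; block_range and the driver in main are my own illustration, not the lecture's code):

/* Block partitioning: split n iterations over thread_count threads so
   that block sizes differ by at most one element.                     */
#include <stdio.h>

void block_range(int my_rank, int thread_count, int n,
                 int* my_first, int* my_last) {
    int q = n / thread_count;              /* base block size     */
    int r = n % thread_count;              /* leftover iterations */
    if (my_rank < r) {
        *my_first = my_rank * (q + 1);
        *my_last  = *my_first + q + 1;     /* exclusive bound */
    } else {
        *my_first = r * (q + 1) + (my_rank - r) * q;
        *my_last  = *my_first + q;
    }
}

int main(void) {
    int n = 10, thread_count = 4;
    for (int rank = 0; rank < thread_count; rank++) {
        int first, last;
        block_range(rank, thread_count, n, &first, &last);
        /* thread `rank` would run: for (i = first; i < last; i++) x[i] += y[i]; */
        printf("thread %d handles [%d, %d)\n", rank, first, last);
    }
    return 0;
}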
Shared Memory
• Dynamic threads
  • A master thread waits for work and forks new threads as needed; the threads terminate when their work is done.
  • Efficient use of resources, but thread creation and termination is time consuming.
• Static threads
  • A pool of threads is created and allocated work; the threads do not terminate until cleanup.
  • Better performance, but potential waste of system resources.
Nondeterminism
• Nondeterminism arises when some information, such as the order in which threads or processes run, is not known before the program starts executing.
• Nondeterministic amounts of work arise from loops and conditional branching.
• Such programs can be scheduled dynamically; however, dynamic scheduling consumes time and resources, which adds overhead during program execution.
my_val = Compute_val(my_rank);
printf("Thread %d > my_val = %d\n", my_rank, my_val);

Possible output of run 1:          Possible output of run 2:
Thread 0 > my_val = 7              Thread 1 > my_val = 19
Thread 1 > my_val = 19             Thread 0 > my_val = 7

The output order depends on how the threads are scheduled. If each thread instead executes x += my_val; on a shared variable x, the timing of the accesses also affects the result (next slide).
Nondeterminism
Race Condition
• When two or more threads access the same resource at the same time and the result depends on which access happens first.

Critical Section
• The portion of each thread/process in which shared variables are accessed.
• Must be executed mutually exclusively (one at a time).
• Enforced with a mutual exclusion lock (mutex, or simply lock).

Example: two threads each withdraw $50 from a shared balance of $125.

Time   Thread 1        Thread 2        Balance
1      Withdraw $50    Withdraw $50    $125
2      Read Balance                    $125
3                      Read Balance    $125
4      Set Balance                     $75
5                      Set Balance     $75

Both withdrawals run, yet the final balance is $75 instead of $25: a race condition.

Protecting the update with a lock makes the critical section mutually exclusive:

my_val = Compute_val(my_rank);
Lock(&add_my_val_lock);
x += my_val;
Unlock(&add_my_val_lock);
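A minimal pthreads version of the lock idea above, assuming POSIX threads (my own sketch; the generic Lock/Unlock become pthread_mutex_lock/pthread_mutex_unlock, and Compute_val is a placeholder):

/* Mutex sketch: several threads add to a shared x inside a critical
   section.  Compile with e.g.  gcc mutex.c -lpthread                  */
#include <pthread.h>
#include <stdio.h>

#define THREAD_COUNT 4

int x = 0;                                        /* shared variable */
pthread_mutex_t add_my_val_lock = PTHREAD_MUTEX_INITIALIZER;

int Compute_val(long my_rank) { return (int)(my_rank + 1) * 7; }

void* Thread_work(void* rank) {
    long my_rank = (long) rank;
    int my_val = Compute_val(my_rank);            /* outside critical section */

    pthread_mutex_lock(&add_my_val_lock);         /* enter critical section   */
    x += my_val;
    pthread_mutex_unlock(&add_my_val_lock);       /* leave critical section   */
    return NULL;
}

int main(void) {
    pthread_t threads[THREAD_COUNT];
    for (long t = 0; t < THREAD_COUNT; t++)
        pthread_create(&threads[t], NULL, Thread_work, (void*) t);
    for (long t = 0; t < THREAD_COUNT; t++)
        pthread_join(threads[t], NULL);
    printf("x = %d\n", x);                        /* deterministic: 7+14+21+28 = 70 */
    return 0;
}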
Busy – Waiting
• A thread repeatedly tests a condition, but, effectively, does no useful work until the condition has
the appropriate value
my_val = Compute_val(my_rank);
if (my_rank == 1)
    while (!ok_for_1);        /* Busy-wait loop */
x += my_val;                  /* Critical section */
if (my_rank == 0)
    ok_for_1 = true;          /* Let thread 1 update x */
Message – Passing
char message[100];
. . .
my_rank = Get_rank();
if (my_rank == 1) {
    sprintf(message, "Greetings from process 1");
    Send(message, MSG_CHAR, 100, 0);
} else if (my_rank == 0) {
    Receive(message, MSG_CHAR, 100, 1);
    printf("Process 0 > Received: %s\n", message);
}
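One way the generic Send/Receive pseudocode above might look with real MPI calls (a sketch, not the lecture's code; compile with mpicc and run with at least two processes, e.g. mpiexec -n 2 ./greetings):

/* Message passing with MPI: process 1 sends a greeting to process 0. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
    char message[100];
    int  my_rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank == 1) {
        sprintf(message, "Greetings from process 1");
        MPI_Send(message, 100, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    } else if (my_rank == 0) {
        MPI_Recv(message, 100, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("Process 0 > Received: %s\n", message);
    }

    MPI_Finalize();
    return 0;
}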
Partitioned Global Address Space (PGAS)
• Explicitly-parallel programming model with SPMD parallelism
• The number of threads is fixed at program start-up, typically one thread per processor
• Global address space model of memory
• Allows programmer to directly represent distributed data structures
• Address space is logically partitioned
• Local vs. remote memory (two-level hierarchy)
• Programmer control over performance critical decisions
• Data layout and communication
• Performance transparency and tunability are goals
• Initial implementation can use fine-grained shared memory
• Multiple PGAS languages: UPC (C), CAF (Fortran), Titanium (Java)
Global Address Space Eases Programming
• The languages share the global address space abstraction
• Shared memory is logically partitioned by processors
• Remote memory may stay remote: no automatic caching implied
• One-sided communication: reads/writes of shared variables
• Both individual and bulk memory copies
• Languages differ on details
• Some models have a separate private memory area
• Distributed array generality and how they are constructed
[Figure: the partitioned global address space. Each of threads 0 … N owns one partition; the shared area holds the distributed data (X[0], X[1], …, X[P]), and each thread also has a private area (e.g. local pointers ptr).]
Partitioned Global Address Space Example
shared int n = ... ;
shared double x[n], y[n];
private int i, my_first_element, my_last_element;

my_first_element = ... ;
my_last_element = ... ;

/* Initialize x and y */
...

for (i = my_first_element; i <= my_last_element; i++)
    x[i] += y[i];
Current Implementations of PGAS Languages
• A successful language/library must run everywhere
• UPC (Unified Parallel C)
• Commercial compilers available on Cray, SGI, HP machines
• Open source compiler from LBNL/UCB (source-to-source), gcc-based compiler
• CAF (Co-Array Fortran)
• Commercial compiler available on Cray machines
• Open source compiler available from Rice
• Titanium
• Open source compiler from UCB runs on most machines
• Common tools
• Open64 open source research compiler infrastructure
• ARMCI, GASNet for distributed memory implementations
• Pthreads, System V shared memory
Input and Output
• In distributed memory programs, only process 0 will access stdin. In shared memory programs, only
the master thread or thread 0 will access stdin.
• In both distributed memory and shared memory programs all the processes/threads can access
stdout and stderr.
• However, because of the indeterminacy of the order of output to stdout, in most cases only a single
process/thread will be used for all output to stdout other than debugging output.
• Debug output should always include the rank or id of the process/thread that’s generating the
output.
• Only a single process/thread will attempt to access any single file other than stdin, stdout, or
stderr. So, for example, each process/thread can open its own, private file for reading or writing,
but no two processes/threads will open the same file.
Performance
Speedup
• Serial run-time = Tserial
• Number of cores = p
• Parallel run-time = Tparallel
• Ideal parallel run-time: $T_{parallel} = \dfrac{T_{serial}}{p}$
• Speedup of a parallel program: $S = \dfrac{T_{serial}}{T_{parallel}}$

[Figure: a serial program taking 100 time units, run in parallel (e.g. on 4 cores) under different conditions]
• Perfect parallelization: $S = 100/25 = 4.0$
• Perfect load balancing: $S = 100/35 \approx 2.85$
• Load imbalance: $S = 100/40 = 2.5$
• Load imbalance & synchronization (close to real-life problems): $S = 100/50 = 2.0$
Efficiency of Parallel Program
$E = \dfrac{S}{p} = \dfrac{T_{serial}/T_{parallel}}{p} = \dfrac{T_{serial}}{p \cdot T_{parallel}}$
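A quick worked example with made-up numbers: if $T_{serial} = 100$ and $T_{parallel} = 30$ on $p = 4$ cores, then

$S = \dfrac{100}{30} \approx 3.33, \qquad E = \dfrac{S}{p} = \dfrac{3.33}{4} \approx 0.83$

i.e. on average each core spends about 83% of its time doing useful work.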
Effect of Problem Size
[Figure: the same program run with N/2 items, N items, and N×2 items.]
Effect of Overhead
• Ideally: $T_{parallel} = \dfrac{T_{serial}}{p}$, i.e. $p \cdot T_{parallel} = T_{serial}$
• Practically: $T_{parallel} = \dfrac{T_{serial}}{p} + T_{overhead}$, so $p \cdot T_{parallel} > T_{serial}$
Amdahl’s Law
• Unless virtually all of a serial program is parallelized, the possible speedup is going to be very limited
— regardless of the number of cores available.
[Figure: on one processor the run-time consists of a serial section of length $r \cdot T_{serial}$ and a parallelizable section of length $(1-r) \cdot T_{serial}$; on $p$ processors the serial section is unchanged while the parallelizable section shrinks to $(1-r) \cdot T_{serial}/p$.]
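Putting the two pieces of the figure together gives the usual closed form of Amdahl's law, with $r$ the serial fraction:

$T_{parallel} = r \cdot T_{serial} + \dfrac{(1-r) \cdot T_{serial}}{p}, \qquad S = \dfrac{T_{serial}}{T_{parallel}} = \dfrac{1}{r + (1-r)/p} \le \dfrac{1}{r}$

So even with an unlimited number of cores the speedup can never exceed $1/r$.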
Consequence of Amdahl’s Law
• For a given problem instance, adding additional processors gives diminishing returns.
  • Only relatively few processors can be used efficiently.
• Way around: increase the problem size.
  • The sequential part tends to grow more slowly than the parallel part.
• A system is scalable if efficiency can be maintained by increasing the problem size.

Example
• Assume 90% of a serial program is perfectly parallelizable and $T_{serial} = 20$ seconds, so $r = 0.1$.
• Run-time of the parallelizable part: $(1-r) \cdot T_{serial}/p = 0.9 \times 20/p = 18/p$
• Run-time of the "un-parallelizable" part: $r \cdot T_{serial} = 0.1 \times 20 = 2$
• Overall parallel run-time: $T_{parallel} = 18/p + 2$
• Speedup: $S = \dfrac{T_{serial}}{T_{parallel}} = \dfrac{20}{18/p + 2}$
• As $p \to \infty$, $S \to 10$ (see the table below).
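Plugging a few values of $p$ into this speedup makes the diminishing returns concrete (values rounded to one decimal):

p :   1     2     4     8     16    → ∞
S :  1.0   1.8   3.1   4.7   6.4   → 10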
Sources of Parallel Overheads
• Overhead of creating threads/processes
• Synchronization
• Load imbalance
• Communication
• Extra computation
• Memory access (for both sequential and parallel!)
Scalability
• In general, a problem is scalable if it can handle ever-increasing problem sizes.
• If we can increase the number of processes/threads and keep the efficiency fixed without increasing the problem size, the program is strongly scalable.
• If we can keep the efficiency fixed by increasing the problem size at the same rate as we increase the number of processes/threads, the program is weakly scalable.
Taking Timings (How to measure time)
• What is "time"?
  • Start to finish?
  • A program segment of interest?
  • CPU time?
  • Wall clock time?
• MPI and OpenMP provide wall-clock timers for this: MPI_Wtime() and omp_get_wtime() (see the sketch below).
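A minimal sketch of timing a program segment with omp_get_wtime(), assuming an OpenMP-capable compiler (the loop is just a stand-in workload, not the lecture's tmeasure.c; compile with e.g. gcc -fopenmp timing.c):

/* Measure the wall-clock time of a code segment with omp_get_wtime(). */
#include <omp.h>
#include <stdio.h>

int main(void) {
    int    n   = 100000000;
    double sum = 0.0;

    double start = omp_get_wtime();     /* wall time before the segment */
    for (int i = 0; i < n; i++)
        sum += 1.0 / (i + 1.0);
    double finish = omp_get_wtime();    /* wall time after the segment  */

    printf("sum = %f\n", sum);
    printf("elapsed wall time: %e seconds\n", finish - start);
    return 0;
}

MPI_Wtime() is used the same way inside an MPI program; in both cases only the difference between two calls is meaningful.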
Taking Timings (Distributed Memory Systems)
$ gcc -o tmeasure tmeasure.c -lm
$ ./tmeasure
clock resolution: 1000000
res: 1.000000e+00
start/stop: 0.000000e+00,8.730000e+00
Time: 8.730000e+00
Three Different Types of Time
• Wall Time
• Time span a “clock on the wall” would measure, Time elapsed between start and completion of the program.
• This is usually the time to be minimized.
• User Time
  • The CPU time actually consumed by the program itself.
  • On 1 CPU the user time is at most the wall time; with multiple CPUs it can exceed the wall time.
  • If the user time is much smaller than the wall time, the program has to wait a lot (for CPU time allocation, for data from the RAM or from the hard disk).
  • Such waits are indications for necessary optimizations.
• System Time
• Time used not by the program itself, but by the operating system, e.g. for allocating memory or hard disk access.
• System time should stay low.
Measuring Program Runtime
• The Linux command time, e.g.: time ls -l
• For performance analysis, we want to know the runtime of individual parts of a program.
• MPI and OpenMP have their own, platform-independent functions for time measurement.
  • MPI_Wtime() and omp_get_wtime() return the wall time in seconds; the difference between the results of two such calls is the runtime elapsed between them.
• Advanced method of performance analysis: profiling (various tools: gprof, Jumpshot, PMPI, Vampir)
Using the gprof (GNU Profiler) tool
• Compile my_program.c
gcc -pg my_program.c
./a.out
gprof a.out
Assignment
my_program.c (will be available on Canvas)
What is this program doing?
Explain the gprof output in words.
Profiling Tools for MPI
Jumpshot PMPI Vampir
Parallel Program Design
Foster’s methodology
Partitioning
• Divide the computation to be performed and the data operated on by the computation into small tasks.
• Identifying tasks that can be executed in parallel
Communication
• Determine what communication needs to be carried out among the tasks identified in the previous step
Agglomeration or aggregation
• Combine tasks and communications identified in the first step into larger tasks.
Mapping
• Assign the composite tasks identified in the previous step to processes/threads
• Each process/thread gets roughly the same amount of work
Example - Histogram
Sample data (20 measurements):
1.3, 2.9, 0.4, 0.3, 1.3, 4.4, 1.7, 0.4, 3.2, 0.3, 4.9, 2.4, 3.1, 4.4, 3.9, 0.4, 4.2, 4.5, 4.9, 0.9

• Input
  • The number of measurements: data_count
  • An array of data_count floats: data
  • The minimum value for the bin containing the smallest values: min_meas
  • The maximum value for the bin containing the largest values: max_meas
  • The number of bins: bin_count
• Output
  • bin_maxes : an array of bin_count floats
  • bin_counts : an array of bin_count ints
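A serial sketch of the histogram computation in C (my own illustration: the variable names follow the slide, but the concrete values min_meas = 0, max_meas = 5, bin_count = 5 and the helper Find_bin are assumptions):

/* Serial histogram: count how many measurements fall into each bin. */
#include <stdio.h>

int Find_bin(float value, const float bin_maxes[], int bin_count) {
    for (int b = 0; b < bin_count; b++)
        if (value < bin_maxes[b]) return b;
    return bin_count - 1;                 /* largest value goes in last bin */
}

int main(void) {
    float data[] = {1.3f, 2.9f, 0.4f, 0.3f, 1.3f, 4.4f, 1.7f, 0.4f, 3.2f, 0.3f,
                    4.9f, 2.4f, 3.1f, 4.4f, 3.9f, 0.4f, 4.2f, 4.5f, 4.9f, 0.9f};
    int   data_count = 20, bin_count = 5;
    float min_meas = 0.0f, max_meas = 5.0f;
    float bin_width = (max_meas - min_meas) / bin_count;
    float bin_maxes[5];
    int   bin_counts[5] = {0};

    for (int b = 0; b < bin_count; b++)       /* upper bound of each bin */
        bin_maxes[b] = min_meas + bin_width * (b + 1);

    for (int i = 0; i < data_count; i++)      /* count measurements per bin */
        bin_counts[Find_bin(data[i], bin_maxes, bin_count)]++;

    for (int b = 0; b < bin_count; b++)
        printf("bin %d (max %.1f): %d\n", b, bin_maxes[b], bin_counts[b]);
    return 0;
}

Parallelizing this with Foster's methodology means treating each measurement (or block of measurements) as a task, giving each thread/process a local bin_counts array, and then combining the local arrays, as the following slides outline.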
First two stages of Foster’s Methodology
Alternative definition of tasks and communication
Adding the local arrays
Concluding Remarks
• Serial systems
  • The standard model of computer hardware has been the von Neumann architecture.
• Parallel hardware
  • Flynn's taxonomy.
• Parallel software
  • We focus on software for homogeneous MIMD systems, consisting of a single program that obtains parallelism by branching.
  • SPMD programs.
• Input and Output
  • One process/thread can access stdin, and all processes/threads can access stdout and stderr.
  • However, except for debug output we usually have a single process/thread accessing stdout.
• Performance
  • Speedup, Efficiency
  • Amdahl's law, Scalability
• Parallel Program Design
  • Foster's methodology
Questions and comments?