ECE1747 Parallel Programming
Shared Memory Multithreading
Pthreads
Shared Memory
• All threads access the same shared memory
data space.
Shared Memory Address Space
[Diagram: processors proc1 … procN all referencing one shared address space.]
Shared Memory (continued)
• Concretely, it means that a variable x, a
pointer p, or an array a[] refers to the same
object, no matter which processor the
reference originates from.
• We have more or less implicitly assumed
this to be the case in earlier examples.
Shared Memory
[Diagram: processors proc1 … procN connected to a single shared memory.]
Distributed Memory - Message Passing
The alternative model to shared memory.
[Diagram: each processor proc1 … procN has its own memory mem1 … memN, each holding its own copy of a; the memories are connected by a network.]
Shared Memory vs. Message Passing
• Same terminology is used in distinguishing
hardware.
• For us: distinguish programming models,
not hardware.
Programming vs. Hardware
• One can implement
– a shared memory programming model
– on shared or distributed memory hardware
– (also in software or in hardware)
• One can implement
– a message passing programming model
– on shared or distributed memory hardware
Portability of programming models
[Diagram: both shared memory programming and message passing programming can be mapped onto either a shared memory machine or a distributed memory machine.]
Shared Memory Programming:
Important Point to Remember
• No matter what the implementation, it
conceptually looks like shared memory.
• There may be some (important)
performance differences.
Multithreading
• User has explicit control over threads.
• Good: control can be used to performance
benefit.
• Bad: user has to deal with it.
Pthreads
• POSIX standard shared-memory
multithreading interface.
• Provides primitives for process
management and synchronization.
What does the user have to do?
• Decide how to decompose the computation
into parallel parts.
• Create (and destroy) processes to support
that decomposition.
• Add synchronization to make sure
dependences are covered.
General Thread Structure
• Typically, a thread is a concurrent
execution of a function or a procedure.
• So, your program needs to be restructured
such that parallel parts form separate
procedures or functions.
Example of Thread Creation
[Diagram: main() calls pthread_create(func); a new thread starts executing func() concurrently with main().]
Thread Joining Example
void *func(void *arg) { … }

pthread_t id; int X;
pthread_create(&id, NULL, func, (void *)&X);
…
pthread_join(id, NULL);  /* blocks until func returns or calls pthread_exit */
…
Example of Thread Creation (contd.)
[Diagram: main() calls pthread_create(func) and later pthread_join(id); the created thread runs func() and terminates with pthread_exit(); the join returns once the thread has exited.]
Sequential SOR
for some number of timesteps/iterations {
  for( i=1; i<n; i++ )
    for( j=1; j<n; j++ )
      temp[i][j] = 0.25 * ( grid[i-1][j] + grid[i+1][j] +
                            grid[i][j-1] + grid[i][j+1] );
  for( i=1; i<n; i++ )
    for( j=1; j<n; j++ )
      grid[i][j] = temp[i][j];
}
Parallel SOR
• First (i,j) loop nest can be parallelized.
• Second (i,j) loop nest can be parallelized.
• Must wait to start the second loop nest until all
processors have finished the first.
• Must wait to start the first loop nest of the next
iteration until all processors have finished the
second loop nest of the previous iteration.
• Give n/p rows to each processor.
Pthreads SOR: Parallel parts (1)
void* sor_1(void *s)
{
  int slice = (int)(long)s;
  int from = (slice*(n-1))/p + 1;
  int to = ((slice+1)*(n-1))/p + 1;
  for( i=from; i<to; i++ )
    for( j=1; j<n; j++ )
      temp[i][j] = 0.25*( grid[i-1][j] + grid[i+1][j] +
                          grid[i][j-1] + grid[i][j+1] );
  return NULL;
}
Pthreads SOR: Parallel parts (2)
void* sor_2(void *s)
{
  int slice = (int)(long)s;
  int from = (slice*(n-1))/p + 1;
  int to = ((slice+1)*(n-1))/p + 1;
  for( i=from; i<to; i++ )
    for( j=1; j<n; j++ )
      grid[i][j] = temp[i][j];
  return NULL;
}
Pthreads SOR: main
for some number of timesteps {
  for( i=0; i<p; i++ )
    pthread_create(&thrd[i], NULL, sor_1, (void *)(long)i);
  for( i=0; i<p; i++ )
    pthread_join(thrd[i], NULL);
  for( i=0; i<p; i++ )
    pthread_create(&thrd[i], NULL, sor_2, (void *)(long)i);
  for( i=0; i<p; i++ )
    pthread_join(thrd[i], NULL);
}
Summary: Thread Management
• pthread_create(): creates a new thread
executing a given function with a given
argument; returns a thread identifier.
• pthread_exit(): terminates the calling thread.
• pthread_join(): waits for the thread with a
particular thread identifier to terminate.
Summary: Program Structure
• Encapsulate parallel parts in functions.
• Use function arguments to parameterize
what a particular thread does.
• Call pthread_create() with the function and
arguments, save thread identifier returned.
• Call pthread_join() with that thread
identifier.
Pthreads Synchronization
• Create/exit/join
– provide some form of synchronization,
– at a very coarse level,
– requires thread creation/destruction.
• Need for finer-grain synchronization
– mutex locks,
– condition variables.
Use of Mutex Locks
• To implement critical sections.
• Pthreads mutexes are exclusive locks (later
POSIX versions add shared-read, exclusive-write
locks as pthread_rwlock_t).
• Some other systems also provide shared-read,
exclusive-write locks.
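A minimal critical-section sketch (counter and m are illustrative names; counter is shared):

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
int counter = 0;               /* shared data */

void increment(void)
{
  pthread_mutex_lock(&m);      /* enter critical section */
  counter++;                   /* at most one thread executes this at a time */
  pthread_mutex_unlock(&m);    /* leave critical section */
}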
Condition variables (1 of 5)
pthread_cond_init(
  pthread_cond_t *cond,
  const pthread_condattr_t *attr)
• Initializes a new condition variable cond.
• Attribute: ignore for now (pass NULL).
Condition Variables (2 of 5)
pthread_cond_destroy(
pthread_cond_t *cond)
• Destroys the condition variable cond.
Condition Variables (3 of 5)
pthread_cond_wait(
pthread_cond_t *cond,
pthread_mutex_t *mutex)
• Atomically unlocks mutex and blocks the
calling thread, waiting on cond.
• Re-locks mutex before returning to the caller.
Condition Variables (4 of 5)
pthread_cond_signal(
pthread_cond_t *cond)
• Unblocks one thread waiting on cond.
• Which one is unblocked is determined by the scheduler.
• If no thread is waiting, the signal is a no-op.
Condition Variables (5 of 5)
pthread_cond_broadcast(
pthread_cond_t *cond)
• Unblocks all threads waiting on cond.
• If no thread is waiting, the broadcast is a no-op.
Use of Condition Variables
• To implement signal-wait synchronization
discussed in earlier examples.
• Important note: a signal is “forgotten” if no
thread is already waiting at the time the
signal occurs.
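The canonical usage pattern, sketched for a shared flag ready (the while loop guards against spurious wakeups; all names illustrative):

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t c = PTHREAD_COND_INITIALIZER;
int ready = 0;                  /* shared state, protected by m */

/* waiting thread */
pthread_mutex_lock(&m);
while( !ready )                 /* re-check: wakeups may be spurious */
  pthread_cond_wait(&c, &m);    /* atomically unlocks m and blocks */
pthread_mutex_unlock(&m);

/* signaling thread */
pthread_mutex_lock(&m);
ready = 1;                      /* record the event in shared state ... */
pthread_cond_signal(&c);        /* ... so a later waiter does not miss it */
pthread_mutex_unlock(&m);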
Barrier Synchronization
• A wait at a barrier causes a thread to wait
until all threads have performed a wait at
the barrier.
• At that point, they all proceed.
Implementing Barriers in Pthreads
• Count the number of arrivals at the barrier.
• Wait if this is not the last arrival.
• Make everyone unblock if this is the last
arrival.
• Since the arrival count is a shared variable,
enclose the whole operation in a mutex
lock-unlock.
Implementing Barriers in Pthreads
/* shared: int arrived = 0, phase = 0; */
void barrier()
{
  pthread_mutex_lock(&mutex_arr);
  int my_phase = phase;          /* which barrier episode we are in */
  arrived++;
  if (arrived<N) {
    while (phase == my_phase)    /* while, not if: survives spurious wakeups */
      pthread_cond_wait(&cond, &mutex_arr);
  }
  else {
    arrived=0;                   /* be prepared for next barrier */
    phase++;                     /* release this episode's waiters */
    pthread_cond_broadcast(&cond);
  }
  pthread_mutex_unlock(&mutex_arr);
}
Parallel SOR with Barriers (1 of 2)
void* sor (void* arg)
{
  int slice = (int)(long)arg;
  int from = (slice * (n-1))/p + 1;
  int to = ((slice+1) * (n-1))/p + 1;
  for some number of iterations { … }
  return NULL;
}
Parallel SOR with Barriers (2 of 2)
for (i=from; i<to; i++)
  for (j=1; j<n; j++)
    temp[i][j] = 0.25 * ( grid[i-1][j] + grid[i+1][j] +
                          grid[i][j-1] + grid[i][j+1] );
barrier();
for (i=from; i<to; i++)
for (j=1; j<n; j++)
grid[i][j]=temp[i][j];
barrier();
Parallel SOR with Barriers: main
int main(int argc, char *argv[])
{
  pthread_t thrd[p];
  /* Initialize mutex and condition variables */
  for (i=0; i<p; i++)
    pthread_create (&thrd[i], NULL, sor, (void *)(long)i);
  for (i=0; i<p; i++)
    pthread_join (thrd[i], NULL);
  /* Destroy mutex and condition variables */
  return 0;
}
Note again
• Many shared memory programming
systems (other than Pthreads) have barriers
as a basic primitive.
• If they do, you should use it, not construct it
yourself.
• Implementation may be more efficient than
what you can do yourself.
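POSIX did later add such a primitive to Pthreads itself (pthread_barrier_t, an optional part of POSIX.1-2001); a minimal sketch:

pthread_barrier_t barr;

pthread_barrier_init(&barr, NULL, p);  /* p threads participate */
…
pthread_barrier_wait(&barr);           /* blocks until all p have arrived */
…
pthread_barrier_destroy(&barr);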
Busy Waiting
• Not an explicit part of the API.
• Available in a general shared memory
programming environment.
Busy Waiting
initially: flag = 0;
P1: produce data;
flag = 1;
P2: while( !flag ) ;
consume data;
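As written, the sketch above is not even guaranteed to work: the compiler and the hardware are free to cache or reorder the accesses to flag. A hedged version using C11 atomics (a sketch; atomics are not part of the Pthreads API):

#include <stdatomic.h>

atomic_int flag = 0;

/* P1 */
produce data;
atomic_store(&flag, 1);          /* publish the data */

/* P2 */
while( !atomic_load(&flag) )
  ;                              /* spin */
consume data;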
Use of Busy Waiting
• On the surface, simple and efficient.
• In general, not a recommended practice.
• Often leads to messy and unreadable code
(blurs data/synchronization distinction).
• May be inefficient: the waiting thread burns
processor cycles while spinning.
Private Data in Pthreads
• To make a variable private in Pthreads, you
need to make an array out of it.
• Index the array by the thread identifier,
which you have to keep track of yourself.
• Not very elegant or efficient.
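A minimal sketch of the array pattern (MAX_THREADS, sum, and worker are illustrative names; myid is the identifier passed at thread creation):

#define MAX_THREADS 64

int sum[MAX_THREADS];            /* one "private" slot per thread */

void *worker(void *arg)
{
  int myid = (int)(long)arg;     /* thread identifier, 0 <= myid < p */
  sum[myid] = 0;                 /* each thread touches only its own slot */
  /* … */
  return NULL;
}

In practice the slots should also be padded to cache-line size; adjacent slots otherwise cause false sharing, which is part of why this pattern is inefficient.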
Other Primitives in Pthreads
• Set the attributes of a thread.
• Set the attributes of a mutex lock.
• Set scheduling parameters.
ECE 1747 Parallel Programming
Machine-independent
Performance Optimization Techniques
Returning to Sequential vs. Parallel
• Sequential execution time: t seconds.
• Startup overhead of parallel execution: t_st
seconds (depends on architecture)
• (Ideal) parallel execution time: t/p + t_st.
• If t/p + t_st > t, no gain.
General Idea
• Parallelism limited by dependences.
• Restructure code to eliminate or reduce
dependences.
• Sometimes the compiler can do this, but it is
good to know how to do it by hand.
Optimizations: Example 16
for (i = 0; i < 100000; i++)
a[i + 1000] = a[i] + 1;
Cannot be parallelized as is: iteration i writes
a[i+1000], which iteration i+1000 reads (a
loop-carried dependence of distance 1000).
May be parallelized by applying certain
code transformations.
Example Transformation
for( i=1; i<=100; i++ ){
  int stride = i*1000;
  for( j=0; j<1000; j++ )
    a[stride+j] = a[j] + i;
}
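After the transformation the outer iterations are independent (a[0..999] is read but never written), so each can run as a thread; a sketch with one thread per outer iteration (do_stride and thrd are illustrative names):

void *do_stride(void *arg)
{
  int i = (int)(long)arg;        /* which 1000-element block to fill */
  int stride = i * 1000;
  for( int j = 0; j < 1000; j++ )
    a[stride+j] = a[j] + i;      /* reads only a[0..999] */
  return NULL;
}

pthread_t thrd[101];
for( int i = 1; i <= 100; i++ )
  pthread_create(&thrd[i], NULL, do_stride, (void *)(long)i);
for( int i = 1; i <= 100; i++ )
  pthread_join(thrd[i], NULL);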
Code Transformations
• Reorganize code such that
– dependences are removed or reduced
– large pieces of parallel work emerge
• Code can become messy … there is a point
of diminishing returns.
Flavors of Parallelism
• Data parallelism: all processors do the same
thing on different data.
– Regular
– Irregular
• Task parallelism: processors do different
tasks.
– Task queue
– Pipelines
Task Parallelism
• Each process performs a different task.
• Two principal flavors:
– pipelines
– task queues
• Program Examples: PIPE (pipeline), TSP
(task queue).
Pipeline
• Often occurs in image processing
applications, where a number of images
undergo a sequence of transformations.
• E.g., rendering, clipping, compression, etc.
Sequential Program
for( i=0; i<num_pics && read(in_pic[i]); i++ ) {
int_pic_1[i] = trans1( in_pic[i] );
int_pic_2[i] = trans2( int_pic_1[i]);
int_pic_3[i] = trans3( int_pic_2[i]);
out_pic[i] = trans4( int_pic_3[i]);
}
Parallelizing a Pipeline
• For simplicity, assume we have 4
processors (i.e., equal to the number of
transformations).
• Furthermore, assume we have a very large
number of pictures (>> 4).
Parallelizing a Pipeline (part 1)
Processor 1:
for( i=0; i<num_pics && read(in_pic[i]); i++ ) {
int_pic_1[i] = trans1( in_pic[i] );
signal(event_1_2[i]);
}
Parallelizing a Pipeline (part 2)
Processor 2:
for( i=0; i<num_pics; i++ ) {
wait( event_1_2[i] );
int_pic_2[i] = trans2( int_pic_1[i] );
signal(event_2_3[i] );
}
Same for processor 3
Parallelizing a Pipeline (part 3)
Processor 4:
for( i=0; i<num_pics; i++ ) {
wait( event_3_4[i] );
out_pic[i] = trans4( int_pic_3[i] );
}
Use of Wait/Signal (Pipelining)
[Diagram: sequential vs. parallel execution timelines; each pattern is one picture, each horizontal line one processor. In the parallel version the four stages overlap across consecutive pictures.]
PIPE
P1: for( i=0; i<num_pics && read(in_pic); i++ ) {
int_pic_1[i] = trans1( in_pic );
signal( event_1_2[i] );
}
P2: for( i=0; i<num_pics; i++ ) {
wait( event_1_2[i] );
int_pic_2[i] = trans2( int_pic_1[i] );
signal( event_2_3[i] );
}
PIPE Using Pthreads
• Replacing the original wait/signal by a
Pthreads condition variable wait/signal will
not work.
– signals before a wait are forgotten.
– we need to remember a signal.
How to remember a signal (1 of 2)
semaphore_signal(i) {
pthread_mutex_lock(&mutex_rem[i]);
arrived[i] = 1;
pthread_cond_signal(&cond[i]);
pthread_mutex_unlock(&mutex_rem[i]);
}
How to Remember a Signal (2 of 2)
semaphore_wait(i) {
  pthread_mutex_lock(&mutex_rem[i]);
  while( arrived[i] == 0 ) {   /* while: guards against spurious wakeups */
    pthread_cond_wait(&cond[i], &mutex_rem[i]);
  }
  arrived[i] = 0;
  pthread_mutex_unlock(&mutex_rem[i]);
}
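These two routines implement a binary semaphore. POSIX also provides counting semaphores directly (sem_t, from <semaphore.h>), which remember every post; a sketch (NUM_PICS is an illustrative bound):

#include <semaphore.h>

sem_t event_1_2[NUM_PICS];

sem_init(&event_1_2[i], 0, 0);  /* not process-shared, initial value 0 */
sem_post(&event_1_2[i]);        /* signal: never forgotten, increments the count */
sem_wait(&event_1_2[i]);        /* wait: blocks until count > 0, then decrements */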
PIPE with Pthreads
P1: for( i=0; i<num_pics && read(in_pic); i++ ) {
int_pic_1[i] = trans1( in_pic );
semaphore_signal( event_1_2[i] );
}
P2: for( i=0; i<num_pics; i++ ) {
semaphore_wait( event_1_2[i] );
int_pic_2[i] = trans2( int_pic_1[i] );
semaphore_signal( event_2_3[i] );
}
Another Sequential Program
for( i=0; i<num_pics && read(in_pic); i++ ) {
int_pic_1 = trans1( in_pic );
int_pic_2 = trans2( int_pic_1);
int_pic_3 = trans3( int_pic_2);
out_pic = trans4( int_pic_3);
}
Can we use same parallelization?
Processor 2:
for( i=0; i<num_pics; i++ ) {
wait( event_1_2[i] );
int_pic_2 = trans2( int_pic_1 );
signal(event_2_3[i] );
}
Same for processor 3
Can we use same parallelization?
• No. An anti-dependence between successive
pictures (stage 2 still reads int_pic_1 while
stage 1 wants to overwrite it) leaves no
parallelism.
• In the earlier version we used privatization
(one buffer per picture: int_pic_1[i], …) to
enable pipeline parallelism.
• Privatization is used often to avoid
dependences (not only with pipelines).
• Costly in terms of memory.
In-between Solution
• Use n>1 buffers between stages.
• Block when buffers are full or empty.
[Diagram: stages P1 → P2 → P3 → P4 connected by bounded buffers.]
Perfect Pipeline?
[Diagram: idealized timeline with all stages continuously busy; pattern = picture, horizontal line = processor.]
Things are often not that perfect
• One stage takes more time than others.
• Stages take a variable amount of time.
• Extra buffers provide some cushion against
variability.
Task Parallelism
• Each process performs a different task.
• Two principal flavors:
– pipelines
– task queues
• Program Examples: PIPE (pipeline), TSP
(task queue).
TSP (Traveling Salesman)
• Goal:
– given a list of cities, a matrix of distances
between them, and a starting city,
– find the shortest tour in which all cities are
visited exactly once.
• Example of an NP-hard search problem.
• Algorithm: branch-and-bound.
Branching
Initialization:
go from starting city to each possible city
put resulting partial path into priority queue,
ordered by its current length.
Further (repeatedly):
take head element out of priority queue,
expand by each one of remaining cities,
put resulting partial path into priority queue.
Finding the Solution
• Eventually, a complete path will be found.
• Remember its length as the current shortest
path.
• Every time a complete path is found, check
if we need to update current best path.
• When priority queue becomes empty, best
path is found.
Using a Simple Bound
• Once a complete path is found, its length is
an upper bound on the length of the shortest
path.
• No use exploring a partial path that is
already longer than the current best: it can
only get longer.
Sequential TSP: Data Structures
• Priority queue of partial paths.
• Current best solution and its length.
• For simplicity, we will ignore bounding.
Sequential TSP: Code Outline
init_q(); init_best();
while( (p=de_queue()) != NULL ) {
for each expansion by one city {
q = add_city(p);
if( complete(q) ) update_best(q);
else en_queue(q);
}
}
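A sketch of the underlying data structure (the names Path and MAX_CITIES are illustrative, not from the program handed out):

typedef struct {
  int length;               /* length of the partial tour so far */
  int visited;              /* number of cities on the path */
  int city[MAX_CITIES];     /* the cities, in visiting order */
} Path;

/* priority queue ordered by length:
   de_queue() removes the shortest partial path,
   en_queue() inserts a newly expanded one */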
Parallel TSP: Possibilities
• Have each process do one expansion.
• Have each process do expansion of one
partial path.
• Have each process do expansion of multiple
partial paths.
• Issue of granularity/performance, not an
issue of correctness.
• Assume: process expands one partial path.
Parallel TSP: Synchronization
• True dependence between the process that
puts a partial path into the queue and the
one that takes it out.
• Dependences arise dynamically.
• Required synchronization: a process must
wait if the queue is empty.
Parallel TSP: First cut (part 1)
process i:
while( (p=de_queue()) != NULL ) {
for each expansion by one city {
q = add_city(p);
if complete(q) update_best(q);
else en_queue(q);
}
}
Parallel TSP: First cut (part 2)
• In de_queue: wait if q is empty
• In en_queue: signal that q is no longer
empty
Parallel TSP: More synchronization
• All processes operate, potentially at the
same time, on q and best.
• This race must not be allowed to happen.
• Critical section: only one process can
execute in a critical section at a time.
Parallel TSP: Critical Sections
• All shared data must be protected by critical
section.
• Update_best must be protected by a critical
section.
• En_queue and de_queue must be protected
by the same critical section.
Termination condition
• How do we know when we are done?
• All processes are waiting inside de_queue.
• Count the number of processes waiting
inside de_queue.
• If that count equals the total number of
processes, we are done.
Parallel TSP
process i:
while( (p=de_queue()) != NULL ) {
for each expansion by one city {
q = add_city(p);
if complete(q) update_best(q);
else en_queue(q);
}
}
Parallel TSP
• Need critical section
– in update_best,
– in en_queue/de_queue.
• In de_queue
– wait if q is empty,
– terminate if all processes are waiting.
• In en_queue:
– signal q is no longer empty.
Parallel TSP: Mutual Exclusion
en_queue() / de_queue() {
  pthread_mutex_lock(&queue);
  …;
  pthread_mutex_unlock(&queue);
}

update_best() {
  pthread_mutex_lock(&best);
  …;
  pthread_mutex_unlock(&best);
}
Parallel TSP: Condition Synchronization
de_queue() {
  while( (q is empty) and (not done) ) {
    waiting++;
    if( waiting == p ) {
      done = true;
      pthread_cond_broadcast(&empty);
    }
    else {
      pthread_cond_wait(&empty, &queue);
      waiting--;
    }
  }
  if( done )
    return NULL;
  else
    remove and return head of the queue;
}
Parallel TSP
• Complete parallel program will be provided
on the Web.
• Includes wait/signal on empty q.
• Includes critical sections.
• Includes termination condition.
Factors that Determine Speedup
• Characteristics of parallel code
– granularity
– load balance
– locality
– communication and synchronization
Granularity
• Granularity = size of the program unit that
is executed by a single processor.
• May be a single loop iteration, a set of loop
iterations, etc.
• Fine granularity leads to:
– (positive) ability to use lots of processors
– (positive) finer-grain load balancing
– (negative) increased overhead
Granularity and Critical Sections
• Small granularity => more processors =>
more critical section accesses => more
contention.
Issues in Performance of Parallel Parts
• Granularity.
• Load balance.
• Locality.
• Synchronization and communication.
Load Balance
• Load imbalance = difference in execution
time between processors between successive
synchronization points (e.g., barriers).
• Execution time may not be predictable.
– Regular data parallel: yes.
– Irregular data parallel or pipeline: perhaps.
– Task queue: no.
Static vs. Dynamic
• Static: done once, by the programmer
– block, cyclic, etc.
– fine for regular data parallel
• Dynamic: done at runtime
– task queue
– fine for unpredictable execution times
– usually high overhead
• Semi-static: done occasionally, at run-time
Choice is not inherent
• MM or SOR could be done using task
queues: put all iterations in a queue.
– In heterogeneous environment.
– In multitasked environment.
Static Load Balancing
• Block
– best locality
– possibly poor load balance
• Cyclic
– better load balance
– worse locality
• Block-cyclic
– load balancing advantages of cyclic (mostly)
– better locality
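The three distributions, sketched as loop bounds for thread myid out of p threads over n iterations (B is the block-cyclic chunk size; all names illustrative):

/* block: one contiguous chunk per thread */
for( i = (myid*n)/p; i < ((myid+1)*n)/p; i++ ) …

/* cyclic: every p-th iteration, starting at myid */
for( i = myid; i < n; i += p ) …

/* block-cyclic: chunks of size B, dealt out round-robin */
for( i = myid*B; i < n; i += p*B )
  for( j = i; j < i+B && j < n; j++ ) …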
Dynamic Load Balancing (1 of 2)
• Centralized: single task queue.
– Easy to program
– Excellent load balance
• Distributed: task queue per processor.
– Less contention during synchronization
Dynamic Load Balancing (2 of 2)
• Task stealing with distributed queues:
– Processes normally remove and insert tasks
from their own queue.
– When queue is empty, remove task(s) from
other queues.
• Extra overhead and programming difficulty.
• Better load balancing.
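A sketch of the stealing discipline (task_t, try_de_queue(), and queue[] are illustrative names; try_de_queue() is assumed to lock its queue internally and return NULL when empty):

task_t *get_task(int myid)
{
  task_t *t = try_de_queue(&queue[myid]);   /* local queue first */
  if( t != NULL ) return t;
  for( int v = 0; v < p; v++ )              /* then scan the other queues */
    if( v != myid && (t = try_de_queue(&queue[v])) != NULL )
      return t;                             /* stolen task */
  return NULL;                              /* everything looked empty */
}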
Semi-static Load Balancing
• Measure the cost of program parts.
• Use measurement to partition computation.
• Done once, done every iteration, done every
n iterations.
Molecular Dynamics (MD)
• Simulation of a set of bodies under the
influence of physical laws.
• Atoms, molecules, celestial bodies, ...
• Have same basic structure.
[Figure: bodies exerting forces F on one another.]
Molecular Dynamics (Skeleton)
for some number of timesteps {
for all molecules i
for all other molecules j
force[i] += f( loc[i], loc[j] );
for all molecules i
loc[i] = g( loc[i], force[i] );
}
Molecular Dynamics
• To reduce amount of computation, account
for interaction only with nearby molecules.
Molecular Dynamics (continued)
for some number of timesteps {
for all molecules i
for all nearby molecules j
force[i] += f( loc[i], loc[j] );
for all molecules i
loc[i] = g( loc[i], force[i] );
}
Molecular Dynamics (continued)
For each molecule i, precompute:
  count[i] = number of nearby molecules
  index[i][j] = indices of the nearby molecules ( 0 <= j < count[i] )
Molecular Dynamics (continued)
for some number of timesteps {
  for( i=0; i<num_mol; i++ )
    for( j=0; j<count[i]; j++ )
      force[i] += f( loc[i], loc[index[i][j]] );
  for( i=0; i<num_mol; i++ )
    loc[i] = g( loc[i], force[i] );
}
Molecular Dynamics (simple)
for some number of timesteps {
parallel for
for( i=0; i<num_mol; i++ )
for( j=0; j<count[i]; j++ )
force[i] += f( loc[i], loc[index[i][j]] );
parallel for
for( i=0; i<num_mol; i++ )
loc[i] = g( loc[i], force[i] );
}
Molecular Dynamics (simple)
• Simple to program.
• Possibly poor load balance
– block distribution of i iterations (molecules)
– could lead to uneven neighbor distribution
– cyclic does not help
Better Load Balance
• Assign iterations such that each processor
has ~ the same number of neighbors.
• Array of “assign records”
– size: number of processors
– two elements:
• beginning i value (molecule)
• ending i value (molecule)
• Recompute the partition periodically (a sketch follows)
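A minimal sketch of such a semi-static partitioner: walk the molecules, closing off a processor's range whenever its share of the total neighbor count is reached (begin[] and end[] hold the assign records; names illustrative):

void partition(void)
{
  int total = 0;
  for( int i = 0; i < num_mol; i++ ) total += count[i];

  int proc = 0, acc = 0;
  begin[0] = 0;
  for( int i = 0; i < num_mol; i++ ) {
    acc += count[i];
    if( proc < p-1 && acc >= (proc+1)*total/p ) {
      end[proc] = i+1;         /* this processor stops after molecule i */
      begin[++proc] = i+1;     /* the next one starts here */
    }
  }
  end[p-1] = num_mol;
}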
Frequency of Balancing
• Every time neighbor list is recomputed.
– once during initialization.
– every iteration.
– every n iterations.
• Extra overhead vs. better approximation
and better load balance.
Summary
• Parallel code optimization
– Granularity
– Load balance
– Locality
– Synchronization