
OpenMP Multithreaded Programming

Mike Bailey
[email protected]

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License


Parallel Programming using OpenMP

• OpenMP stands for "Open Multi-Processing"
• OpenMP is a multi-vendor (see next page) standard to perform shared-memory multithreading
• OpenMP is both compiler-directive- and library-based
• OpenMP threads share a single executable, a single global memory, and a single heap (malloc, new)
• Each OpenMP thread has its own stack (function arguments, function return address, local variables)
• Using OpenMP usually requires no dramatic code changes
• OpenMP probably gives you the biggest multithread benefit per amount of work you have to put into it

Much of your use of OpenMP will be accomplished by issuing C/C++ "pragmas" to tell the compiler how to build the threads into your executable, like this:

#pragma omp directive [clause]

Who is in the OpenMP Consortium?


What OpenMP Isn't:

• OpenMP doesn't check for data dependencies, data conflicts, deadlocks, or race conditions. You are responsible for avoiding those yourself.

• OpenMP doesn't check for non-conforming code sequences (we'll talk about what this means).

• OpenMP doesn't guarantee identical behavior across vendors or hardware, or even between multiple runs on the same vendor's hardware.

• OpenMP doesn't guarantee the order in which threads execute, just that they do execute.

• OpenMP is not overhead-free.

• OpenMP does not prevent you from writing code that triggers cache performance problems (such as false sharing); in fact, it makes them really easy to trigger.

We will get to "false sharing" in the cache notes.


Memory Allocation in a Multithreaded Program

One thread: the process has a single Stack, plus its Program Executable, Globals, and Heap.

Multiple threads: each thread gets its own Stack, while the Program Executable, the Globals, and the Heap (malloc, new) are common to all the threads.

Don't take this completely literally. The exact arrangement depends on the operating system and the compiler. For example, sometimes the stack and heap are arranged so that they grow towards each other.


Using OpenMP on Linux

g++ -o proj proj.cpp -lm -fopenmp

Using OpenMP in Microsoft Visual Studio

1. Go to the Project menu → Project Properties
2. Change the setting Configuration Properties → C/C++ → Language → OpenMP Support to "Yes (/openmp)"

If you are using Visual Studio and get a compile message that looks like this:

1>c1xx: error C2338: two-phase name lookup is not supported for C++/CLI, C++/CX, or OpenMP; use /Zc:twoPhase-

then do this:

1. Go to "Project Properties" → "C/C++" → "Command Line"
2. Add /Zc:twoPhase- in "Additional Options" in the bottom section
3. Press OK

Seeing if OpenMP is Supported on Your System

#ifdef _OPENMP
    fprintf( stderr, "OpenMP release %d is supported here\n", _OPENMP );
#else
    fprintf( stderr, "OpenMP is not supported here – sorry!\n" );
    return 1;
#endif

Printing _OPENMP gives you the year and month of the OpenMP release that you are using.

To get the OpenMP version number from the year and month, check here:
OpenMP 5.0 – November 2018
OpenMP 4.5 – November 2015
OpenMP 4.0 – July 2013
OpenMP 3.1 – July 2011
OpenMP 2.0 – March 2002
OpenMP 1.0 – October 1998

• By default, flip uses g++ 11.4, which uses OpenMP version 4.5
• Visual Studio 2022 uses OpenMP 2.0


Numbers of OpenMP Threads

How to specify how many OpenMP threads you want to have available:

    omp_set_num_threads( num );

Asking how many cores this program has access to (this actually returns the number of hyperthreads, not the number of physical cores):

    num = omp_get_num_procs( );

Setting the number of available threads to the exact number of cores available:

    omp_set_num_threads( omp_get_num_procs( ) );

Asking how many OpenMP threads this program is using right now:

    num = omp_get_num_threads( );

Asking which thread number this one is:

    me = omp_get_thread_num( );
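Putting those calls together: here is a minimal self-check program (a sketch, not from the noteset itself; the variable name numProcs is made up):

#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int
main( )
{
#ifdef _OPENMP
    fprintf( stderr, "OpenMP release %d is supported here\n", _OPENMP );
    int numProcs = omp_get_num_procs( );    // really the number of hyperthreads
    omp_set_num_threads( numProcs );        // make exactly that many threads available
    fprintf( stderr, "Will use %d threads\n", numProcs );
    return 0;
#else
    fprintf( stderr, "OpenMP is not supported here -- sorry!\n" );
    return 1;
#endif
}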

Creating an OpenMP Team of Threads

#pragma omp parallel default(none)      // this creates a team of threads
{
    ...                                 // each thread then executes all lines of code in this block
}

Think of it this way: the code runs in a single thread until it reaches

#pragma omp parallel default(none)

at which point it fans out into a whole team of threads, each executing the block.


The OpenMP Thread Team Prints a Friendly Message

#include <stdio.h>
#include <omp.h>

int
main( )
{
    omp_set_num_threads( 8 );
    #pragma omp parallel default(none)
    {
        printf( "Hello, World, from thread #%d !\n", omp_get_thread_num( ) );
    }
    return 0;
}

Hint: run it several times in a row. What do you see? Why?

Uh-oh…

First Run:                           Second Run:
Hello, World, from thread #6 !       Hello, World, from thread #0 !
Hello, World, from thread #1 !       Hello, World, from thread #7 !
Hello, World, from thread #7 !       Hello, World, from thread #4 !
Hello, World, from thread #5 !       Hello, World, from thread #6 !
Hello, World, from thread #4 !       Hello, World, from thread #1 !
Hello, World, from thread #3 !       Hello, World, from thread #3 !
Hello, World, from thread #2 !       Hello, World, from thread #5 !
Hello, World, from thread #0 !       Hello, World, from thread #2 !

Third Run:                           Fourth Run:
Hello, World, from thread #2 !       Hello, World, from thread #1 !
Hello, World, from thread #5 !       Hello, World, from thread #3 !
Hello, World, from thread #0 !       Hello, World, from thread #5 !
Hello, World, from thread #7 !       Hello, World, from thread #2 !
Hello, World, from thread #1 !       Hello, World, from thread #4 !
Hello, World, from thread #3 !       Hello, World, from thread #7 !
Hello, World, from thread #4 !       Hello, World, from thread #6 !
Hello, World, from thread #6 !       Hello, World, from thread #0 !

There is no guarantee of thread execution order!


Creating OpenMP Threads to Process Loop Passes

#include <omp.h>

...                                      // the code starts out executing in a single thread

omp_set_num_threads( NUMT );             // this sets how many threads will be created -- it doesn't
                                         // create them yet, it just says how many will be used the
                                         // next time you ask for them
...

#pragma omp parallel for default(none)   // here we ask for them: this creates a team of threads
for( int i = 0; i < arraySize; i++ )     // and divides the for-loop passes up among those threads
{
    ...
}                                        // there is an "implied barrier" at the end, where each thread
                                         // waits until all threads are done; then the code continues
                                         // in a single thread

This tells the compiler to parallelize the for-loop into multiple threads. Each thread automatically gets its own personal copy of the variable i because it is declared within the for-loop statement.

The default(none) clause forces you to explicitly declare all variables declared outside the parallel region to be either private or shared while they are in the parallel region. Variables declared within the for-loop are automatically private.
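To make that concrete, here is a minimal, complete program in the same pattern (NUMT, ARRAYSIZE, and the arrays A, B, and C are made up for this sketch):

#include <stdio.h>
#include <omp.h>

#define NUMT        4
#define ARRAYSIZE   100000

float A[ARRAYSIZE], B[ARRAYSIZE], C[ARRAYSIZE];

int
main( )
{
    omp_set_num_threads( NUMT );        // says how many threads to use next time

    #pragma omp parallel for default(none) shared(A,B,C)
    for( int i = 0; i < ARRAYSIZE; i++ )
    {
        A[ i ] = (float)i;              // i is automatically private to each thread
        B[ i ] = 2.f;
        C[ i ] = A[ i ] * B[ i ];
    }
    // implied barrier: from here on, the code is back to a single thread

    printf( "C[%d] = %8.2f\n", ARRAYSIZE-1, C[ARRAYSIZE-1] );
    return 0;
}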

OpenMP for-Loop Rules

#pragma omp parallel for default(none), shared(…), private(…)
for( int index = start ; index terminate-condition ; index changed )

• The index must be an int or a pointer
• The start and terminate conditions must have compatible types
• Neither the start nor the terminate conditions can be changed during the execution of the loop
• The index can only be modified by the changed expression (i.e., not modified inside the loop itself)
• You cannot use a break or a goto to get out of the loop
• There can be no inter-loop data dependencies, such as:

    a[ i ] = a[ i-1 ] + 1.;

    a[101] = a[100] + 1.;    // what if this is the last line of thread #0's work?
    a[102] = a[101] + 1.;    // what if this is the first line of thread #1's work?


OpenMP For-Loop Rules (continued)

Legal terminate conditions:

    index < end
    index <= end
    index > end
    index >= end

Legal index changes:

    index++
    ++index
    index--
    --index
    index += incr
    index = index + incr
    index = incr + index
    index -= decr
    index = index - decr
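For example, this loop obeys every rule above: an int index, fixed bounds, the index changed only in the header by one of the legal forms, and no inter-loop dependencies (a minimal sketch; N and the array a are made up):

#include <stdio.h>
#include <omp.h>

#define N 1000
float a[N];

int
main( )
{
    #pragma omp parallel for default(none), shared(a)
    for( int i = 0; i < N; i += 2 )      // legal change form: index += incr
    {
        a[ i ] = 3.f * (float)i;         // pass i touches only a[ i ]
    }
    printf( "a[%d] = %8.2f\n", N-2, a[N-2] );
    return 0;
}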

What to do about Variables Declared Before the for-loop Starts?

float x = 0.;
#pragma omp parallel for …
for( int i = 0; i < N; i++ )
{
    x = (float) i;
    float y = x*x;
    << more code… >>
}

i and y are automatically private because they are declared within the loop. Good practice demands that x be explicitly declared to be shared or private!

private(x)
    Means that each thread will get its own version of the variable

shared(x)
    Means that all threads will share a common version of the variable

default(none)
    I recommend that you include this in your OpenMP for-loop directive. It will force you to explicitly flag all of your externally-declared variables as shared or private. Don't make a mistake by leaving it up to the default!

Example:

#pragma omp parallel for default(none), private(x)


For-loop "Fission"

Because of the loop dependency, this whole thing is not parallelizable:

x[ 0 ] = 0.;
y[ 0 ] *= 2.;
for( int i = 1; i < N; i++ )
{
    x[ i ] = x[ i-1 ] + 1.;
    y[ i ] *= 2.;
}

But it can be broken into one loop that is not parallelizable, plus one that is:

x[ 0 ] = 0.;
for( int i = 1; i < N; i++ )
{
    x[ i ] = x[ i-1 ] + 1.;
}

#pragma omp parallel for shared(y)
for( int i = 0; i < N; i++ )
{
    y[ i ] *= 2.;
}
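Here is a small runnable sketch of the difference private(x) makes (the loop body is made up for illustration):

#include <stdio.h>
#include <omp.h>

int
main( )
{
    float x = 0.;
    omp_set_num_threads( 4 );

    #pragma omp parallel for default(none), private(x)
    for( int i = 0; i < 8; i++ )
    {
        x = (float)i;                   // writes this thread's own copy of x
        float y = x*x;                  // i and y are automatically private
        printf( "i = %d, y = %5.1f\n", i, y );
    }

    printf( "after the loop, x = %5.1f\n", x );     // still 0.0 -- the private copies are gone
    return 0;
}

If x had been declared shared(x) instead, all four threads would be writing the same variable at once.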

For-loop "Collapsing"

Uh-oh, which for-loop do you put the #pragma on?

for( int i = 1; i < N; i++ )
{
    for( int j = 0; j < M; j++ )
    {
        ...
    }
}

Ah-ha – trick question. You put it on both! The collapse clause's argument says how many for-loops to collapse into one loop:

#pragma omp parallel for collapse(2)
for( int i = 1; i < N; i++ )
{
    for( int j = 0; j < M; j++ )
    {
        ...
    }
}


Single Program Multiple Data (SPMD) in OpenMP

#define NUM 1000000
float A[NUM], B[NUM], C[NUM];
...
#pragma omp parallel default(none)
{
    int total = omp_get_num_threads( );    // must be asked inside the parallel region --
                                           // outside one, it returns 1
    int me = omp_get_thread_num( );
    DoWork( me, total );
}

void DoWork( int m, int t )
{
    int first = NUM * m / t;
    int last  = NUM * (m+1)/t - 1;
    for( int i = first; i <= last; i++ )
    {
        C[ i ] = A[ i ] * B[ i ];
    }
}
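For example, with NUM = 1000000 and t = 4 threads, thread m = 2 computes first = 1000000*2/4 = 500000 and last = 1000000*3/4 - 1 = 749999, so each thread multiplies its own contiguous quarter of the arrays.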

OpenMP Allocation of Work to Threads

Static Threads
• All work is allocated and assigned at runtime

Dynamic Threads
• The pool is statically assigned some of the work at runtime, but not all of it
• When a thread from the pool becomes idle, it gets a new assignment
• "Round-robin assignments"

OpenMP Scheduling

schedule(static [,chunksize])
schedule(dynamic [,chunksize])

Defaults to static; chunksize defaults to 1.


OpenMP Allocation of Work to Threads

#pragma omp parallel for default(none),schedule(static,chunksize)
for( int index = 0 ; index < 12 ; index++ )

schedule(static,1): each thread is assigned one iteration, then the assignments start over:
    Thread 0: 0, 3, 6, 9
    Thread 1: 1, 4, 7, 10
    Thread 2: 2, 5, 8, 11

schedule(static,2): each thread is assigned two iterations, then the assignments start over:
    Thread 0: 0, 1, 6, 7
    Thread 1: 2, 3, 8, 9
    Thread 2: 4, 5, 10, 11

schedule(static,4): each thread is assigned four iterations, then the assignments start over:
    Thread 0: 0, 1, 2, 3
    Thread 1: 4, 5, 6, 7
    Thread 2: 8, 9, 10, 11

Think of dealing for-loop passes to threads the same way as dogs deal cards.
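You can watch this dealing happen with a small test program (a sketch; the printout interleaving will vary from run to run):

#include <stdio.h>
#include <omp.h>

int
main( )
{
    omp_set_num_threads( 3 );

    #pragma omp parallel for default(none), schedule(static,2)
    for( int index = 0 ; index < 12 ; index++ )
    {
        printf( "Iteration %2d handled by thread %d\n", index, omp_get_thread_num( ) );
    }
    return 0;
}

With chunksize = 2 and three threads, thread 0 should report iterations 0, 1, 6, and 7, matching the schedule(static,2) table above.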

Arithmetic Operations Among Threads – A Problem

float sum = 0.;
#pragma omp parallel for default(none), shared(sum)
for( int i = 0; i < N; i++ )
{
    float myPartialSum = …

    sum = sum + myPartialSum;
}

• There is no guarantee when each thread will execute this line
• There is not even a guarantee that each thread will finish this line before some other thread interrupts it. (Remember that each line of code usually generates multiple lines of assembly.)
• This is non-deterministic!

The assembly code shows the danger:

    Load sum
    Add myPartialSum     ← what if the scheduler decides to switch threads right here?
    Store sum


Here's a trapezoid integration example. The partial sums are added up, as shown on the previous page. The integration was done 30 times. The answer is supposed to be exactly 2.

None of the 30 answers is even close. And not only are the answers bad, they are not even consistently bad!

0.469635    0.398893
0.517984    0.446419
0.438868    0.431204
0.437553    0.501783
0.398761    0.334996
0.506564    0.484124
0.489211    0.506362
0.584810    0.448226
0.476670    0.434737
0.530668    0.444919
0.500062    0.442432
0.672593    0.548837
0.411158    0.363092
0.408718    0.544778
0.523448    0.356299

Conclusion: Don't do it this way! We'll talk about how to do it correctly in the Trapezoid Integration noteset.

Synchronization

Mutual Exclusion Locks (Mutexes)

omp_init_lock( omp_lock_t * );
omp_set_lock( omp_lock_t * );      // blocks if the lock is not available,
                                   // then sets it and returns when it is available
omp_unset_lock( omp_lock_t * );
omp_test_lock( omp_lock_t * );     // if the lock is not available, returns 0;
                                   // if the lock is available, sets it and returns !0

( omp_lock_t is really an array of 4 unsigned chars )

Critical Sections

#pragma omp critical
    Restricts execution to one thread at a time

#pragma omp single
    Restricts execution to a single thread ever

Barriers

#pragma omp barrier
    Forces each thread to wait here until all threads arrive

(Note: there is an implied barrier after parallel for loops and OpenMP sections, unless the nowait clause is used)


Synchronization Example

omp_lock_t Sync;
...
omp_init_lock( &Sync );
...

Thread #0:                                      Thread #1:

omp_set_lock( &Sync );                          omp_set_lock( &Sync );
<< code that needs the mutual exclusion >>      << code that needs the mutual exclusion >>
omp_unset_lock( &Sync );                        omp_unset_lock( &Sync );
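Looking ahead: one way to repair the broken partial-sum loop from a few slides back is to put the update inside a critical section, so only one thread at a time can execute it (a sketch; the per-pass computation is stubbed out, and the proper approach is in the Trapezoid Integration noteset):

#include <stdio.h>
#include <omp.h>

#define N 100000

int
main( )
{
    float sum = 0.;

    #pragma omp parallel for default(none), shared(sum)
    for( int i = 0; i < N; i++ )
    {
        float myPartialSum = 1.f;       // stand-in for the real per-pass value

        #pragma omp critical
        sum = sum + myPartialSum;       // now the Load/Add/Store cannot be interrupted
    }

    printf( "sum = %10.1f\n", sum );    // reliably prints 100000.0
    return 0;
}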

Synchronization Example

omp_lock_t Sync;
...
omp_init_lock( &Sync );
...

Thread #0:                                Thread #1:

while( omp_test_lock( &Sync ) == 0 )      while( omp_test_lock( &Sync ) == 0 )
{                                         {
    DoSomeUsefulWork_0( );                    DoSomeUsefulWork_1( );
}                                         }


Single-thread-execution Synchronization

#pragma omp single

Restricts execution to a single thread ever. This is used when an operation only makes sense for one thread to do. Reading data from a file is a good example.
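Returning to the omp_test_lock( ) pattern above: here is a runnable sketch of it, with the useful work stubbed out as a print statement:

#include <stdio.h>
#include <omp.h>

omp_lock_t Sync;

int
main( )
{
    omp_init_lock( &Sync );
    omp_set_num_threads( 2 );

    #pragma omp parallel default(none) shared(Sync)
    {
        while( omp_test_lock( &Sync ) == 0 )
        {
            printf( "Thread %d is doing useful work while it waits\n", omp_get_thread_num( ) );
        }
        printf( "Thread %d now holds the lock\n", omp_get_thread_num( ) );
        omp_unset_lock( &Sync );
    }

    omp_destroy_lock( &Sync );
    return 0;
}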

Creating Sections of OpenMP Code

Sections are independent blocks of code, able to be assigned to separate threads if they are available.

#pragma omp parallel sections
{
    #pragma omp section
    {
        Task 1
    }

    #pragma omp section
    {
        Task 2
    }
}

(Note: there is an implied barrier after parallel for loops and OpenMP sections, unless the nowait clause is used)


A Functional Decomposition Sections Example

omp_set_num_threads( 3 );

#pragma omp parallel sections
{
    #pragma omp section
    {
        Watcher( );
    }

    #pragma omp section
    {
        Animals( );
    }

    #pragma omp section
    {
        Plants( );
    }
}    // implied barrier -- all functions must return to get past here
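A complete, runnable version of this example might look like the following; Watcher( ), Animals( ), and Plants( ) are stubbed out here, since their real bodies are not part of this noteset:

#include <stdio.h>
#include <omp.h>

void Watcher( )  { printf( "Watcher( ) run by thread %d\n", omp_get_thread_num( ) ); }
void Animals( )  { printf( "Animals( ) run by thread %d\n", omp_get_thread_num( ) ); }
void Plants( )   { printf( "Plants( )  run by thread %d\n", omp_get_thread_num( ) ); }

int
main( )
{
    omp_set_num_threads( 3 );

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            Watcher( );
        }

        #pragma omp section
        {
            Animals( );
        }

        #pragma omp section
        {
            Plants( );
        }
    }   // implied barrier -- all three functions must return before execution continues

    return 0;
}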

A Potential OpenMP/Visual Studio Compiler Problem

If you print to standard error (stderr) from inside a for-loop, like I do, then you would think that you need to include stderr in the shared list because, well, the loops share it:

#pragma omp parallel for default(none) shared(a,b,stderr)

This turns out to be true for g++/gcc only.

If you are using Visual Studio, then do not include stderr in the list.
If you do, you will get this error:

1>Y:\CS575\SQ22\robertw5-01\Project1\Project1.cpp(113,98): error C2059: syntax error: '('

This is because:
• In g++/gcc, stderr is a variable
• In Visual Studio, stderr is a defined macro

