BCS702 | PARALLEL COMPUTING | Integrated Lab
GHOUSIA COLLEGE OF ENGINEERING
RAMANAGARA-562159
Department of Computer Science & Engineering
Subject Code: BCS702
Semester: VI
Subject: PARALLEL COMPUTING
Staff in-charge: Ms. Saniya Mehdi
Experiment-01
01. Write an OpenMP program to sort an array of n elements using both sequential and parallel merge sort (using the sections construct). Record and compare the execution time of both methods.
What is OpenMP?
OpenMP is a set of compiler directives, together with an API for programs written in C, C++, or Fortran, that provides support for parallel programming in shared-memory environments. OpenMP identifies parallel regions as blocks of code that may run in parallel. Application developers insert compiler directives into their code at parallel regions, and these directives instruct the OpenMP run-time library to execute the region in parallel.
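As a quick illustration of these directives, the minimal sketch below (hello_omp.c is an assumed file name, separate from the experiment code) creates a parallel region in which every thread reports its thread number; it can be compiled with gcc -fopenmp.

// hello_omp.c - minimal sketch of an OpenMP parallel region (illustration only)
#include <stdio.h>
#include <omp.h>

int main(void) {
    // The block below is a parallel region: every thread in the team executes it.
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}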
What is merge sort?
Merge sort is a divide-and-conquer algorithm that sorts the input array by breaking it into subarrays until each contains a single element, and then merging the results back into a sorted array. Its time and space complexities are O(N log N) and O(N) respectively.
Sequential Merge Sort
Sequential merge sort in C is a sorting algorithm that follows the divide-and-conquer paradigm.
It recursively divides an array into two halves until the sub-arrays contain only one element
(which is considered sorted). Then, it merges these sub-arrays in a sorted manner to produce
new sorted sub-arrays, repeating this process until the entire array is sorted.
Parallel Merge Sort
Parallel merge sort is an algorithm that sorts a list by recursively dividing it into sublists,
sorting them independently using multiple threads or processes, and then merging the sorted
sublists. This approach leverages parallel processing to improve sorting speed, especially for
large datasets.
CODE:
// experiment1.c
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
// Merge function
void merge(int arr[], int left, int mid, int right) {
int n1 = mid - left + 1;
int n2 = right - mid;
int L[n1], R[n2];
for (int i = 0; i < n1; i++)
L[i] = arr[left + i];
for (int j = 0; j < n2; j++)
R[j] = arr[mid + 1 + j];
int i = 0, j = 0, k = left;
while (i < n1 && j < n2) {
if (L[i] <= R[j])
arr[k++] = L[i++];
else
arr[k++] = R[j++];
}
while (i < n1) arr[k++] = L[i++];
while (j < n2) arr[k++] = R[j++];
}
// Sequential Merge Sort
void mergeSortSequential(int arr[], int left, int right) {
if (left < right) {
int mid = left + (right - left) / 2;
mergeSortSequential(arr, left, mid);
mergeSortSequential(arr, mid + 1, right);
merge(arr, left, mid, right);
}
}
// Parallel Merge Sort using OpenMP sections.
// Each recursive call opens its own parallel region; on most implementations,
// nested regions run single-threaded unless nested parallelism is enabled.
void mergeSortParallel(int arr[], int left, int right) {
if (left < right) {
int mid = left + (right - left) / 2;
#pragma omp parallel sections
{
#pragma omp section
mergeSortParallel(arr, left, mid);
#pragma omp section
mergeSortParallel(arr, mid + 1, right);
}
merge(arr, left, mid, right);
}
}
int main() {
int n;
printf("Enter number of elements: ");
scanf("%d", &n);
int *arr1 = (int *)malloc(n * sizeof(int));
int *arr2 = (int *)malloc(n * sizeof(int));
printf("Enter %d elements: ", n);
for (int i = 0; i < n; i++) {
scanf("%d", &arr1[i]);
arr2[i] = arr1[i]; // Copy array for parallel sorting
}
double start, end;
// Sequential Sort Timing
start = omp_get_wtime();
mergeSortSequential(arr1, 0, n - 1);
end = omp_get_wtime();
printf("Sequential Merge Sort Time: %f seconds\n", end - start);
// Parallel Sort Timing
start = omp_get_wtime();
mergeSortParallel(arr2, 0, n - 1);
end = omp_get_wtime();
printf("Parallel Merge Sort Time: %f seconds\n", end - start);
printf("Sorted array: ");
for (int i = 0; i < n; i++)
printf("%d ", arr1[i]);
printf("\n");
free(arr1);
free(arr2);
return 0;
}
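To build and run the program (assuming gcc with OpenMP support; the file name experiment1.c comes from the comment at the top of the code):

gcc -fopenmp experiment1.c -o experiment1
./experiment1

For small inputs the parallel version may be slower than the sequential one, because every recursive call opens a new parallel region; the comparison becomes meaningful only for reasonably large n.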
OUTPUT
Experiment-02
02. Write an OpenMP program that divides the iterations into chunks containing 2 iterations each (OMP_SCHEDULE=static,2). Its input should be the number of iterations, and its output should be which iterations of a parallelized for loop are executed by which thread. For example, if there are two threads and four iterations, the output might be the following:
a. Thread 0: Iterations 0 – 1
b. Thread 1: Iterations 2 – 3
What is Scheduling in OpenMP?
Scheduling is the method OpenMP uses to distribute the iterations of a for loop among the threads in the team. The schedule clause describes how the loop iterations are divided; the default schedule is implementation dependent.
Static:
Loop iterations are divided into pieces of size chunk and statically assigned to the threads in round-robin order. If no chunk size is specified, the iterations are divided into chunks that are approximately equal in size and each thread receives at most one contiguous chunk, assigned in thread order.
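For example, with schedule(static, 2), 8 iterations and two threads, the chunks {0,1}, {2,3}, {4,5}, {6,7} are handed out in round-robin order: thread 0 executes iterations 0–1 and 4–5, while thread 1 executes iterations 2–3 and 6–7.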
CODE:
// iterations.c
#include <stdio.h>
#include <omp.h>
int main() {
int num_iterations;
printf("Enter the number of iterations: ");
scanf("%d", &num_iterations);
#pragma omp parallel
{
#pragma omp for schedule(static, 2)
for (int i = 0; i < num_iterations; i++) {
printf("Thread %d: Iteration %d\n", omp_get_thread_num(), i);
}
}
return 0;
}
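A possible way to build and run it with a fixed number of threads (assuming a bash-like shell and the file name iterations.c from the comment above):

gcc -fopenmp iterations.c -o iterations
OMP_NUM_THREADS=2 ./iterations

Note that the schedule is fixed in the source as schedule(static, 2); to control it through the OMP_SCHEDULE environment variable mentioned in the question, the clause would have to be schedule(runtime) instead.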
OUTPUT
Experiment-03
03. Write an OpenMP program to calculate n Fibonacci numbers using tasks.
Tasks: An OpenMP task is a single line of code or a structured block which is immediately “written down” in a list of tasks. The new task can be executed immediately, or it can be deferred. If the if clause is used and its argument evaluates to 0, the task is executed immediately, superseding whatever else that thread is doing. There has to be an existing parallel thread team for this to work; otherwise one thread ends up doing all the tasks and no parallelism is gained.
Fibonacci sequence: The Fibonacci sequence is a sequence in which each term is the sum of the previous two terms. The first two terms of the Fibonacci sequence are 0 and 1. Example: 0, 1, 1, 2, 3, 5, 8, 13, ...
CODE:
#include <stdio.h>
#include <omp.h>
int fib(int n)
{
int i, j;
if (n < 2)
return n;
else {
#pragma omp task shared(i) firstprivate(n)
i = fib(n - 1);
#pragma omp task shared(j) firstprivate(n)
j = fib(n - 2);
#pragma omp taskwait
return i + j;
}
}
int main()
{
int n = 10;
omp_set_dynamic(0);
omp_set_num_threads(4);
#pragma omp parallel shared(n)
{
#pragma omp single
printf("fib(%d) = %d\n", n, fib(n));
}
return 0;
}
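A possible build-and-run sequence (the file name fib_tasks.c is an assumption; any name works):

gcc -fopenmp fib_tasks.c -o fib_tasks
./fib_tasks

With fib(0) = 0 and fib(1) = 1, the program should print fib(10) = 55.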
OUTPUT
Experiment-04
04. Write an OpenMP program to find the prime numbers from 1 to n employing the parallel for directive. Record both serial and parallel execution times.
Parallel Directive:
The omp parallel directive explicitly instructs the compiler to parallelize the chosen block of
code.
For Directive:
Causes the work done in a for loop inside a parallel region to be divided among threads.
Prime Number:
A prime number is a positive integer greater than 1 that is divisible only by 1 and itself. For example: 2, 3, 5, 7, 11, 13, 17, ...
CODE:
#include <stdio.h>
#include <omp.h>
int main() {
int prime[1000], i, j, n;
double start, end;
printf("\nEnter the value of n to find prime numbers from 1 to n: ");
scanf("%d", &n);
// Initialize all as prime
for (i = 1; i <= n; i++) {
prime[i] = 1;
}
prime[1] = 0; // 1 is not prime
// Sequential Execution
start = omp_get_wtime();
for (i = 2; i * i <= n; i++) {
for (j = i * i; j <= n; j = j + i) {
prime[j] = 0;
}
}
end = omp_get_wtime();
printf("\nSequential Execution Time: %f seconds\n", end - start);
// Re-initialize all as prime again
for (i = 1; i <= n; i++) {
prime[i] = 1;
}
prime[1] = 0;
// Parallel Execution
start = omp_get_wtime();
for (i = 2; i * i <= n; i++) {
#pragma omp parallel for
for (j = i * i; j <= n; j = j + i) {
prime[j] = 0;
}
}
end = omp_get_wtime();
printf("Parallel Execution Time: %f seconds\n", end - start);
printf("\nPrime numbers from 1 to %d are:\n", n);
for (i = 2; i <= n; i++) {
if (prime[i] == 1) {
printf("%d ", i);
}
}
printf("\n");
return 0;
}
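To build and run (the file name primes.c is an assumption):

gcc -fopenmp primes.c -o primes
./primes

As in Experiment 1, the parallel time can exceed the sequential time for small n, because a thread team is created inside the outer loop on every pass.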
OUTPUT
Experiment-05
05. Write an MPI program to demonstrate MPI_Send and MPI_Recv.
Message-Passing Interface (MPI)
Message passing is a communication model used on distributed-memory architectures. MPI is a standard that specifies the message-passing libraries supporting parallel programming in C/C++ or Fortran. MPI_Send and MPI_Recv are the basic building blocks for essentially all of the more specialized MPI commands, and they are the basic communication tools in an MPI application. Since MPI_Send and MPI_Recv involve two ranks, they are called “point-to-point” communication. The process of communicating data follows a standard pattern. Rank A decides to send data to rank B. It first packs the data into a buffer; this avoids sending multiple messages, which would take more time. Rank A then calls MPI_Send to create a message for rank B. The communication device is then given the responsibility of routing the message to the correct destination. Rank B must know that it is about to receive a message and acknowledges this by calling MPI_Recv.
MPI rank: a rank is a unique identifier assigned to each process within a communication group. It is a logical way of numbering processes to facilitate communication between them. MPI automatically assigns ranks in the MPI_COMM_WORLD group when the MPI environment is initialized (MPI_Init).
CODE:
#include <stdio.h>
#include "mpi.h"
int main(int argc, char** argv)
{
int my_rank, numbertoreceive[10], numbertosend[3] = {73, 2, -16};
int recv_count, i;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
if (my_rank == 0) {
MPI_Recv(numbertoreceive, 3, MPI_INT, MPI_ANY_SOURCE,
MPI_ANY_TAG, MPI_COMM_WORLD, &status);
printf("status.MPI_SOURCE = %d\n", status.MPI_SOURCE);
printf("status.MPI_TAG = %d\n", status.MPI_TAG);
printf("status.MPI_ERROR = %d\n", status.MPI_ERROR);
MPI_Get_count(&status, MPI_INT, &recv_count);
printf("Received %d data values:\n", recv_count);
for (i = 0; i < recv_count; i++)
printf("recv[%d] = %d\n", i, numbertoreceive[i]);
} else {
MPI_Send(numbertosend, 3, MPI_INT, 0, 10, MPI_COMM_WORLD);
}
MPI_Finalize();
return 0;
}
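MPI programs are compiled with the MPI wrapper compiler and launched through a process manager (assuming an MPI installation such as MPICH or Open MPI; the file name send_recv.c is an assumption):

mpicc send_recv.c -o send_recv
mpirun -np 2 ./send_recv

Run it with two processes: rank 0 receives the three integers and prints the status fields, while rank 1 sends them.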
OUTPUT
Experiment-06
06. Write an MPI program to demonstrate deadlock using point-to-point communication, and the avoidance of deadlock by altering the call sequence.
Deadlock:
Deadlock is an often-encountered situation in parallel processing. It results when two or more
processes are in contention for the same set of resources. In communications, a typical scenario
involves two processes wishing to exchange messages: each is trying to give a message to the
other, but neither of them is ready to accept a message.
What is point-to-point communication in MPI?
MPI processes communicate by explicitly sending and receiving messages. In point-to-point communication, messages are sent between exactly two processes. Since MPI processes are independent, they need to communicate by explicitly sending and receiving messages in order to coordinate work.
CODE 1:
// CODE 1 (deadlock): both processes post a blocking receive before their send,
// so neither send is ever reached and both ranks block forever.
#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv) {
    int rank, size, data = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        printf("This program requires at least 2 processes.\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    if (rank == 0) {
        // Process 0 waits for a message from process 1 ...
        MPI_Recv(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        // ... while process 1 waits for a message from process 0.
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
CODE 2:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv) {
int rank, size, data = 0;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
if (size < 2) {
printf("This program requires at least 2 processes.\n");
MPI_Abort(MPI_COMM_WORLD, 1);
}
// Matched ordering: process 0 sends first while process 1 receives first,
// so each blocking call finds a partner and no deadlock occurs.
if (rank == 0) {
MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
printf("Process 0 sent data to Process 1\n");
MPI_Recv(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Process 0 received data from Process 1\n");
} else if (rank == 1) {
MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Process 1 received data from Process 0\n");
MPI_Send(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
printf("Process 1 sent data to Process 0\n");
}
MPI_Finalize();
return 0;
}
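Both versions can be built and run the same way (the file names deadlock.c and no_deadlock.c are assumptions):

mpicc deadlock.c -o deadlock
mpirun -np 2 ./deadlock        # CODE 1: the program hangs (deadlock) and has to be interrupted with Ctrl+C
mpicc no_deadlock.c -o no_deadlock
mpirun -np 2 ./no_deadlock     # CODE 2: completes normally because the send/receive order is matched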
OUTPUT
Experiment-07
07. Write an MPI program to demonstrate the broadcast operation.
Broadcasting with MPI_Bcast:
A broadcast is one of the standard collective communication techniques. During a broadcast, one process sends the same data to all processes in a communicator. One of the main uses of broadcasting is to send user input or configuration parameters out to all processes of a parallel program. Typically process zero acts as the root process and holds the initial copy of the data; all of the other processes receive a copy of it.
CODE 1:
// File: broadcast.c
#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv) {
int rank, size;
int data; // Variable to broadcast
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
if (rank == 0) {
data = 42; // Root initializes the data
printf("Process %d is broadcasting data = %d\n", rank, data);
}
// Broadcast from process 0 to all
MPI_Bcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD);
// Every process prints received data
printf("Process %d received data = %d\n", rank, data);
MPI_Finalize();
return 0;
}
CODE 2:
// File: broadcast_verbose.c
#include <mpi.h>
#include <stdio.h>
#define CONDUCTOR 0
int main(int argc, char** argv) {
int answer = 0;
int numProcs = 0, myRank = 0;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
if (myRank == CONDUCTOR) {
answer = 42;
}
printf("BEFORE broadcast, process %d's answer = %d\n", myRank, answer);
MPI_Bcast(&answer, 1, MPI_INT, CONDUCTOR, MPI_COMM_WORLD);
printf("AFTER broadcast, process %d's answer = %d\n", myRank, answer);
MPI_Finalize();
return 0;
}
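To run the broadcast examples with several processes (file names from the comments in the code):

mpicc broadcast.c -o broadcast
mpirun -np 4 ./broadcast

Every process should end up printing data = 42; with the second version, each rank prints its value both before the broadcast (only the conductor already holds 42) and after it.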
OUTPUT
Experiment-08
08. Write an MPI program to demonstrate MPI_Scatter and MPI_Gather.
MPI_Scatter:
MPI_Scatter is a collective routine that is very similar to MPI_Bcast. MPI_Scatter involves a
designated root process sending data to all processes in a communicator. The primary
difference between MPI_Bcast and MPI_Scatter is small but important. MPI_Bcast sends the
same piece of data to all processes while MPI_Scatter sends chunks of an array to different
processes.
MPI_Gather:
MPI_Gather is the inverse of MPI_Scatter. Instead of spreading elements from one process to
many processes, MPI_Gather takes elements from many processes and gathers them to one
single process. This routine is highly useful to many parallel algorithms, such as parallel
sorting and searching.
CODE:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
int size, rank;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
int globaldata[100]; // one element per process (supports up to 100 processes)
int localdata;
// Step 1: Initialize data in root process (rank 0)
if (rank == 0) {
for (int i = 0; i < size; i++) {
globaldata[i] = i;
}
printf("1. Processor %d has data: ", rank);
for (int i = 0; i < size; i++) {
printf("%d ", globaldata[i]);
}
printf("\n");
}
// Step 2: Scatter the global data to all processes
MPI_Scatter(globaldata, 1, MPI_INT, &localdata, 1, MPI_INT, 0,
MPI_COMM_WORLD);
printf("2. Processor %d received data: %d\n", rank, localdata);
// Step 3: Modify local data
localdata = 5;
printf("3. Processor %d updated data to: %d\n", rank, localdata);
// Step 4: Gather all modified data back to root process
MPI_Gather(&localdata, 1, MPI_INT, globaldata, 1, MPI_INT, 0,
MPI_COMM_WORLD);
if (rank == 0) {
printf("4. Processor %d gathered data: ", rank);
for (int i = 0; i < size; i++) {
printf("%d ", globaldata[i]);
}
printf("\n");
}
MPI_Finalize();
return 0;
}
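A possible run (the file name scatter_gather.c is an assumption):

mpicc scatter_gather.c -o scatter_gather
mpirun -np 4 ./scatter_gather

With 4 processes, rank 0 starts with the array 0 1 2 3; after MPI_Scatter each rank i holds the single value i, every rank then overwrites its value with 5, and MPI_Gather returns the array 5 5 5 5 to rank 0. The order of the intermediate print lines may vary between runs.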
OUTPUT
Experiment-09
09. Write an MPI program to demonstrate MPI_Reduce and MPI_Allreduce (MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD).
MPI_Reduce:
Reduce is a classic concept from functional programming. Data reduction involves reducing a
set of numbers into a smaller set of numbers via a function. For example, let’s say we have a
list of numbers [1, 2, 3, 4, 5]. Reducing this list of numbers with the sum function would
produce sum([1, 2, 3, 4, 5]) = 15. Similarly, the multiplication reduction would yield
multiply([1, 2, 3, 4, 5]) = 120. MPI_Reduce takes an array of input elements on each process
and returns an array of output elements to the root process. The output elements contain the
reduced result.
MPI_Allreduce:
Many parallel applications require access to the reduced result on all processes rather than only on the root process. In the same complementary way that MPI_Allgather relates to MPI_Gather, MPI_Allreduce reduces the values and distributes the result to all processes.
CODE:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char* argv[]) {
int rank, size;
int value, result;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
value = rank + 1; // Each process holds its rank + 1
// Reduce using MPI_SUM
MPI_Reduce(&value, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (rank == 0) {
printf("MPI_Reduce (SUM): Total = %d\n", result);
}
// Allreduce using MPI_MAX
MPI_Allreduce(&value, &result, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
printf("MPI_Allreduce (MAX) at process %d: %d\n", rank, result);
// Allreduce using MPI_MIN
MPI_Allreduce(&value, &result, 1, MPI_INT, MPI_MIN, MPI_COMM_WORLD);
printf("MPI_Allreduce (MIN) at process %d: %d\n", rank, result);
// Allreduce using MPI_PROD
MPI_Allreduce(&value, &result, 1, MPI_INT, MPI_PROD, MPI_COMM_WORLD);
printf("MPI_Allreduce (PROD) at process %d: %d\n", rank, result);
MPI_Finalize();
return 0;
}
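A possible run (the file name reduce_allreduce.c is an assumption):

mpicc reduce_allreduce.c -o reduce_allreduce
mpirun -np 4 ./reduce_allreduce

With 4 processes the ranks hold the values 1, 2, 3 and 4, so the expected results are SUM = 10, MAX = 4, MIN = 1 and PROD = 24. The sum is printed only by rank 0 (MPI_Reduce to the root), while the other three values are printed by every rank (MPI_Allreduce).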
OUTPUT