CSC580
Parallel Processing
LECTURE 4:
Parallel Computing Design (PART 1)
PREPARED BY: SALIZA RAMLY
Topic Overview
This topic introduces students to:
Algorithms and Concurrency
o Introduction to Parallel Algorithms
o Tasks and Decomposition
o Processes and Mapping
o Processes Versus Processors
o Decomposition Techniques
o Recursive Decomposition
o Data Decomposition
o Exploratory Decomposition
o Speculative Decomposition
SALIZA RAMLY - CSC580
Introduction to
Parallel Algorithms
Introduction
Parallel algorithm
o It tells us how to solve a given problem using multiple processors.
o It involves more than just specifying the steps.
o It has the added dimension of concurrency.
o The algorithm designer must specify sets of steps that can be executed simultaneously.
o This chapter methodically discusses the process of designing and implementing parallel algorithms.
Preliminaries
Preliminaries: Decomposition, Tasks, and
Dependency Graphs
o The first step in developing a parallel algorithm is to decompose the problem into tasks that can be executed concurrently.
o A given problem may be decomposed into tasks in many different ways.
o Tasks may be of the same size or of different sizes.
o A decomposition can be illustrated in the form of a directed graph with nodes
corresponding to tasks and edges indicating that the result of one task is
required for processing the next. Such a graph is called a task dependency
graph.
Example: Multiplying a Dense Matrix
with a Vector
Computation of each element of the output vector y is independent of the other elements. Based on this, a dense matrix-vector product can be decomposed into n tasks. The figure highlights the portion of the matrix and vector accessed by Task 1.

Observations: While tasks share data (namely, the vector b), they do not have any control dependencies, i.e., no task needs to wait for the (partial) completion of any other. All tasks are of the same size in terms of number of operations.

Is this the maximum number of tasks we could decompose this problem into?
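As an illustration (a Python sketch, not part of the original slides), each output element y[i] can be packaged as an independent task. The tasks share the vector b but have no control dependencies, so they may run in any order or simultaneously:

```python
# Sketch: decomposing a dense matrix-vector product y = A*b into n
# independent tasks, one per output element. Each task reads row i of A
# and the shared vector b; no task depends on any other.

def make_tasks(A, b):
    """Return one zero-argument task per output element of y = A*b."""
    n = len(A)
    def compute(i):
        # Task i computes y[i] = sum_j A[i][j] * b[j].
        return sum(A[i][j] * b[j] for j in range(len(b)))
    return [lambda i=i: compute(i) for i in range(n)]

A = [[1, 2], [3, 4]]
b = [10, 20]
y = [t() for t in make_tasks(A, b)]  # independent tasks; any order works
```

Because the tasks are independent, replacing the list comprehension with a thread or process pool would execute them concurrently without further changes.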
Example: Database Query Processing
Consider the execution of the query:
MODEL = ``CIVIC'' AND YEAR = 2001 AND (COLOR = ``GREEN'' OR COLOR = ``WHITE'')
on the following database:
ID#  Model   Year Color Dealer Price
4523 Civic 2002 Blue MN $18,000
3476 Corolla 1999 White IL $15,000
7623 Camry 2001 Green NY $21,000
9834 Prius 2001 Green CA $18,000
6734 Civic 2001 White OR $17,000
5342 Altima 2001 Green FL $19,000
3845 Maxima 2001 Blue NY $22,000
8354 Accord 2000 Green VT $18,000
4395 Civic 2001 Red CA $17,000
7352 Civic 2002 Red WA $18,000
A database storing information about used vehicles
Example: Database Query Processing
The execution of the query can be
divided into subtasks in various
ways. Each task can be thought of
as generating an intermediate
table of entries that satisfy a
particular clause.
Task: create the set of elements that satisfy one (or several) criteria.
Edge: the output of one task serves as input to the next.
Example: Database Query Processing
Note that the same problem
can be decomposed into
subtasks in other ways as well.
Different task decompositions lead to different degrees of parallelism.
An alternate decomposition of the given problem into
subtasks, along with their data dependencies.
Preliminaries: Granularity, Concurrency,
and Task-Interaction
The number of tasks into which a problem is decomposed determines its
granularity.
Fine-grained decomposition: decomposition into a large number of tasks.
Coarse-grained decomposition: decomposition into a small number of tasks.
Each task in this example corresponds to the computation of three elements of the result vector.
Degree of Concurrency
Degree of concurrency: the number of tasks that can execute in parallel.
Maximum degree of concurrency: the largest number of concurrent tasks at any point of the execution.
Average degree of concurrency: the average number of tasks that can be executed concurrently over the execution of the program.
Degree of concurrency vs. task granularity: inverse relation. The degree of concurrency increases as the decomposition becomes finer in granularity, and vice versa.
Critical Path of Task Graph
Directed path: a sequence of tasks that must be processed one after the other.
Critical path: the longest directed path between any pair of start node (node with no incoming edges) and finish node (node with no outgoing edges).
Critical path length: the sum of the weights of the nodes along the critical path.
Average degree of concurrency = total amount of work / critical path length.
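The two quantities above can be computed mechanically. A Python sketch (the `{task: predecessors}` dictionary representation is an assumption, not from the slides) for a small diamond-shaped graph with unit node weights:

```python
# Sketch: critical-path length and average degree of concurrency for a
# task dependency graph given as {task: [predecessor tasks]} plus a
# weight per task (here, unit weights).

def critical_path_length(preds, weight):
    memo = {}
    def longest(t):
        # Longest weighted path through the graph that ends at task t.
        if t not in memo:
            memo[t] = weight[t] + max((longest(p) for p in preds[t]), default=0)
        return memo[t]
    return max(longest(t) for t in preds)

# Diamond graph: t1 -> {t2, t3} -> t4, each task taking unit time.
preds = {"t1": [], "t2": ["t1"], "t3": ["t1"], "t4": ["t2", "t3"]}
weight = {t: 1 for t in preds}
cp = critical_path_length(preds, weight)   # longest chain, e.g. t1 -> t2 -> t4
avg = sum(weight.values()) / cp            # total work / critical path length
```

Here the total work is 4 units and the critical path has length 3, giving an average degree of concurrency of 4/3.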
Q: What is the average degree of concurrency in each decomposition?
Consider the task dependency graphs of the two database query decompositions:
Task dependency graphs
Graph (a):                          Graph (b):
Maximum degree of concurrency = ?   Maximum degree of concurrency = ?
Critical path length = ?            Critical path length = ?
Average degree of concurrency = ?   Average degree of concurrency = ?
Q: What is the upper bound of the
number of concurrent tasks?
Multiplying a dense matrix with a vector: there can be no more than n² concurrent tasks.
Limits on Parallelization
Q: The larger the number of concurrent tasks, the better?
o It would appear that the parallel time can be made arbitrarily small by making the decomposition finer in granularity.
o There is an inherent bound on how fine the granularity of a computation can be. For example, in the case of multiplying a dense matrix with a vector, there can be no more than n² concurrent tasks.
o Concurrent tasks may also have to exchange data with other tasks. This results in communication overhead. The tradeoff between the granularity of a decomposition and the associated overheads often determines performance bounds.
Task Interaction Graphs
Subtasks exchange data with others in a decomposition.
For example, even in the trivial
decomposition of the dense matrix-
vector product, if the vector is not
replicated across all tasks, they will have
to communicate elements of the vector.
The graph of tasks (nodes) and their interactions/data exchanges (edges) is referred to as a task interaction graph.
Note: Task interaction graphs represent data dependencies, whereas task dependency graphs represent control dependencies.
Q: Can you explain this task interaction
graph?
Multiplying a sparse matrix A with a vector b:
• The computation of each element of the result vector is a task.
• Only the non-zero elements of matrix A participate in the computation.
• If we partition b across tasks, then the task interaction graph of the computation is identical to the graph of the matrix A.
Task interaction graph
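The interaction graph can be derived directly from the nonzero structure of A. A Python sketch (the set-of-index-pairs representation of A and the convention that task j owns b[j] are assumptions for illustration):

```python
# Sketch: deriving the task interaction graph for sparse y = A*b, where
# task i computes y[i] and task j owns b[j]. Task i must fetch b[j] from
# task j whenever A[i][j] is nonzero, so the interaction graph mirrors
# the nonzero structure of A.

def interaction_graph(nonzeros):
    """nonzeros: set of (i, j) index pairs with A[i][j] != 0."""
    edges = set()
    for i, j in nonzeros:
        if i != j:                 # b[i] is owned locally: no interaction
            edges.add((i, j))      # task i needs b[j] from task j
    return edges

# A 3x3 sparse matrix with nonzeros on the diagonal and at (0,2), (2,0).
nz = {(0, 0), (0, 2), (1, 1), (2, 0), (2, 2)}
edges = sorted(interaction_graph(nz))
```

Only the off-diagonal nonzeros produce interactions, so tasks 0 and 2 communicate while task 1 works entirely on local data.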
Task Interaction Graphs, Granularity, and
Communication
If the granularity of a decomposition is finer, the associated
overhead (as a ratio of useful work associated with a task)
increases.
Task Interaction Graphs, Granularity, and
Communication
Example: Each node takes unit time to process and each interaction (edge) causes
an overhead of a unit time.
Viewing node 0 as an independent task involves a useful computation of one time unit and a communication overhead of three time units. How can this problem be solved?
Processes and Mapping
o In general, the number of tasks in a decomposition exceeds the number of
processing elements available.
o For this reason, a parallel algorithm must also provide a mapping of tasks to
processes.
Processes and Mapping
Note:
o We refer to the mapping as being from tasks to processes, as opposed to
processors.
o This is because typical programming APIs, as we shall see, do not allow easy
binding of tasks to physical processors.
o Rather, we aggregate tasks into processes and rely on the system to map these
processes to physical processors.
o We use the term process (not in the UNIX sense of a process) simply to mean a collection of tasks and associated data.
Processes and Mapping
Appropriate mapping of tasks to processes is critical to the parallel performance of
an algorithm.
Mappings are determined by both the task dependency and task interaction graphs.
Task dependency graphs can be used to ensure that work is equally spread across all processes at any point (minimum idling and optimal load balance).
Task interaction graphs can be used to make sure that processes need minimum interaction with other processes (minimum communication).
Processes and Mapping
An appropriate mapping must minimize parallel execution time by:
o Mapping independent tasks to different processes.
o Assigning tasks on critical path to processes as soon as they become available.
o Minimizing interaction between processes by mapping tasks with dense interactions
to the same process.
Note: These criteria often conflict with each other. For example, a decomposition
into one task (or no decomposition at all) minimizes interaction but does not
result in a speedup at all! Can you think of other such conflicting cases?
Processes and Mapping: Example
Mapping tasks in the database query decomposition to processes. These
mappings were arrived at by viewing the dependency graph in terms of levels
(no two nodes in a level have dependencies). Tasks within a single level are then
assigned to different processes.
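The level-by-level mapping described above can be sketched in Python (the helper names and the round-robin assignment within a level are illustrative assumptions, not the slides' exact procedure):

```python
# Sketch: map tasks to processes level by level. No two tasks in the
# same dependency level depend on each other, so tasks within a level
# are dealt round-robin to different processes.

def levelize(preds):
    """Assign each task its dependency level (roots are level 0)."""
    level = {}
    def lv(t):
        if t not in level:
            level[t] = 1 + max((lv(p) for p in preds[t]), default=-1)
        return level[t]
    for t in preds:
        lv(t)
    return level

def map_to_processes(preds, nprocs):
    level = levelize(preds)
    by_level = {}
    for t in sorted(level):
        by_level.setdefault(level[t], []).append(t)
    mapping = {}
    for tasks in by_level.values():
        for k, t in enumerate(tasks):
            mapping[t] = k % nprocs    # spread each level across processes
    return mapping

# t1 and t2 are independent; t3 depends on both.
preds = {"t1": [], "t2": [], "t3": ["t1", "t2"]}
m = map_to_processes(preds, 2)
```

The two independent tasks t1 and t2 land on different processes, while t3, alone in its level, can go anywhere.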
Decomposition
Techniques
Decomposition Techniques
So how does one decompose a task into various subtasks?
While there is no single recipe that works for all problems, we present a set of commonly
used techniques that apply to broad classes of problems.
Decomposition techniques:
o recursive decomposition
o data decomposition
o exploratory decomposition
o speculative decomposition
Recursive Decomposition
o Generally suited to problems that are solved using the divide-and-conquer
strategy.
o A given problem is first decomposed into a set of sub-problems.
o These sub-problems are recursively decomposed further until a desired
granularity is reached.
Recursive Decomposition Example
A classic example of a divide-and-
conquer algorithm on which we can
apply recursive decomposition is
Quicksort.
In this example, once the list has
been partitioned around the pivot,
each sublist can be processed
concurrently (i.e., each sublist
represents an independent subtask).
This can be repeated recursively.
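A minimal Python sketch of this recursive decomposition (written sequentially for clarity; in an actual parallel implementation the two recursive calls would be launched as separate tasks):

```python
# Sketch: recursive decomposition of quicksort. After partitioning
# around a pivot, the two sublists are independent subtasks that could
# be sorted concurrently; here they are simply recursed on in turn.

def quicksort(xs):
    if len(xs) <= 1:
        return xs
    pivot = xs[0]
    left = [x for x in xs[1:] if x < pivot]     # independent subtask 1
    right = [x for x in xs[1:] if x >= pivot]   # independent subtask 2
    return quicksort(left) + [pivot] + quicksort(right)

result = quicksort([3, 1, 4, 1, 5, 9, 2, 6])
```

Each recursion level roughly doubles the number of available subtasks, which is exactly how this decomposition generates concurrency.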
Recursive Decomposition Example
The problem of finding the minimum number in a given list (or indeed any other
associative operation such as sum, AND, etc.) can be fashioned as a divide-and-
conquer algorithm. The following algorithm illustrates this.
We first start with a simple serial loop for computing the minimum entry in a
given list:
procedure SERIAL_MIN (A, n)
begin
  min := A[0];
  for i := 1 to n − 1 do
    if (A[i] < min) min := A[i];
  endfor;
  return min;
end SERIAL_MIN
Recursive Decomposition Example
We can rewrite the loop as follows:

procedure RECURSIVE_MIN (A, n)
begin
  if (n = 1) then
    min := A[0];
  else
    lmin := RECURSIVE_MIN (A, n/2);
    rmin := RECURSIVE_MIN (&(A[n/2]), n − n/2);
    if (lmin < rmin) then
      min := lmin;
    else
      min := rmin;
    endelse;
  endelse;
  return min;
end RECURSIVE_MIN
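A direct Python rendering of RECURSIVE_MIN (half-open index bounds are used instead of pointer arithmetic). The two recursive calls operate on disjoint halves of the list and have no dependency on each other, so they could run as concurrent tasks:

```python
# Sketch: divide-and-conquer minimum. The left-half and right-half
# calls are independent subtasks; only the final comparison depends on
# both results.

def recursive_min(A, lo, hi):
    """Minimum of A[lo:hi] by recursive decomposition."""
    if hi - lo == 1:
        return A[lo]
    mid = (lo + hi) // 2
    lmin = recursive_min(A, lo, mid)   # left half:  independent task
    rmin = recursive_min(A, mid, hi)   # right half: independent task
    return lmin if lmin < rmin else rmin

smallest = recursive_min([4, 9, 1, 7, 8, 11, 2, 12], 0, 8)
```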
Recursive Decomposition Example
The code in the previous foil can be decomposed naturally using a recursive
decomposition strategy. We illustrate this with the following example of finding
the minimum number in the set {4, 9, 1, 7, 8, 11, 2, 12}. The task dependency
graph associated with this computation is as follows:
Data Decomposition
o Identify the data on which computations are performed.
o Partition this data across various tasks.
o This partitioning induces a decomposition of the problem.
o Data can be partitioned in various ways - this critically impacts performance of
a parallel algorithm.
Data Decomposition: Output Data
Decomposition
o Often, each element of the output can be computed independently of others
(but simply as a function of the input).
o A partition of the output across tasks decomposes the problem naturally.
Output Data Decomposition: Example
Consider the problem of multiplying two n x n matrices A and B to yield matrix C.
The output matrix C can be partitioned into four tasks as follows:
Output Data Decomposition: Example
A partitioning of output data does not result in a unique decomposition into tasks. For example, for the same problem as in the previous foil, with identical output data distribution, we can derive the following two (other) decompositions:

Decomposition I                      Decomposition II
Task 1: C1,1 = A1,1 B1,1             Task 1: C1,1 = A1,1 B1,1
Task 2: C1,1 = C1,1 + A1,2 B2,1      Task 2: C1,1 = C1,1 + A1,2 B2,1
Task 3: C1,2 = A1,1 B1,2             Task 3: C1,2 = A1,2 B2,2
Task 4: C1,2 = C1,2 + A1,2 B2,2      Task 4: C1,2 = C1,2 + A1,1 B1,2
Task 5: C2,1 = A2,1 B1,1             Task 5: C2,1 = A2,2 B2,1
Task 6: C2,1 = C2,1 + A2,2 B2,1      Task 6: C2,1 = C2,1 + A2,1 B1,1
Task 7: C2,2 = A2,1 B1,2             Task 7: C2,2 = A2,1 B1,2
Task 8: C2,2 = C2,2 + A2,2 B2,2      Task 8: C2,2 = C2,2 + A2,2 B2,2
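A Python sketch of Decomposition I (scalars stand in for the matrix blocks A_{i,k} and B_{k,j}, an illustrative simplification). Tasks writing the same output block form a dependency chain; tasks writing different blocks are independent of one another:

```python
# Sketch: the eight tasks of Decomposition I for 2x2 block matrix
# multiplication. Even-numbered tasks accumulate into a block already
# written by the preceding task, so Task 2 depends on Task 1, Task 4 on
# Task 3, and so on; the four chains are mutually independent.

def matmul_decomposition_one(A, B):
    C = [[0, 0], [0, 0]]
    C[0][0] = A[0][0] * B[0][0]        # Task 1
    C[0][0] += A[0][1] * B[1][0]       # Task 2 (depends on Task 1)
    C[0][1] = A[0][0] * B[0][1]        # Task 3
    C[0][1] += A[0][1] * B[1][1]       # Task 4 (depends on Task 3)
    C[1][0] = A[1][0] * B[0][0]        # Task 5
    C[1][0] += A[1][1] * B[1][0]       # Task 6 (depends on Task 5)
    C[1][1] = A[1][0] * B[0][1]        # Task 7
    C[1][1] += A[1][1] * B[1][1]       # Task 8 (depends on Task 7)
    return C

C = matmul_decomposition_one([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Both decompositions compute the same C; they differ only in which products each task performs, and hence in which data each task must access.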
Output Data Decomposition: Example
Consider the problem of counting the instances of given itemsets in a database
of transactions. In this case, the output (itemset frequencies) can be partitioned
across tasks.
Output Data Decomposition: Example
From the previous example, the following observations can be made:
If the database of transactions is replicated across the processes, each task can be accomplished independently with no communication.
If the database is partitioned across processes as well (for reasons of memory utilization), each task first computes partial counts. These counts are then aggregated at the appropriate task.
Input Data Partitioning
o Generally applicable if each output can be naturally computed as a function of
the input.
o In many cases, this is the only natural decomposition because the output is not
clearly known a-priori (e.g., the problem of finding the minimum in a list,
sorting a given list, etc.).
o A task is associated with each input data partition. Each task performs as much of the computation as it can using its own part of the data. Subsequent processing combines these partial results.
Input Data Partitioning: Example
In the database counting example, the input (i.e., the transaction set) can be
partitioned. This induces a task decomposition in which each task generates
partial counts for all itemsets. These are combined subsequently for aggregate
counts.
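A Python sketch of this input partitioning (the tuple-based data layout and helper name are illustrative assumptions). Each task counts all itemsets over its own partition of the transactions, and the partial counts are then summed:

```python
# Sketch: input-data partitioning for itemset counting. Each task scans
# only its own partition of the transaction database and produces
# partial counts for every itemset; a final step aggregates them.

from collections import Counter

def partial_counts(transactions, itemsets):
    """One task: count itemset occurrences in one input partition."""
    c = Counter()
    for t in transactions:
        for s in itemsets:
            if set(s) <= set(t):   # itemset s appears in transaction t
                c[s] += 1
    return c

itemsets = [("a", "b"), ("b", "c")]
db = [("a", "b", "c"), ("a", "b"), ("b", "c"), ("a", "c")]
parts = [db[:2], db[2:]]                 # input partitioned across 2 tasks
totals = sum((partial_counts(p, itemsets) for p in parts), Counter())
```

Note that, unlike output partitioning, every task here touches every itemset, which is why the aggregation step is unavoidable.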
Partitioning Input and Output Data
Often input and output data decomposition can be combined for a higher degree of concurrency. For the itemset counting example, the transaction set (input) and itemset counts (output) can both be decomposed as follows:
Intermediate Data Partitioning
o Computation can often be viewed as a sequence of transformations from the input to the output data.
o In these cases, it is often beneficial to use one of the intermediate stages as a
basis for decomposition.
Intermediate Data Partitioning: Example
Let us revisit the example of dense matrix multiplication. We first show how we can visualize this computation in terms of intermediate matrices D.
Intermediate Data Partitioning: Example
A decomposition of the intermediate data structure leads to the following decomposition into 8 + 4 tasks:
Intermediate Data Partitioning: Example
The task dependency graph for the decomposition (shown in previous foil) into
12 tasks is as follows:
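A Python sketch of this two-stage decomposition (scalars again stand in for the 2x2 blocks, an illustrative simplification). Stage 1 computes the intermediate products D_{k,i,j} = A_{i,k} B_{k,j} as eight independent tasks; stage 2 sums them as four independent tasks:

```python
# Sketch: intermediate data partitioning of 2x2 block matrix
# multiplication into 8 + 4 tasks.

def matmul_intermediate(A, B):
    # Stage 1: 8 independent multiplication tasks, one per D[k][i][j].
    D = [[[A[i][k] * B[k][j] for j in range(2)] for i in range(2)]
         for k in range(2)]
    # Stage 2: 4 independent addition tasks, C[i][j] = sum over k.
    C = [[D[0][i][j] + D[1][i][j] for j in range(2)]
         for i in range(2)]
    return C

C = matmul_intermediate([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Compared with the pure output decomposition, the intermediate matrices D remove the dependency chains between multiplication tasks, at the cost of extra memory for D.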
The Owner Computes Rule
o The Owner Computes Rule generally states that the process assigned a
particular data item is responsible for all computation associated with it.
In the case of input data decomposition, the owner computes rule implies that all computations that use an input data item are performed by its owning process.
In the case of output data decomposition, the owner computes rule implies that each output is computed by the process to which that output data is assigned.
NEXT!
LECTURE 5:
PARALLEL ALGORITHM DESIGN (PART 2)