High Performance Computing
(410250)
What comes to your mind when you see these three pictures of computers?
(Images: Personal Laptop, Gaming Laptop, Super Computer)
The first laptop on the left is a personal laptop, which we use for our day-to-day work.
The gaming laptop has a higher configuration and a graphics card for high-definition games.
Supercomputers are used by scientists and big companies for complex mathematical modelling.
So the main differentiating factor is computing power. Computing power refers to how fast and
capable a computer is at performing tasks and calculations.
Personal Laptop
+ Processor: AMD Ryzen 5 7530U (12 threads, 16 MB L3 cache)
+ Memory: dual-channel capable, upgradable up to 40 GB
+ Storage: 512 GB M.2 SSD

Gaming Laptop
+ Processor: 13th Gen Intel Core i9-13980HX, 2.2 GHz
+ Memory: 16 GB (8 GB SO-DIMM x2) DDR5 4800 MHz, up to 32 GB, 2x SO-DIMM slots
+ Storage: 1 TB PCIe 4.0 NVMe M.2 SSD

Super Computer
+ Peak performance: 200 PFLOPS
+ Number of nodes: 4,608
+ Memory per node: 512 GB DDR4 + 96 GB HBM2
+ Storage: 250 PB IBM Spectrum Scale (GPFS), 2.5 TB/s
+ Power consumption: 13 MW
+ Operating system: Red Hat Enterprise Linux (RHEL) version 7.4
I have sample specifications here, and I want to highlight the key computing differences.
As the computation need increases, processor requirements also increase, which is met by
increasing the number of processors and cores and the cycle frequency.
A typical personal computer will have around 6 cores and up to 16 GB of RAM, whereas a gaming
laptop will have a higher number of cores and more RAM.
But if you look at a supercomputer, the number of processors runs into the thousands,
and the aggregate RAM into petabytes.
High Performance Computing
High Performance Computing (HPC) refers to the use of powerful computers
and parallel processing techniques to solve complex problems or perform
tasks at a much faster rate than traditional computers.
We saw in the last slide what makes computers powerful: the number of processors, their
cores, the frequency, and the RAM.
In this first chapter we will look at parallel processing techniques in detail.
Applications of High Performance Computing
+ Financial institutions - transactions and card fraud detection
+ Bio-sciences and the human genome - drug discovery, disease detection/prevention
+ Computer-aided engineering - automotive design and testing, transportation commerce,
structural outlook, mechanical design
+ Chemical engineering - process and molecular design
+ Digital content creation and distribution - computer-aided graphics in film and media
+ Economics/financial - Wall Street risk analysis, portfolio management, automated trading
+ Electronic design and automation - electronic component design
+ Geosciences and geo-engineering - oil and gas exploration and reservoir modelling
+ Mechanical design and drafting - 2D and 3D design and verification, mechanical modelling
+ Defense and energy - nuclear stewardship, basic and applied research
+ Government labs, universities/academia - basic and applied research
+ Meteorological departments - weather forecasting
Let's look at some of the applications of high performance computing.
Parallel Processing
A parallel computer is a set of processors that are able to work cooperatively
to solve a computational problem.
Parallel computing is a form of computation in which many instructions are
carried out simultaneously, operating on the principle that large problems can
often be divided into smaller ones, which are then solved concurrently (in
parallel).
Here's a simplified example to help illustrate the concept of parallel processing.
Imagine you have a really challenging puzzle to solve, and you want to do it as quickly as
possible. If you try to solve it alone, it might take a long time. However, if you have a
group of friends working together simultaneously, each focusing on a different part of the
puzzle, you can finish much faster.
In the context of computing, traditional computers are like individuals trying to solve the
puzzle on their own. High Performance Computing, on the other hand, is like having a
team of super-fast computers working together to tackle a complex problem.
Serial Processing vs. Parallel Processing

To be run on a single computer having a single CPU:
+ A problem is broken into a discrete series of instructions
+ Instructions are executed one after another
+ Only one instruction may execute at any moment in time

To be run using multiple CPUs:
+ A problem is broken into discrete parts that can be solved concurrently
+ Each part is further broken down to a series of instructions
+ Instructions from each part execute simultaneously on different CPUs
In serial programming, a problem is broken into a series of instructions. Just recall your C
programs: each line has some instructions.
These instructions run one after another, and only one instruction gets executed at any
moment. That is serial programming.
The first thing we need for parallel processing is multiple CPUs.
The problem is broken into parts that can be solved in parallel.
Each part is further broken down into a series of instructions.
Instructions from each part execute simultaneously on different CPUs.
Serial Processing vs. Parallel Processing

Task: count from 1 to 1000.

Serial: Using a regular computer, you would start at 1 and incrementally count each number
one by one until you reach 1000. This process might take some time, but it's manageable
for a personal computer.

Parallel: With HPC, you could divide the task among multiple processors. For instance, if
you have 10 processors, each processor could be responsible for counting a range of 100
numbers. So, Processor 1 counts from 1 to 100, Processor 2 from 101 to 200, and so on.
All processors work simultaneously, and the entire task of counting from 1 to 1000 is
completed much faster compared to a personal computer.
Let's understand this concept with a very simple example. Suppose you want to count from 1 to 1000.
Using a regular computer, you would start at 1 and incrementally count each number one by one
until you reach 1000. This process might take some time, but it's manageable for a personal
computer.
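To make this concrete, here is a minimal C sketch of the same idea using OpenMP (an
assumption for illustration; the slides do not name a specific API). The runtime splits the
loop iterations across the available processor cores, and the reduction combines the
per-core partial results:

    /* parallel_count.c - compile with: gcc -fopenmp parallel_count.c */
    #include <stdio.h>

    int main(void) {
        long total = 0;
        /* Each thread sums its own chunk of 1..1000; the reduction
           clause merges the per-thread partial sums at the end. */
        #pragma omp parallel for reduction(+:total)
        for (int i = 1; i <= 1000; i++) {
            total += i;
        }
        printf("sum of 1..1000 = %ld\n", total); /* 500500 */
        return 0;
    }

With 10 cores, each core handles roughly 100 iterations, just like the 10 processors in the
slide's example.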
Motivating Parallelism
Reasons for Growth:
+ Advancements in specifying and coordinating complex concurrent tasks.
+ Portable algorithms facilitating parallel processing.
+ Specialized execution environments and software development toolkits.
Reasons:
+ Increased Computational Power
+ Enhanced Memory/Disk Speed
+ Improved Data Communication
In recent years, there has been a big improvement in how computers handle multiple tasks at
once, that is, parallel processing.
This is because we've gotten better at organizing and managing complex tasks happening at once,
creating portable algorithms (sets of instructions), using special environments for executing tasks,
and developing toolkits for making software. This progress rests on three main reasons:
1. Increased Computational Power: Modern computers, equipped with CMOS chip-based
processors and advanced networking, have become significantly more powerful. This has driven
the development of applications capable of handling multiple tasks simultaneously.
2. Enhanced Memory/Disk Speed: Progress in hardware interfaces has expedited the transition
from microprocessor creation to the development of entire machines that efficiently execute
parallel tasks.
3. Improved Data Communication: Standardization of programming environments has seen
notable advancements. This ensures that applications designed for parallel processing remain
relevant and useful for an extended period.
Modern Processor
Stored-Program Computer Architecture
Stored-program computer architecture is a design where instructions and data are stored in the same
memory, allowing a central processing unit to sequentially fetch, decode, and execute instructions, enabling
versatile programmability.

(Diagram: Input Device -> Central Processing Unit -> Output Device, with the CPU connected to memory)
"Stored-program computer architecture is like having a recipe book for your computer. In this
analogy, the recipe book Is the computer's memory, and the chef Is the central processing unit
(CPU). Let's break it down:
Memory Unit:
Just like a recipe book contains both instructions and a list of ingredients, the computer's
memory stores both program instructions and data.
Chef (CPU):
The CPU acts like a chef following the instructions in the recipe book. It fetches each step,
processes it, and moves on to the next one.
Fetching and Execution:
Imagine the CPU as a chef turning the pages of the recipe book (fetching), reading the
instructions (decoding), and then cooking accordingly (execution).
Example is personal computer
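The fetch-decode-execute cycle can be sketched in a few lines of C. This toy machine is purely
illustrative (the opcodes and memory layout are invented for the example), but it shows
instructions and data living in the same memory array, exactly as the stored-program concept
requires:

    #include <stdio.h>

    /* Toy stored-program machine: instructions and data share one
       memory array; the CPU loop fetches, decodes, and executes. */
    enum { HALT = 0, LOAD = 1, ADD = 2, PRINT = 3 };

    int main(void) {
        /* memory holds (opcode, operand) pairs */
        int memory[] = { LOAD, 5, ADD, 7, PRINT, 0, HALT, 0 };
        int pc = 0, acc = 0;
        for (;;) {
            int op  = memory[pc];     /* fetch  */
            int arg = memory[pc + 1];
            pc += 2;
            switch (op) {             /* decode + execute */
                case LOAD:  acc = arg;            break;
                case ADD:   acc += arg;           break;
                case PRINT: printf("%d\n", acc);  break;
                case HALT:  return 0;
            }
        }
    }

Running it prints 12: the CPU fetched LOAD 5, then ADD 7, then PRINT, one instruction after
another, just like the chef working through the recipe.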
General-Purpose Cache-Based Microprocessor Architecture
General-purpose Cache-based Microprocessor architecture is a design incorporating a cache memory
hierarchy to enhance data access speed and overall performance in executing a wide range of computational
tasks.
(Diagram: memory hierarchy with CPU, cache, primary memory, and secondary memory; word
transfers move data between the CPU and the fast cache, while main memory is slower)
Again, let's understand this with the same analogy of a chef and a kitchen.
Microprocessor (Chef): The microprocessor is like the chef, responsible for executing instructions and
processing data.
Cache (Countertop): Now, think of the cache as the countertop near the chef. This is where the chef
keeps ingredients they use frequently.
Main Memory (Pantry): The main memory is like the pantry, storing a larger quantity of ingredients.
However, it takes a bit more time for the chef to go to the pantry to get less frequently used ingredients.
Fetching Ingredients (Data):
When the chef needs an ingredient (data), here's what happens:
First, the chef checks the countertop (cache) for commonly used ingredients.
If the ingredient is on the countertop (in the cache), great! It's quickly accessed.
If not, the chef goes to the pantry (main memory) to retrieve the ingredient.
Everyday Products:
Smartphones and Laptops: Just like a chef needs quick access to ingredients, your smartphone and
laptop use cache memory to store frequently accessed data and instructions for faster processing.
Web Browsing:
When you load a webpage, the browser uses cache memory to store elements of the page
for quicker retrieval. It's like having the ingredients ready for the chef without going to the
pantry every time.
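A short C sketch can make the cache effect visible. This is an illustrative micro-benchmark
(the array size and timings are assumptions and will vary by machine): both loops do the same
work, but the first walks memory in the order it is laid out, so each cache line fetched from
main memory is fully used, while the second keeps going back to the "pantry":

    #include <stdio.h>
    #include <time.h>

    #define N 2048
    static double a[N][N];

    int main(void) {
        clock_t t0 = clock();
        for (int i = 0; i < N; i++)        /* row-major walk:     */
            for (int j = 0; j < N; j++)    /* consecutive memory, */
                a[i][j] += 1.0;            /* cache-friendly      */
        clock_t t1 = clock();
        for (int j = 0; j < N; j++)        /* column-major walk: */
            for (int i = 0; i < N; i++)    /* strided memory,    */
                a[i][j] += 1.0;            /* cache-unfriendly   */
        clock_t t2 = clock();
        printf("row-major:    %.3fs\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
        printf("column-major: %.3fs\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
        return 0;
    }

On most machines the column-major loop is noticeably slower, even though both loops perform
exactly the same additions.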
Parallel Programming Platforms
Explicit Parallelism
+ Programmer specifically defines and instructs the system on parallel tasks.
+ Programmer actively incorporates parallel constructs or directives into the code.
+ System follows the programmer's explicit instructions for parallel execution.

Implicit Parallelism
+ Automatically identifies and executes tasks concurrently without explicit instructions from
the programmer.
+ Programmer writes regular, step-by-step code without specific parallel constructs.
+ Compiler, runtime system, and hardware work together to find and exploit parallel
opportunities.
Implicit parallelism is a type of parallelism in computing where the system automatically handles
multiple tasks at the same time without you needing to ask for it explicitly.
It means you can write your programs in a regular, step-by-step way, and behind the scenes, the
computer's compiler and hardware work together to find opportunities to speed things up by
doing tasks simultaneously.
So, as an engineer, you focus on your code's logic, and the system takes care of making it run faster
using parallel processing, all without you having to add any special parallel instructions.
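The contrast can be sketched in C (a hedged illustration; the course does not prescribe these
tools). The first loop is explicit parallelism: the programmer adds an OpenMP directive. The
second is the implicit style: plain sequential code that an optimizing compiler (for example,
gcc at -O3, which auto-vectorizes) may still speed up in parallel without any hint in the source:

    /* compile with: gcc -fopenmp -O3 explicit_implicit.c */
    #include <stdio.h>

    #define N 1000000
    static double x[N], y[N];

    int main(void) {
        /* Explicit parallelism: the programmer inserts a directive
           telling the system to split the loop across threads. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            y[i] = 2.0 * x[i];

        /* Implicit parallelism: the same kind of loop written
           plainly; the compiler and hardware may vectorize or
           overlap it with no parallel constructs in the source. */
        for (int i = 0; i < N; i++)
            x[i] = y[i] + 1.0;

        printf("%f\n", x[N - 1]);
        return 0;
    }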
Implicit Parallelism - Pipelining Execution
Pipelining in High-Performance Computing
+ Maximizing Processor Utilization
o Utilize the ALU, buses, registers, etc., continuously.
+ Pipelining Concept
o Instructions flow through the processor like a pipe.
o They move through stages to accomplish operations.
+ Continuous Processor Usage
o Each unit handles an instruction, keeping the processor busy.
Imagine your processor is like a well-designed assembly line. Each part of the processor, like the
ALU, buses, and registers, has a specific job. The goal? Keep all these parts busy all the time.
So, what's pipelining? It's like turning your processor into a pipe. Instructions flow through it,
moving from one stage to the next to get the job done. This way, each part of the processor is
always working on something. No downtime.
In simpler terms, it's like a well-oiled machine where instructions smoothly move through different
stages, making sure your processor is always doing something useful.
Overlapping Execution with Pipelining
Implicit Parallelism - Pipelining Execution
+ Non-Pipelined Approach
o Fetch, decode, read, execute, and write run sequentially.
o Hardware sits idle during waiting periods.
+ Pipelining Technique
o Overlap the execution of several instructions.
o Two-stage pipelining example: Fetch and Execute.
+ Benefits
o Faster execution by fetching the next instruction during the current one's execution.
o All units stay busy, preventing idle time.
"Now, let's explore why pipelining is good and how it improves the efficiency of our processors.
In the past, processors followed a step-by-step approach—fetch, decode, read, execute, and write,
one after another. The drawback? Many components of the hardware would remain inactive,
patiently waiting for others to complete their tasks.
In pipelining approach, It's like managing multiple instructions simultaneously. Picture this:
accomplishing two tasks in just two stages—fetching the next instruction while executing the
current one, It's an intelligent method to overlap tasks and maintain a smooth workflow.
What's the result? Quicker execution! Every part of the processor remains engaged, avoiding any
downtime. Think of it as orchestrating a production line where everyone has a role, and the line
keeps moving without interruptions.
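A few lines of C can print the schedule of the two-stage example from the slide. This is only an
illustrative model (real pipelines have more stages and hazards), but it shows the overlap: five
instructions finish in six cycles instead of the ten a strictly serial fetch-then-execute scheme
would need:

    #include <stdio.h>

    /* Toy two-stage pipeline: in each cycle the processor executes
       the previous instruction while fetching the next one. */
    int main(void) {
        int n = 5;                           /* number of instructions */
        for (int cycle = 0; cycle < n + 1; cycle++) {
            printf("cycle %d:", cycle + 1);
            if (cycle < n)  printf("  fetch I%d",   cycle + 1);
            if (cycle >= 1) printf("  execute I%d", cycle);
            printf("\n");
        }
        return 0;
    }

In general, a k-stage pipeline finishes n instructions in about n + k - 1 cycles instead of
n * k, which is where the speedup comes from.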
Implicit Parallelism - Superscalar Execution
+ From Scalar to Superscalar
o Scalar processors had one pipelined unit for integer and one for floating-point operations.
+ Need for Parallelism
o A single pipeline isn't enough for parallelism.
o Pipelines enable parallelism by having multiple instructions at different stages.
o Superscalar processors execute more than one instruction per clock cycle.
o They fetch and decode multiple instructions simultaneously.
So, back in the day, processors were scalar, meaning they had one pipeline for integer operations
and one for floating-point operations. But designers realized that having just one pipeline wasn't
cutting it for getting things done faster. We needed more parallelism.
That's where superscalar came into the picture. It's like having a processor that can do more than
one thing at a time during a single clock cycle. Imagine fetching and decoding multiple instructions
simultaneously. That's the essence of superscalar: making our processors more efficient by doing
multiple tasks at once.
Implicit Parallelism - Superscalar Execution
+ Instruction Level Parallelism (ILP)
o Superscalar architecture exploits Instruction Level Parallelism (ILP).
o Multiple pipelines serve various instructions (e.g., integer and floating-point).
+ Complexity Considerations
o Superscalar scheduler complexity and hardware cost are crucial in processor design.
+ VLIW Solution
o Very Long Instruction Word (VLIW) processors use compile-time analysis.
o Instructions are bundled for concurrent execution, addressing the complexity.

(Diagram: an integer register file and a floating-point register file feeding pipelined integer
functional units and pipelined floating-point functional units)
Let's start with Instruction Level Parallelism (ILP).
We've got multiple pipelines for different instructions like arithmetic, load, and store. It's about
taking advantage of parallelism to speed things up.
Now, here's the catch: making a superscalar processor is not easy. It's complex, and the hardware
cost is something we really need to think about in processor design.
To tackle this, we have something called VLIW, or Very Long Instruction Word, processors. They use
a clever trick at compile time to identify and bundle together instructions that can be executed at
the same time.
It's like putting a bunch of instructions in a very long instruction word to simplify the process.
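Here is a small C illustration of ILP (a sketch only; actual speedups depend on the machine).
A single running sum forms a dependence chain, because each addition needs the previous
result; splitting it into two independent accumulators gives a superscalar core two additions
it can keep in flight at once:

    #include <stdio.h>

    #define N 1000000
    static double a[N];

    int main(void) {
        for (int i = 0; i < N; i++) a[i] = 1.0;

        /* Two independent accumulators break the serial dependence
           chain of a single sum, exposing instruction-level
           parallelism to a superscalar core. */
        double s0 = 0.0, s1 = 0.0;
        for (int i = 0; i < N; i += 2) {
            s0 += a[i];
            s1 += a[i + 1];
        }
        printf("sum = %.0f\n", s0 + s1); /* 1000000 */
        return 0;
    }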
Implicit Parallelism - VLIW Processor Structure
+ Need for Separate Units
o To perform multiple operations in one execution stage, separate units for each operation
are essential.
+ VLIW Architecture
o Visual representation of separate units for operations (floating-point add, multiply,
branching, integer ALU).
o VLIW (Very Long Instruction Word) executes more than one basic instruction at a time.
o Multiple operations are stored in a single instruction word.
When we want to do multiple things at once in a single execution stage, we need separate units for
each operation. Picture this: for floating-point addition, multiplication, branching, and the integer
ALU, we've got dedicated units. Check out Fig. 1.4.3 for a visual on this.
Now, VLIW stands for Very Long Instruction Word. It's a way for our processors to handle more
than one basic instruction at a time.
How?
By storing multiple operations in a single instruction word. So, when we issue one instruction,
multiple operations kick off simultaneously during the execution cycle of the pipelining process.
Simple, right?
Implicit Parallelism - VLIW Processor Structure
Execution and Compiler Role
+ Simultaneous Operations
o VLIW executes multiple operations simultaneously with one instruction.
+ Compiler's Role
o The compiler identifies parallelism and schedules dependency-free code.
o It resolves dependencies among instructions at compile time.
+ Characteristics
o Multiple independent operations sit in a VLIW instruction, with no flow dependences.
So, VLIW does multiple operations all at once with one instruction, no waiting around. But here's
the trick: the compiler is crucial. It spots where we can run things in parallel and arranges the
code to avoid any dependencies.
So, the compiler is very important, making sure everything plays in harmony. It identifies and
schedules operations that can run side by side, resolving any issues before the program even
runs. One more thing: in a VLIW instruction, all these operations are independent; they don't
rely on each other. It's like having a set of tasks that can be done simultaneously without any
fuss, as the small example below shows.
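The C fragment below is a hypothetical illustration of what a VLIW compiler looks for (C itself
has no VLIW syntax; the bundling happens inside the compiler). The first three statements are
mutually independent, so they could be packed into one long instruction word; the last two form
a dependence chain and cannot be:

    #include <stdio.h>

    int main(void) {
        int b = 1, c = 2, e = 3, f = 4;
        int h[4] = {10, 20, 30, 40};

        /* Independent operations: no result feeds another, so a
           VLIW compiler could bundle all three into one word. */
        int a = b + c;      /* integer add */
        int d = e * f;      /* multiply    */
        int g = h[2];       /* load        */

        /* Dependent chain: each statement consumes the previous
           result, so these must issue in separate words. */
        int x = a + d;
        int w = x * 2;

        printf("%d %d %d %d %d\n", a, d, g, x, w);
        return 0;
    }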
Dichotomy of Parallel Computing Platforms
The division is based on the logical and physical organization of parallel platforms.
Physical organization is the actual hardware organization of a platform.
Logical organization refers to a programmer's view of the platform.

Control Structure
+ The various ways of expressing parallel tasks is known as the control structure.

The Communication Model
+ The mechanisms for specifying interaction between the parallel tasks is called the
communication model.
There are several platforms that facilitate parallel computing.
In this section, the division based on the logical and physical organization of parallel platforms
will be discussed.
Physical organization is the actual hardware organization of a platform. Logical organization
refers to a programmer's view of the platform.
From the programmer's perspective, the two important components of parallel computing are the
control structure and the communication model; a small sketch of the communication side follows.
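As a taste of the communication model, here is a minimal C/OpenMP sketch of the
shared-address-space style (an illustrative assumption; message passing, e.g., MPI, is the
other common model and would use explicit send/receive calls between processes instead):

    /* compile with: gcc -fopenmp shared.c */
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int shared_data = 42;

        /* Shared-address-space model: every thread reads the same
           variable directly, so no explicit messages are needed.
           In a message-passing model each process has private
           memory and would receive this value explicitly. */
        #pragma omp parallel
        {
            printf("thread %d sees %d\n", omp_get_thread_num(), shared_data);
        }
        return 0;
    }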
Physical Organization of Parallel Platforms
Evolution
Let's start with the conventional architecture, representing the traditional uni-processor system.
While some parallel features can improve a single processor's speed, there are limitations.
The foundation of processor architecture traces back to the Von Neumann computer,
characterized by its CPU, memory, and I/O devices.
This system follows the Von Neumann architecture, where the CPU consists of arithmetic and
control units, operating on the stored-program concept. Both program and data share the same
memory unit, each location having a unique address. Execution proceeds sequentially unless the
program explicitly alters this flow.
Fig. 1.8.2 marks the initial steps toward parallelism, introducing lookahead, overlapping fetch and
execute, and parallelism in functions. This latter concept involves two mechanisms: pipelining and
multiple functional units. In the second mechanism, various functional units operate
simultaneously, enhancing processing speed. Vector instructions, akin to massive arrays of data
with a common operation, were initially managed by pipeline processors controlled by software
looping. Subsequently, explicit processors tailored for vector instructions emerged. Two variations
in vector processing are memory-to-memory and register-to-register, with the former utilizing
memory for operand storage and the latter using registers.
The evolution of register-to-register architecture led to the creation of two processor types:
Single Instruction Multiple Data (SIMD) and Multiple Instruction Multiple Data (MIMD).
These developments signify the gradual integration of parallelism in processors, contributing
to enhanced processing capabilities.
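To make the SIMD idea concrete, here is a small C loop of the kind a vectorizing compiler maps
onto SIMD hardware (an illustrative sketch; with gcc, -O3 typically auto-vectorizes it). One
instruction, the add, is applied across many data elements, which is exactly the Single
Instruction Multiple Data pattern:

    #include <stdio.h>

    #define N 8
    int main(void) {
        float x[N] = {1, 2, 3, 4, 5, 6, 7, 8};
        float y[N] = {8, 7, 6, 5, 4, 3, 2, 1};
        float z[N];

        /* Same operation (add) over multiple data elements; a
           SIMD unit processes several of these per instruction. */
        for (int i = 0; i < N; i++)
            z[i] = x[i] + y[i];

        for (int i = 0; i < N; i++) printf("%.0f ", z[i]); /* all 9s */
        printf("\n");
        return 0;
    }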
Physical Organization of Parallel Platforms
Parallel Random Access Machine (PRAM)
Various PRAM models differ in how they handle read or write
conflicts:
+ EREW: Exclusive Read Exclusive Write. p processors can simultaneously read and write the
contents of p distinct memory locations.
+ CREW: Concurrent Read Exclusive Write. p processors can simultaneously read the contents
of p' memory locations, where p' <= p (several processors may read the same location, but
writes remain exclusive).