ADVANCED PYTHON
Imad Kissami1
1 Mohammed VI Polytechnic University, Benguerir, Morocco
October 12, 2023
OUTLINE
• Why HPC?
• What’s a supercomputer?
• Data locality
• How to make Python Faster
OUTLINE
• The flood of Data
• Big data problem
• What’s HPC?
• Typical HPC workloads
• Data Analytics Process
THE FLOOD OF DATA
In 2021
• Internet user ∼ 1.9 GB per day
• Self driving car ∼ 4 TB per day
• Connected airplane ∼ 5 TB per day
• Smart factory ∼ 1 PB per day
• Cloud video providers ∼ 750 PB per day
THE FLOOD OF DATA
A self-driving car
• Radar ∼ 10 − 100 KB per second
• Sonar ∼ 10 − 100 KB per second
• GPS ∼ 50 KB per second
• Lidar ∼ 10 − 70 MB per second
• Cameras ∼ 20 − 40 MB per second
• 1 car requires ∼ 5 exaFLOP of compute per hour
BIG DATA PROBLEM
Too much data; not enough compute power, storage, or infrastructure
WHAT'S HPC?
Leveraging distributed compute resources to solve complex
problems
• Terabytes → Petabytes → Zettabytes of data
• Results in minutes to hours instead of days or weeks
TYPICAL HPC WORKLOADS
* Source:
https://www.xilinx.com/applications/data-center/high-performance-computing.html
DATA ANALYTICS PROCESS
Inspecting, cleaning, transforming, and modeling data → decision-making.
SUMMARY
• Larger datasets require distributed computing
• Several open source HPC frameworks available
OUTLINE
• A brief introduction on hardware
• Modern supercomputers
A BRIEF INTRODUCTION ON HARDWARE
Modern architecture (CPU)
A BRIEF INTRODUCTION ON HARDWARE
Moore's Law
• Number of transistors: from 37.5 million (2000) to 50 billion (2022)
• CPU speed: from 1.3 GHz to 3.4 GHz
A BRIEF INTRODUCTION ON HARDWARE
CPU vs RAM speeds
A BRIEF INTRODUCTION ON HARDWARE
Common Processors
Processor | Launched | Nb. of cores | Freq. (GHz)
Xeon Platinum 9282 (formerly Cascade Lake) | 2019-Q2 | 28 | 2.6-3.8
Xeon Platinum 8376H (formerly Cooper Lake) | 2019-Q2 | 28 | 2.6-4.3
i9-12900H (Mobile, 12th generation) | 2022-Q1 | 4-16 | 3.8-5.0
i9-12900KS (Desktop, formerly Alder Lake) | 2022-Q1 | 8-16 | 2.5-5.5
Table: Some Intel processors
Processor | L3 cache | Nb. of cores | Freq. (GHz)
AMD EPYC 7773X | 768 MB | 64 | 2.2-3.5
AMD EPYC 7763 | 256 MB | 64 | 2.45-3.5
AMD Ryzen 9 5950X (Desktop) | 72 MB | 16-32 | 3.4-4.9
AMD Ryzen 9 3900X (Desktop) | 70 MB | 12-24 | 3.4-4.6
Table: Some AMD processors
MODERN SUPERCOMPUTERS
What is a supercomputer?
• CDC 6600: 1964 - three million calculations per second
• Summit: 2018 - 36,000 processors - 200 quadrillion calculations per second
• Frontier: 2022 - 8 million processors - AMD EPYC with 64 cores at up to 2 GHz - a quintillion calculations per second
• Toubkal: 2021 - 69,000 processors
MODERN SUPERCOMPUTERS
What is a supercomputer?
Frontier (USA)
MODERN SUPERCOMPUTERS
What is a supercomputer?
[Figure: cores on a chip share memory within a node; nodes are connected by a network to form a cluster]
MODERN SUPERCOMPUTERS
Top 500
• Cray 2: gigascale milestone in 1985
• Intel ASCI Red System: terascale in 1997
• IBM Roadrunner System: petascale in 2008
• Frontier: exascale in 2022
MODERN SUPERCOMPUTERS
Top 500 family system share evolution: November 2009, November 2011, November 2015, November 2017, June 2022
1 https://www.top500.org/statistics/list/
SUMMARY
• Highlights
• New architectures are available
• Supercomputers achieve Exascale
• Consequences for developers
• Writing dedicated codes
OUTLINE
• Some definitions
• FLOPS
• Frequency
• Memory Bandwidth
• Memory Latency
• Computational Intensity
• Two level memory model
SOME DEFINITIONS
FLOPS
Floating point operations per second (FLOPS or flop/second).
SOME DEFINITIONS
Frequency
Speed at which a processor or other component operates (Hz)
SOME DEFINITIONS
Memory Bandwidth
Rate at which data can be transferred between the CPU and the memory (bytes/second).
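A rough way to see this number in practice is to time a large array copy and divide the bytes moved by the elapsed time. The sketch below (array size chosen arbitrarily) is an illustration, not a rigorous benchmark:

```python
import time
import numpy as np

# Rough effective-bandwidth estimate: time a large array copy and divide
# the bytes moved by the elapsed time. A copy reads and writes every
# element, hence the factor of 2. The array size (~80 MB) is arbitrary.
a = np.ones(10_000_000, dtype=np.float64)
t0 = time.perf_counter()
b = a.copy()
elapsed = time.perf_counter() - t0
print(f"effective bandwidth ~ {2 * a.nbytes / elapsed / 1e9:.1f} GB/s")
```

The reported figure mixes cache and RAM traffic, so it only approximates the RAM bandwidth defined above.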
SOME DEFINITIONS
Memory Latency
Time delay between a processor requesting data from memory and the moment that
the data is available for use (clock cycles or time units).
COMPUTATIONAL INTENSITY
Algorithms have two costs (measured in time or energy):
• Arithmetic (FLOPs)
• Communication: moving data between
- levels of a memory hierarchy (sequential case)
- processors over a network (parallel case)
Computational Intensity
The ratio of arithmetic cost (number of operations performed) to memory cost (number of words moved).
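For instance, an n-element dot product performs about 2n operations while moving about 2n words, giving an intensity of roughly 1. The helper below is purely illustrative:

```python
# Counting costs for an n-element dot product (illustrative helper):
# 2n flops (n multiplies + n adds) against 2n words moved (each x[i]
# and y[i] is read once), so the intensity is ~1.
def dot_product_intensity(n):
    flops = 2 * n
    words_moved = 2 * n
    return flops / words_moved

print(dot_product_intensity(1_000_000))  # -> 1.0
```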
TWO LEVEL MEMORY MODEL
Modern architecture (CPU)
Typical sizes
• RAM ∼ 4 GB − 128 GB (even higher on servers)
• L3 ∼ 4 MB − 50 MB
• L2 ∼ 256 KB − 8 MB
→ Holds data that is likely to be accessed by the CPU
• L1 ∼ 256 KB
Cache hit or miss → instructions and data
• Cache hit: the CPU finds the data in the L1/L2/L3 cache
• Cache miss: the data is not in L1/L2/L3 and must be retrieved from RAM
MATRIX MULTIPLICATION: THREE NESTED LOOPS
1 for i in range(0, n):
2     # read row i of A into fast memory
3     for j in range(0, n):
4         # read C[i,j] into fast memory
5         # read column j of B into fast memory
6         for k in range(0, n):
7             C[i,j] = C[i,j] + A[i,k]*B[k,j]
8         # write C[i,j] back to slow memory
arithmetic cost: n**3 * (ADD + MUL) = 2n**3 arithmetic operations
memory cost: n**3 reads (B) + n**2 reads (A) + n**2 (read + write) (C) = n**3 + 3n**2 words
computational intensity: 2n**3 / (n**3 + 3n**2) ≈ 2
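As a sanity check, the closed-form counts above can be evaluated for growing n; the intensity approaches 2:

```python
# Evaluate the operation and memory-traffic counts derived above:
# B columns contribute n^3 reads, A rows n^2 reads, C 2n^2 reads+writes.
# The resulting computational intensity tends to 2 as n grows.
def mxm_intensity(n):
    flops = 2 * n**3
    words = n**3 + 3 * n**2
    return flops / words

for n in (10, 100, 1000):
    print(n, mxm_intensity(n))
```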
SUMMARY
• Running time of an algorithm is sum of 3 terms:
- N_flops * time_per_flop
- N_words / bandwidth
- N_messages * latency
→ Communication-avoiding algorithms come with significant speedups
• Some examples
- Up to 12x faster for 2.5D matmul on 64K core IBM BG/P
- Up to 3x faster for tensor contractions on 2K core Cray XE/6
- Up to 6.2x faster for All-Pairs-Shortest-Path on 24K core Cray CE6
OUTLINE
• Data Locality
- The Penalty of Stride
- High Dimensional Arrays
• Block Matrix Multiplication
DATA LOCALITY
• Data locality is key for improving per-core
performance,
• Memory hierarchy has 4 levels,
• Processor looks for needed data in memory
hierarchy,
• Simple or complex manipulations can increase
speedup,
• Blocking version of mxm can increase
computational intensity.
DATA LOCALITY
The Penalty of Stride > 1?
• Data should be arranged for unit stride access,
• Not doing so can result in a severe performance penalty
Example:
1 do i = 1, N*i_stride, i_stride
2     mean = mean + a(i)
3 end do
• Compiled with all optimization and vectorization disabled (-O0)
• Compiled with -O2, which activates some optimizations
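The same effect can be seen from Python with NumPy. The sketch below (array size and stride values are arbitrary illustration choices) sums strided views of one array:

```python
import timeit
import numpy as np

a = np.arange(10_000_000, dtype=np.float64)

# A strided view touches 1/stride of the elements, but each element it
# touches still drags a full cache line from memory, so the time *per
# element* grows with the stride.
for stride in (1, 4, 16, 64):
    n = len(a[::stride])
    t = timeit.timeit(lambda s=stride: a[::s].sum(), number=20)
    print(f"stride={stride:3d}  elements={n:8d}  time/element={t / (20 * n):.2e} s")
```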
DATA LOCALITY
The Penalty of Stride: CPU time
DATA LOCALITY
High Dimensional Arrays
• High-dimensional arrays are stored as a contiguous sequence of elements
→ Fortran uses column-major ordering
→ C uses row-major ordering
mxm in Fortran, N = 1000
• Naive version: CPU time 1660.6 (msec)
• Transpose version: CPU time 1139.8 (msec)
BLOCK MATRIX MULTIPLICATION
mxm example: Using block version (cache optimization)
1 for ii in range(0, n, nb):
2     for jj in range(0, n, nb):
3         for kk in range(0, n, nb):
4             for i in range(ii, min(ii+nb, n)):
5                 for j in range(jj, min(jj+nb, n)):
6                     for k in range(kk, min(kk+nb, n)):
7                         c[i][j] += a[i][k] * b[k][j]
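A quick way to check that a blocked loop nest computes the same product is to compare it against NumPy's matmul. In this sketch the three innermost scalar loops are replaced by a NumPy block product so it also runs in reasonable time (sizes and block size are arbitrary):

```python
import numpy as np

def matmul_blocked(a, b, nb=64):
    # Blocked multiply following the loop nest above; slices past the end
    # of an axis truncate automatically, which handles the edge blocks.
    n = a.shape[0]
    c = np.zeros((n, n))
    for ii in range(0, n, nb):
        for jj in range(0, n, nb):
            for kk in range(0, n, nb):
                c[ii:ii+nb, jj:jj+nb] += a[ii:ii+nb, kk:kk+nb] @ b[kk:kk+nb, jj:jj+nb]
    return c

rng = np.random.default_rng(0)
a = rng.random((200, 200))
b = rng.random((200, 200))
assert np.allclose(matmul_blocked(a, b), a @ b)
```

The blocked and unblocked versions accumulate in different orders, hence the tolerance-based comparison rather than exact equality.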
BLOCK MATRIX MULTIPLICATION
mxm block version: CPU time & Bandwidth
SUMMARY
• Access contiguous, stride-one
memory addresses
• Emphasize cache reuse
• Use data structures that improve
locality
• Minimize communication across
different memory levels
• Use parallelism to improve locality
OUTLINE
• About Python
• Python is slow!
• Profiling a Python code
ABOUT PYTHON
• Python was created by Guido van Rossum in 1991 (latest version 3.11, released 24/10/2022)
• Python is simple
• Python is fully featured
• Python is readable
• Python is extensible
• Python is ubiquitous, portable, and free
• Python has many third party libraries, tools, and a large community
→ But Python is slow!
→ When does it really matter?
PYTHON IS SLOW
When does it matter?
• Is my code fast?
• How many CPU hours?
• Problems on the system?
• How much effort is it to make it run faster?
PROFILING A PYTHON CODE: WHY?
• Code bottlenecks
• "Premature optimization is the root of all evil." (D. Knuth)
• "First make it work. Then make it right. Then make it fast." (K. Beck)
• How?
PROFILING A PYTHON CODE: PROFILERS
• Deterministic vs. statistical profiling
- a deterministic profiler monitors all events (every call and return)
- a statistical profiler samples the call stack at time intervals
• The level at which resources are measured: module, function, or line level
• Profile viewers
PROFILING A PYTHON CODE: TOOLS
• Inbuilt timing modules
• profile and cProfile
• pstats
• line_profiler
• snakeviz
PROFILING A PYTHON CODE: USE CASE
1 def linspace(start, stop, n):
2     step = float(stop - start) / (n - 1)
3     return [start + i*step for i in range(n)]
4
5 def mandel(c, maxiter):
6     z = c
7     for n in range(maxiter):
8         if abs(z) > 2:
9             return n
10         z = z*z + c
11     return n
12
13 def mandel_set(xmin=-2.0, xmax=0.5, ymin=-1.25, ymax=1.25,
14                width=1000, height=1000, maxiter=80):
15     r = linspace(xmin, xmax, width)
16     i = linspace(ymin, ymax, height)
17     n = [[0]*width for _ in range(height)]
18     for x in range(width):
19         for y in range(height):
20             n[y][x] = mandel(complex(r[x], i[y]), maxiter)
21     return n
PROFILING A PYTHON CODE: TIMEIT
The very naive way
1 import timeit
2
3 start_time = timeit.default_timer()
4 mandel_set()
5 end_time = timeit.default_timer()
6 # Time taken in seconds
7 elapsed_time = end_time - start_time
8
9 print('> Elapsed time', elapsed_time)
or using the %timeit magic command
1 [In] %timeit mandel_set()
2 [Out] 3.01 s +/- 84.6 ms per loop (mean +/- std. dev. of 7 runs, 1 loop each)
PROFILING A PYTHON CODE: PRUN
1 [In] %prun -s cumulative mandel_set()
which is, in console mode, equivalent to
1 python -m cProfile -s cumulative mandel.py
1 25214601 function calls in 5.151 seconds
2
3 Ordered by: cumulative time
4
5    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
6         1    0.000    0.000    5.151    5.151  {built-in method builtins.exec}
7         1    0.002    0.002    5.151    5.151  <string>:1(<module>)
8         1    0.291    0.291    5.149    5.149  <ipython-input-4-9421bc2016cb>:13(mandel_set)
9   1000000    3.461    0.000    4.849    0.000  <ipython-input-4-9421bc2016cb>:5(mandel)
10 24214592    1.388    0.000    1.388    0.000  {built-in method builtins.abs}
11        1    0.008    0.008    0.008    0.008  <ipython-input-4-9421bc2016cb>:17(<listcomp>)
12        2    0.000    0.000    0.000    0.000  <ipython-input-4-9421bc2016cb>:1(linspace)
13        2    0.000    0.000    0.000    0.000  <ipython-input-4-9421bc2016cb>:3(<listcomp>)
14        1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}
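Outside IPython, the same statistics can be collected and filtered programmatically with cProfile and pstats. This is a minimal sketch with a stand-in workload (`work` is a placeholder; in practice, profile `mandel_set()` the same way):

```python
import cProfile
import io
import pstats

def work():
    # Stand-in workload for illustration.
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by cumulative time and print only the top 5 entries.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Restricting the output with `print_stats(5)` is often more readable than the full table dumped by `python -m cProfile`.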
PROFILING A PYTHON CODE: LINE LEVEL
• Use the line_profiler package
1 [In] %load_ext line_profiler
2 [In] %lprun -f mandel mandel_set()
1 Timer unit: 1e-06 s
2 Total time: 12.4456 s
3 File: <ipython-input-2-9421bc2016cb>
4 Function: mandel at line 5
5 Line #      Hits       Time  Per Hit  % Time  Line Contents
6 ==============================================================
7      5                                        def mandel(c, maxiter):
8      6   1000000   250304.0      0.3     1.1      z = c
9      7  24463110  6337732.0      0.3    27.7      for n in range(maxiter):
10     8  24214592  8327289.0      0.3    36.5          if abs(z) > 2:
11     9    751482   201108.0      0.3     0.9              return n
12    10  23463110  7658255.0      0.3    33.5          z = z*z + c
13    11    248518    65444.0      0.3     0.3      return n
PROFILING A PYTHON CODE: LINE LEVEL
This can be done in console mode as well
1 @profile
2 def mandel(c, maxiter):
3     z = c
4     for n in range(maxiter):
5         if abs(z) > 2:
6             return n
7         z = z*z + c
8     return n
Then on the command line
1 kernprof -l -v mandel.py
Then
1 python3 -m line_profiler mandel.py.lprof
PROFILING A PYTHON CODE: MEMORY
• Use the memory_profiler package
1 [In] %load_ext memory_profiler
2 [In] %mprun -f mandel mandel_set()
1 Line # Mem usage Increment Occurrences Line Contents
2 =============================================================
3 8 118.2 MiB -39057.7 MiB 1000000 def mandel(c, maxiter):
4 9 118.2 MiB -39175.5 MiB 1000000 z = c
5 10 118.2 MiB -293081.8 MiB 24463110 for n in range(maxiter):
6 11 118.2 MiB -292425.7 MiB 24214592 if abs(z) > 2:
7 12 118.2 MiB -38519.6 MiB 751482 return n
8 13 118.2 MiB -253906.1 MiB 23463110 z = z*z + c
9 14 118.2 MiB -656.4 MiB 248518 return n
PROFILING A PYTHON CODE: MEMORY
• Use the memory_profiler package
1 @profile
2 def mandel(c, maxiter):
3     z = c
4     for n in range(maxiter):
5         if abs(z) > 2:
6             return n
7         z = z*z + c
8     return n
Then on the command line
1 mprof run mandel.py
Then
1 mprof plot
Or
1 python3 -m memory_profiler mandel.py
OUTLINE
• Accelerate a Python code
- Using Numpy
- Using Cython
- Using Numba
- Using Pyccel
• Some Benchmarks
ACCELERATE A PYTHON CODE: NUMPY
• Library for scientific computing in Python,
• High-performance multidimensional array object,
• Integrates C, C++, and Fortran codes in Python,
• Uses multithreading.
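The speedup comes mainly from pushing loops down into compiled code. A small comparison (array size chosen arbitrarily) of the same reduction written as a Python-level loop and as a single vectorized NumPy expression:

```python
import timeit
import numpy as np

n = 1_000_000
x = np.arange(n, dtype=np.float64)

# Same reduction, two ways: the Python-level loop pays interpreter
# overhead per element; the NumPy expression runs in compiled code.
t_loop = timeit.timeit(lambda: sum(v * v for v in x), number=1)
t_vec = timeit.timeit(lambda: float((x * x).sum()), number=1)
print(f"python loop: {t_loop:.3f} s   numpy: {t_vec:.5f} s")
```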
ACCELERATE A PYTHON CODE: NUMPY VS LISTS
1 import numpy, time
2
3 size = 1000000
4
5 print("Concatenation:")
6 list1 = [i for i in range(size)]; list2 = [i for i in range(size)]
7
8 array1 = numpy.arange(size); array2 = numpy.arange(size)
9
10 # List
11 initialTime = time.time()
12 list1 = list1 + list2
13 # calculating execution time
14 print("Time taken by Lists :", (time.time() - initialTime), "seconds")
15
16 # NumPy array
17 initialTime = time.time()
18 array = numpy.concatenate((array1, array2), axis=0)
19 # calculating execution time
20 print("Time taken by NumPy Arrays :", (time.time() - initialTime), "seconds")
1 Concatenation:
2 Time taken by Lists : 0.021048307418823242 seconds
3 Time taken by NumPy Arrays : 0.009451150894165039 seconds
ACCELERATE A PYTHON CODE: CYTHON
• Cython is an optimizing static compiler for:
• the Python programming language
• the Cython programming language (based on Pyrex)
• Cython gives you the combined power of Python and C.
ACCELERATE A PYTHON CODE: CYTHON
• Python
1 def mandelbrot(m, size, iterations):
2     for i in range(size):
3         for j in range(size):
4             c = -2 + 3./size*j + 1j*(1.5 - 3./size*i)
5             z = 0
6             for n in range(iterations):
7                 if np.abs(z) <= 10:
8                     z = z*z + c; m[i, j] = n
9                 else:
10                     break
ACCELERATE A PYTHON CODE: CYTHON
• Cython
1 def mandelbrot_cython(int[:,::1] m, int size, int iterations):
2     cdef int i, j, n
3     cdef complex z, c
4     for i in range(size):
5         for j in range(size):
6             c = -2 + 3./size*j + 1j*(1.5 - 3./size*i)
7             z = 0
8             for n in range(iterations):
9                 if z.real**2 + z.imag**2 <= 100:
10                     z = z*z + c; m[i, j] = n
11                 else:
12                     break
ACCELERATE A PYTHON CODE: CYTHON
• Execution time
1 %%timeit -n1 -r1
2 m = np.zeros(s, dtype=np.int32)
3 mandelbrot(m, size, iterations)
4 >> 12.2 s +/- 0 ns per loop (mean +/- std. dev. of 1 run, 1 loop each)
5
6
7 %%timeit -n1 -r1
8 m = np.zeros(s, dtype=np.int32)
9 mandelbrot_cython(m, size, iterations)
10 >> 29.8 ms +/- 0 ns per loop (mean +/- std. dev. of 1 run, 1 loop each)
ACCELERATE A PYTHON CODE: NUMBA
• Open-source just-in-time (JIT) compiler for Python functions.
• Uses the LLVM library as the compiler backend.
ACCELERATE A PYTHON CODE: NUMBA
• Python
1 import numpy as np
2
3 def do_sum():
4     acc = 0.
5     for i in range(10000000):
6         acc += np.sqrt(i)
7     return acc
• Numba
1 import numpy as np
2 from numba import njit
3
4 @njit
5 def do_sum_numba():
6     acc = 0.
7     for i in range(10000000):
8         acc += np.sqrt(i)
9     return acc
1 Time for Pure Python Function: 7.724030017852783
2 Time for Numba Function: 0.015453100204467773
ACCELERATE A PYTHON CODE: PYCCEL
• Pyccel is a static compiler for Python 3, using Fortran or C as a backend language.
• Python function:
1 import numpy as np
2
3 def do_sum_pyccel():
4     acc = 0.
5     for i in range(10000000):
6         acc += np.sqrt(i)
7     return acc
ACCELERATE A PYTHON CODE: PYCCEL (F90)
• Compilation using Fortran:
1 pyccel --language=fortran pyccel_example.py
1 module pyccel_example
2 use, intrinsic :: ISO_C_Binding, only : i64 => C_INT64_T, f64 => C_DOUBLE
3 implicit none
4 contains
5 ! ........................................
6 function do_sum_pyccel() result(acc)
7
8     implicit none
9     real(f64) :: acc
10     integer(i64) :: i
11     acc = 0.0_f64
12     do i = 0_i64, 9999999_i64, 1_i64
13         acc = acc + sqrt(Real(i, f64))
14     end do
15     return
16 end function do_sum_pyccel
17 ! ........................................
18 end module pyccel_example
1 Time for Pure Python Function: 7.400242328643799
2 Time for Pyccel Function: 0.01545262336730957
ACCELERATE A PYTHON CODE: PYCCEL (C)
• Compilation using C:
1 pyccel --language=c pyccel_example.py
1 #include "pyccel_example.h"
2 #include <stdlib.h>
3 #include <math.h>
4 #include <stdint.h>
5 /* ........................................ */
6 double do_sum_pyccel(void)
7 {
8     int64_t i;
9     double acc;
10     acc = 0.0;
11     for (i = 0; i < 10000000; i += 1)
12     {
13         acc += sqrt((double)(i));
14     }
15     return acc;
16 }
17 /* ........................................ */
SOME BENCHMARKS
Rosen-Der
Tool        | Python | Cython   | Numba   | Pythran  | Pyccel-gcc | Pyccel-intel
Timing (µs) | 229.85 | 2.06     | 4.73    | 2.07     | 0.98       | 0.64
Speedup     | -      | × 111.43 | × 48.57 | × 110.98 | × 232.94   | × 353.94
Black-Scholes
Tool        | Python | Cython | Numba   | Pythran | Pyccel-gcc | Pyccel-intel
Timing (µs) | 180.44 | 309.67 | 3.0     | 1.1     | 1.04       | 6.56 × 10^-2
Speedup     | -      | × 0.58 | × 60.06 | × 163.8 | × 172.35   | × 2748.71
Laplace
Tool        | Python | Cython | Numba        | Pythran      | Pyccel-gcc   | Pyccel-intel
Timing (µs) | 57.71  | 7.98   | 6.46 × 10^-2 | 6.28 × 10^-2 | 8.02 × 10^-2 | 2.81 × 10^-2
Speedup     | -      | × 7.22 | × 892.02     | × 918.56     | × 719.32     | × 2048.65