Master CI GPU Programming
Laboratory 1
GPU Programming: Introduction to CUDA
Hello CUDA World!
1. Run the DeviceQuery application using the NVIDIA GPU Computing SDK Browser and
identify the properties of the CUDA devices installed on the lab workstations:
CUDA Device: ______
Number of multiprocessors (MPs): ______
Number of cores per MP: ______
Total number of cores: ______
Global memory (MB): ______
Warp size: ______
Maximum number of threads per block: ______
Minimum number of threads processed in SIMD fashion by a CUDA multiprocessor: ______
Maximum grid dimensions: ______
Maximum block dimensions: ______
2. Create a CUDA project in Visual Studio.
a. Study the structure of the demo program and identify: the portion of code that
runs on the GPU, and the number of GPU threads that execute the parallel code.
b. Modify the demo application so that the number of elements in the vectors
being added can vary, and each element of the result vector is computed by a
separate GPU thread. Try different element counts: 1000,
100000, 1000000, 10000000, … (make sure you provide a feasible execution
configuration!)
Follow the CUDA tutorials available at:
https://developer.nvidia.com/how-to-cuda-c-cpp
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
Performance Analysis of a CUDA Application
1. Measuring execution time
Option 1: Using a CPU timer
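cudaMemcpy(…);
t1 = myCPUTimer();
myKernel<<<……>>>(…);       // asynchronous launch
cudaDeviceSynchronize();   // wait until the kernel has finished
t2 = myCPUTimer();         // t2 - t1 is the kernel execution time
cudaMemcpy(…);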
Note: A CUDA kernel launch is asynchronous! Control returns to the CPU immediately after
the launch (very possibly before the kernel has finished executing on the GPU). CPU-GPU
synchronization is therefore mandatory before the second timestamp is read!
Option 2: Using the Event API
CUDA Event API Management Functions:
cudaEventCreate
cudaEventCreateWithFlags
cudaEventDestroy
cudaEventElapsedTime
cudaEventQuery
cudaEventRecord
cudaEventSynchronize
cudaEvent_t start, stop;
// Create the events
cudaEventCreate(&start);
cudaEventCreate(&stop);
// Record event 'start' in the default stream (stream 0)
cudaEventRecord(start, 0);
/* CUDA Host / Device / Kernel Code ... */
// Record event 'stop' in the default stream
cudaEventRecord(stop, 0);
// Block the CPU until the last event ('stop' in this case) has been recorded
cudaEventSynchronize(stop);
float elapsedTime = 0.0f;
// Compute the time between 'start' and 'stop' and write it to elapsedTime.
// cudaEventElapsedTime returns the value in milliseconds,
// with a resolution of about 0.5 microseconds.
cudaEventElapsedTime(&elapsedTime, start, stop);
printf("Execution Time: %f ms\n", elapsedTime); // Print the elapsed time
// Destroy the CUDA events
cudaEventDestroy(start);
cudaEventDestroy(stop);
2. CUDA Visual Profiler
https://developer.nvidia.com/nvidia-visual-profiler
CUDA occupancy calculator:
http://developer.download.nvidia.com/compute/cuda/CUDA_Occupancy_calculator.xls