syaffa/CUDA-Training: Lectures and practicals

GPU Computing with CUDA

A comprehensive collection of CUDA programming materials and practical exercises for learning GPU computing, based on the Oxford University CUDA course. This repository contains 12 progressive practicals covering fundamental to advanced GPU programming concepts.

🎯 Course Overview

This course provides hands-on experience with CUDA programming, from basic kernel execution to advanced optimization techniques. The materials are designed for students and professionals looking to master GPU computing for high-performance applications.

Learning Objectives

  • Master CUDA programming fundamentals and GPU architecture
  • Implement efficient parallel algorithms on GPU
  • Understand memory optimization and performance tuning
  • Learn advanced concurrency and streaming techniques
  • Develop production-ready GPU applications
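To give a flavor of where the course starts, a minimal CUDA program of the kind Practical 1 builds looks roughly like this (an illustrative sketch, not code taken from the practicals):

```cuda
#include <cstdio>

// Each thread prints its global index; <<<2, 4>>> launches 2 blocks of 4 threads.
__global__ void hello() {
    printf("Hello from thread %d\n", blockIdx.x * blockDim.x + threadIdx.x);
}

int main() {
    hello<<<2, 4>>>();
    cudaDeviceSynchronize();  // wait for the kernel (and its printf output) to finish
    return 0;
}
```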

📚 Repository Structure

GPU/
├── README.md                    # This file
├── CUDA_Course_Oxford.md        # Original course documentation
├── lectures_structure.txt       # Course structure overview
└── practicals/                  # Hands-on programming exercises
    ├── README.md               # Detailed practicals guide
    ├── headers/                # CUDA helper utilities
    ├── prac1/                  # Hello World - CUDA Basics
    ├── prac2/                  # Device Properties & Memory
    ├── prac3/                  # 3D Laplace Equation Solver
    ├── prac4/                  # Reduction Algorithms
    ├── prac5/                  # Tensor Core Operations
    ├── prac6/                  # Advanced Memory Patterns
    ├── prac7/                  # Tridiagonal Solver
    ├── prac8/                  # Scan Algorithms
    ├── prac9/                  # Pattern Matching
    ├── prac10/                 # Autotuning System
    ├── prac11/                 # Multithreading & Streams
    └── prac12/                 # Kernel Overlap & Work Streaming

🚀 Quick Start

Prerequisites

  • CUDA Toolkit: Version 11.0 or later
  • GPU: Compute capability 7.0 or higher (Volta V100/Titan V, RTX 20xx series, or newer)
  • Compiler: GCC (Linux) or MSVC (Windows)
  • Build System: Make
  • Optional: OpenMP for multithreading exercises

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd GPU
  2. Verify CUDA installation:

    nvcc --version
    nvidia-smi
  3. Test with first practical:

    cd practicals/prac1
    make
    ./prac1a

📖 Learning Path

🟢 Beginner Level (Practicals 1-3)

Perfect for CUDA newcomers

  • Practical 1: Hello World - Learn kernel launching, memory transfers, and error checking
  • Practical 2: Device Properties - Understand GPU architecture and memory management
  • Practical 3: 3D Laplace Solver - Implement numerical methods on 3D grids
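Error checking, introduced in Practical 1, is worth adopting from the very first program. A common pattern is to wrap every runtime call in a checking macro (the macro name here is our own, not the course's helper):

```cuda
#include <cstdio>
#include <cstdlib>

// Abort with file/line context whenever a CUDA runtime call fails.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err = (call);                                   \
        if (err != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error %s at %s:%d\n",             \
                    cudaGetErrorString(err), __FILE__, __LINE__);   \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

int main() {
    float *d_x;
    CUDA_CHECK(cudaMalloc(&d_x, 1024 * sizeof(float)));
    CUDA_CHECK(cudaFree(d_x));
    return 0;
}
```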

🟡 Intermediate Level (Practicals 4-6)

Building algorithmic expertise

  • Practical 4: Reduction Algorithms - Master parallel reduction and shared memory
  • Practical 5: Tensor Cores - Leverage mixed-precision for high-performance GEMM
  • Practical 6: Memory Optimization - Advanced memory patterns and unified memory
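The core pattern behind Practical 4 is a tree reduction in shared memory, where each step halves the number of active threads. A sketch of the idea (kernel name and launch details are illustrative):

```cuda
// One block reduces its tile of the input to a single partial sum.
__global__ void block_sum(const float *in, float *out, int n) {
    extern __shared__ float s[];                 // sized at launch time
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    s[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = s[0];        // one partial sum per block
}
```

Launched as `block_sum<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_out, n)`, with `threads` a power of two; the per-block partial sums are then reduced again (or summed on the host).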

🟠 Advanced Level (Practicals 7-9)

Specialized algorithms and applications

  • Practical 7: Tridiagonal Solver - Implement linear system solvers
  • Practical 8: Scan Algorithms - Work-efficient prefix sum operations
  • Practical 9: Pattern Matching - Parallel string processing algorithms
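As background for Practical 8, the simplest parallel prefix sum is the naive (Hillis-Steele) inclusive scan within one block, which the work-efficient version improves on. A sketch, assuming one block and `blockDim.x >= n`:

```cuda
// Naive inclusive scan: after the loop, s[tid] holds the sum of data[0..tid].
__global__ void scan_block(float *data, int n) {
    extern __shared__ float s[];
    int tid = threadIdx.x;
    s[tid] = (tid < n) ? data[tid] : 0.0f;
    __syncthreads();

    // Each pass adds in the element 'offset' positions to the left.
    for (int offset = 1; offset < blockDim.x; offset <<= 1) {
        float v = (tid >= offset) ? s[tid - offset] : 0.0f;
        __syncthreads();   // all reads finish before any write
        s[tid] += v;
        __syncthreads();
    }
    if (tid < n) data[tid] = s[tid];
}
```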

🔴 Expert Level (Practicals 10-12)

Production-ready optimization techniques

  • Practical 10: Autotuning System - Automated performance optimization framework
  • Practical 11: Multithreading & Streams - Concurrency models and asynchronous execution
  • Practical 12: Kernel Overlap - Advanced streaming and pipeline optimization
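The overlap pattern these practicals build toward alternates chunks of work between streams, so one chunk's copy can proceed while the previous chunk's kernel runs. A self-contained sketch (chunk sizes and the `process` kernel are placeholders of our own):

```cuda
#include <cstdio>

// Trivial kernel standing in for real per-chunk work.
__global__ void process(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main() {
    const int nChunks = 4, chunk = 1 << 20;
    float *h, *d;
    cudaMallocHost(&h, nChunks * chunk * sizeof(float));  // pinned memory: required for real overlap
    cudaMalloc(&d, nChunks * chunk * sizeof(float));

    cudaStream_t s[2];
    for (int i = 0; i < 2; ++i) cudaStreamCreate(&s[i]);

    // Alternate chunks between two streams: within a stream the copy and
    // kernel are ordered, but chunks in different streams can overlap.
    for (int c = 0; c < nChunks; ++c) {
        size_t off = (size_t)c * chunk;
        cudaMemcpyAsync(d + off, h + off, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, s[c % 2]);
        process<<<(chunk + 255) / 256, 256, 0, s[c % 2]>>>(d + off, chunk);
    }
    cudaDeviceSynchronize();

    for (int i = 0; i < 2; ++i) cudaStreamDestroy(s[i]);
    cudaFree(d); cudaFreeHost(h);
    return 0;
}
```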

🛠️ Key Technologies Covered

Technology         Practicals   Description
CUDA Runtime API   1-12         Core CUDA programming interface
Memory Management  1-3, 6       Host-device transfers, unified memory
Shared Memory      4, 7-8       On-chip memory optimization
Tensor Cores       5            Mixed-precision matrix operations
CUDA Streams       11-12        Asynchronous execution and overlap
OpenMP             11           CPU multithreading integration
cuBLAS             5            Optimized linear algebra library
Autotuning         10           Performance optimization framework

📊 Performance Focus

Each practical emphasizes performance analysis:

  • Timing measurements and profiling techniques
  • CPU vs GPU performance comparisons
  • Memory bandwidth utilization analysis
  • Scalability testing across problem sizes
  • Optimization strategies and bottleneck identification
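GPU-side timing throughout the practicals rests on CUDA events. A minimal example timing a kernel and reporting effective memory bandwidth (the `saxpy` kernel and sizes are placeholders of our own):

```cuda
#include <cstdio>

__global__ void saxpy(float a, const float *x, float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] += a * x[i];
}

int main() {
    const int n = 1 << 24;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    saxpy<<<(n + 255) / 256, 256>>>(2.0f, x, y, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);   // block until the stop event has occurred

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // saxpy reads x and y, writes y: 3 * n * sizeof(float) bytes moved.
    printf("saxpy: %.3f ms, %.1f GB/s\n", ms,
           3.0 * n * sizeof(float) / ms / 1e6);

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```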

📁 Detailed Documentation

🔧 Build System

Each practical includes:

  • Makefile: Optimized compilation rules
  • NVCC flags: Architecture-specific optimizations
  • Dependencies: Library linking (cuBLAS, OpenMP)
  • Error checking: Comprehensive debugging support
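A representative Makefile in the spirit of these build rules might look like the following (an illustrative sketch, not the repository's actual Makefile; adjust `-arch` to your GPU, e.g. `sm_70` for Volta, `sm_75` for Turing, `sm_86` for Ampere):

```makefile
NVCC      := nvcc
NVCCFLAGS := -O3 -arch=sm_70 -lineinfo
LIBS      := -lcublas

prac5: prac5.cu
	$(NVCC) $(NVCCFLAGS) $< -o $@ $(LIBS)

clean:
	rm -f prac5
```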

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Submit a pull request with detailed description

📄 License

This educational material is provided for learning purposes. Original course content is attributed to Oxford University. See individual files for specific licensing terms.

🙏 Acknowledgments

  • Oxford University for the original CUDA course materials
  • NVIDIA for CUDA toolkit and documentation
  • University of Leeds for educational support

🚀 Next Steps

After completing this course:

  • Explore CUDA libraries (cuDNN, cuFFT, Thrust)
  • Learn multi-GPU programming with NCCL
  • Study advanced profiling with Nsight tools
  • Implement domain-specific applications
  • Contribute to open-source GPU projects

Happy GPU Programming! 🚀

Transform your computational challenges with the power of parallel processing.
