A comprehensive collection of CUDA programming materials and practical exercises for learning GPU computing, based on the Oxford University CUDA course. This repository contains 12 progressive practicals covering fundamental to advanced GPU programming concepts.
This course provides hands-on experience with CUDA programming, from basic kernel execution to advanced optimization techniques. The materials are designed for students and professionals looking to master GPU computing for high-performance applications.
- Master CUDA programming fundamentals and GPU architecture
- Implement efficient parallel algorithms on GPU
- Understand memory optimization and performance tuning
- Learn advanced concurrency and streaming techniques
- Develop production-ready GPU applications
```
GPU/
├── README.md                 # This file
├── CUDA_Course_Oxford.md     # Original course documentation
├── lectures_structure.txt    # Course structure overview
└── practicals/               # Hands-on programming exercises
    ├── README.md             # Detailed practicals guide
    ├── headers/              # CUDA helper utilities
    ├── prac1/                # Hello World - CUDA Basics
    ├── prac2/                # Device Properties & Memory
    ├── prac3/                # 3D Laplace Equation Solver
    ├── prac4/                # Reduction Algorithms
    ├── prac5/                # Tensor Core Operations
    ├── prac6/                # Advanced Memory Patterns
    ├── prac7/                # Tridiagonal Solver
    ├── prac8/                # Scan Algorithms
    ├── prac9/                # Pattern Matching
    ├── prac10/               # Autotuning System
    ├── prac11/               # Multithreading & Streams
    └── prac12/               # Kernel Overlap & Work Streaming
```
- CUDA Toolkit: Version 11.0 or later
- GPU: Compute capability 7.0+ (Volta/V100, Turing/RTX 20xx series, or newer)
- Compiler: GCC (Linux) or MSVC (Windows)
- Build System: Make
- Optional: OpenMP for multithreading exercises
1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd GPU
   ```

2. Verify the CUDA installation:

   ```bash
   nvcc --version
   nvidia-smi
   ```

3. Test the first practical:

   ```bash
   cd practicals/prac1
   make
   ./prac1a
   ```
Perfect for CUDA newcomers
- Practical 1: Hello World - Learn kernel launching, memory transfers, and error checking (a minimal sketch follows this list)
- Practical 2: Device Properties - Understand GPU architecture and memory management
- Practical 3: 3D Laplace Solver - Implement numerical methods on 3D grids
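For orientation, the program below sketches the launch/transfer/check pattern that Practical 1 introduces. The kernel name `fill_ids`, the array size, and the launch configuration are illustrative, not taken from the practical's source.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative kernel: each thread writes its global index into the output array.
__global__ void fill_ids(float *x)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    x[tid] = (float) tid;
}

int main()
{
    const int n = 256, nblocks = 2, nthreads = 128;
    float *h_x = (float *) malloc(n * sizeof(float));
    float *d_x;

    cudaMalloc((void **) &d_x, n * sizeof(float));        // device allocation
    fill_ids<<<nblocks, nthreads>>>(d_x);                  // kernel launch
    cudaMemcpy(h_x, d_x, n * sizeof(float),
               cudaMemcpyDeviceToHost);                    // copy results back

    // Basic error checking: report any failure from the launch or the copy.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) printf("CUDA error: %s\n", cudaGetErrorString(err));

    printf("x[%d] = %f\n", n - 1, h_x[n - 1]);
    cudaFree(d_x);
    free(h_x);
    return 0;
}
```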
Building algorithmic expertise
- Practical 4: Reduction Algorithms - Master parallel reduction and shared memory (see the sketch after this list)
- Practical 5: Tensor Cores - Leverage mixed-precision for high-performance GEMM
- Practical 6: Memory Optimization - Advanced memory patterns and unified memory
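As an illustration of the shared-memory reduction idea behind Practical 4 (the kernel below is a generic textbook variant, not necessarily the one in the repository):

```cuda
// One block reduces blockDim.x elements to a single partial sum in shared memory.
__global__ void block_reduce(const float *in, float *partial)
{
    extern __shared__ float s[];                  // dynamic shared memory
    int tid = threadIdx.x;
    s[tid] = in[blockIdx.x * blockDim.x + tid];   // one element per thread
    __syncthreads();

    // Halve the number of active threads each step (sequential addressing).
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = s[0];     // one result per block
}
```

Launched as `block_reduce<<<nblocks, nthreads, nthreads * sizeof(float)>>>(d_in, d_partial)` with a power-of-two block size; the per-block partial sums are then combined on the host or by a second kernel.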
Specialized algorithms and applications
- Practical 7: Tridiagonal Solver - Implement linear system solvers
- Practical 8: Scan Algorithms - Work-efficient prefix sum operations (see the sketch after this list)
- Practical 9: Pattern Matching - Parallel string processing algorithms
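As an orientation point for Practical 8, the block-level scan below uses the simple Hillis-Steele scheme; the practical targets the work-efficient (Blelloch-style) formulation, but the shared-memory structure is similar. Names are illustrative.

```cuda
// Inclusive prefix sum of one block's worth of data (Hillis-Steele, O(n log n) work).
__global__ void block_scan(const int *in, int *out)
{
    extern __shared__ int s[];
    int tid = threadIdx.x;
    s[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();

    for (int offset = 1; offset < blockDim.x; offset *= 2) {
        int val = (tid >= offset) ? s[tid - offset] : 0;  // read before anyone overwrites
        __syncthreads();
        s[tid] += val;
        __syncthreads();
    }
    out[blockIdx.x * blockDim.x + tid] = s[tid];
}
```

Launched as `block_scan<<<nblocks, nthreads, nthreads * sizeof(int)>>>(d_in, d_out)`; a second pass propagates block sums for arrays larger than one block.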
Production-ready optimization techniques
- Practical 10: Autotuning System - Automated performance optimization framework
- Practical 11: Multithreading & Streams - Concurrency models and asynchronous execution (see the sketch after this list)
- Practical 12: Kernel Overlap - Advanced streaming and pipeline optimization
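The two-stream pipeline below is a generic illustration of the copy/compute overlap idea behind Practicals 11-12; the `process` kernel, buffer names, and sizes are placeholders rather than the repository's code.

```cuda
#include <cuda_runtime.h>

__global__ void process(float *x)                   // placeholder kernel
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    x[i] *= 2.0f;
}

int main()
{
    const int n = 1 << 20, nstreams = 2, chunk = n / nstreams;
    float *h_x, *d_x;
    cudaMallocHost((void **) &h_x, n * sizeof(float));  // pinned memory: required for true async overlap
    cudaMalloc((void **) &d_x, n * sizeof(float));
    for (int i = 0; i < n; i++) h_x[i] = 1.0f;

    cudaStream_t streams[nstreams];
    for (int i = 0; i < nstreams; i++) cudaStreamCreate(&streams[i]);

    // Each stream copies its chunk in, processes it, and copies it back;
    // the two streams' transfers and kernels can overlap on the hardware queues.
    for (int i = 0; i < nstreams; i++) {
        int off = i * chunk;
        cudaMemcpyAsync(d_x + off, h_x + off, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[i]);
        process<<<chunk / 128, 128, 0, streams[i]>>>(d_x + off);
        cudaMemcpyAsync(h_x + off, d_x + off, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[i]);
    }
    cudaDeviceSynchronize();    // wait for all streams to finish

    for (int i = 0; i < nstreams; i++) cudaStreamDestroy(streams[i]);
    cudaFree(d_x);
    cudaFreeHost(h_x);
    return 0;
}
```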
| Technology | Practicals | Description |
|---|---|---|
| CUDA Runtime API | 1-12 | Core CUDA programming interface |
| Memory Management | 1-3, 6 | Host-device transfers, unified memory |
| Shared Memory | 4, 7-8 | On-chip memory optimization |
| Tensor Cores | 5 | Mixed-precision matrix operations |
| CUDA Streams | 11-12 | Asynchronous execution and overlap |
| OpenMP | 11 | CPU multithreading integration |
| cuBLAS | 5 | Optimized linear algebra library (see the GEMM sketch after this table) |
| Autotuning | 10 | Performance optimization framework |
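To make the cuBLAS/Tensor Core rows concrete, the snippet below runs a half-precision GEMM with FP32 accumulation via `cublasGemmEx`; the matrix size and zero-filled inputs are placeholders, and the exact setup in Practical 5 may differ.

```cuda
#include <cublas_v2.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

int main()
{
    const int N = 1024;                          // illustrative square-matrix size
    half *d_A, *d_B;
    float *d_C;
    cudaMalloc((void **) &d_A, N * N * sizeof(half));
    cudaMalloc((void **) &d_B, N * N * sizeof(half));
    cudaMalloc((void **) &d_C, N * N * sizeof(float));
    cudaMemset(d_A, 0, N * N * sizeof(half));    // zero inputs keep the example simple
    cudaMemset(d_B, 0, N * N * sizeof(half));

    cublasHandle_t handle;
    cublasCreate(&handle);

    // C = alpha*A*B + beta*C with FP16 inputs and FP32 accumulation;
    // on compute capability 7.0+ this is eligible for Tensor Core execution.
    float alpha = 1.0f, beta = 0.0f;
    cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N,
                 &alpha, d_A, CUDA_R_16F, N,
                         d_B, CUDA_R_16F, N,
                 &beta,  d_C, CUDA_R_32F, N,
                 CUBLAS_COMPUTE_32F, CUBLAS_GEMM_DEFAULT);
    cudaDeviceSynchronize();

    cublasDestroy(handle);
    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
    return 0;
}
```

Link against cuBLAS when building, e.g. `nvcc -arch=sm_70 ... -lcublas`.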
Each practical emphasizes performance analysis:
- Timing measurements and profiling techniques (see the event-timing sketch after this list)
- CPU vs GPU performance comparisons
- Memory bandwidth utilization analysis
- Scalability testing across problem sizes
- Optimization strategies and bottleneck identification
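Kernel timings in the practicals are typically taken with CUDA events rather than host timers, since events are recorded on the GPU's own timeline. A minimal pattern, meant to wrap an existing launch (`my_kernel`, `nblocks`, `nthreads`, and `d_data` are placeholders), looks like this:

```cuda
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start);
my_kernel<<<nblocks, nthreads>>>(d_data);   // the work being measured
cudaEventRecord(stop);
cudaEventSynchronize(stop);                 // wait until 'stop' has actually been reached

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);     // elapsed time in milliseconds
printf("kernel time: %.3f ms\n", ms);

cudaEventDestroy(start);
cudaEventDestroy(stop);
```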
- Practicals Guide: Comprehensive overview of all exercises
- Headers Documentation: CUDA utility functions reference
- Course Materials: Original Oxford course documentation
- Individual README files: Detailed guides for each practical
Each practical includes:
- Makefile: Optimized compilation rules
- NVCC flags: Architecture-specific optimizations
- Dependencies: Library linking (cuBLAS, OpenMP)
- Error checking: Comprehensive debugging support (see the macro sketch below)
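The utilities in `headers/` wrap this kind of check; a representative macro (the exact helper names in the repository may differ) is:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every runtime API call so failures report file/line and abort immediately.
#define checkCuda(call)                                                        \
    do {                                                                       \
        cudaError_t err_ = (call);                                             \
        if (err_ != cudaSuccess) {                                             \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                        \
                    cudaGetErrorString(err_), __FILE__, __LINE__);             \
            exit(EXIT_FAILURE);                                                \
        }                                                                      \
    } while (0)

// Usage: checkCuda( cudaMalloc((void **) &d_x, n * sizeof(float)) );
```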
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request with detailed description
This educational material is provided for learning purposes. Original course content is attributed to Oxford University. See individual files for specific licensing terms.
- Oxford University for the original CUDA course materials
- NVIDIA for CUDA toolkit and documentation
- University of Leeds for educational support
After completing this course:
- Explore CUDA libraries (cuDNN, cuFFT, Thrust)
- Learn multi-GPU programming with NCCL
- Study advanced profiling with Nsight tools
- Implement domain-specific applications
- Contribute to open-source GPU projects
Happy GPU Programming! 🚀
Transform your computational challenges with the power of parallel processing.