A comprehensive collection of CUDA programming materials and practical exercises for learning GPU computing, based on the Oxford University CUDA course. This repository contains 12 progressive practicals covering fundamental to advanced GPU programming concepts.
This course provides hands-on experience with CUDA programming, from basic kernel execution to advanced optimization techniques. The materials are designed for students and professionals looking to master GPU computing for high-performance applications.
- Master CUDA programming fundamentals and GPU architecture
- Implement efficient parallel algorithms on GPU
- Understand memory optimization and performance tuning
- Learn advanced concurrency and streaming techniques
- Develop production-ready GPU applications
```
GPU/
├── README.md                 # This file
├── CUDA_Course_Oxford.md     # Original course documentation
├── lectures_structure.txt    # Course structure overview
└── practicals/               # Hands-on programming exercises
    ├── README.md             # Detailed practicals guide
    ├── headers/              # CUDA helper utilities
    ├── prac1/                # Hello World - CUDA Basics
    ├── prac2/                # Device Properties & Memory
    ├── prac3/                # 3D Laplace Equation Solver
    ├── prac4/                # Reduction Algorithms
    ├── prac5/                # Tensor Core Operations
    ├── prac6/                # Advanced Memory Patterns
    ├── prac7/                # Tridiagonal Solver
    ├── prac8/                # Scan Algorithms
    ├── prac9/                # Pattern Matching
    ├── prac10/               # Autotuning System
    ├── prac11/               # Multithreading & Streams
    └── prac12/               # Kernel Overlap & Work Streaming
```
- CUDA Toolkit: Version 11.0 or later
- GPU: Compute capability 7.0+ (Volta/V100, Turing/RTX 20xx series, or newer)
- Compiler: GCC (Linux) or MSVC (Windows)
- Build System: Make
- Optional: OpenMP for multithreading exercises
1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd GPU
   ```

2. Verify the CUDA installation:

   ```bash
   nvcc --version
   nvidia-smi
   ```

3. Test the first practical:

   ```bash
   cd practicals/prac1
   make
   ./prac1a
   ```
Perfect for CUDA newcomers
- Practical 1: Hello World - Learn kernel launching, memory transfers, and error checking (a minimal sketch follows this list)
- Practical 2: Device Properties - Understand GPU architecture and memory management
- Practical 3: 3D Laplace Solver - Implement numerical methods on 3D grids
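For orientation, the program below sketches the launch/transfer/check pattern that Practical 1 introduces. The kernel name `fill_ids`, the array size, and the launch configuration are illustrative, not taken from the practical's source.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative kernel: each thread writes its global index into the output array.
__global__ void fill_ids(float *x)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    x[tid] = (float) tid;
}

int main()
{
    const int n = 256, nblocks = 2, nthreads = 128;
    float *h_x = (float *) malloc(n * sizeof(float));
    float *d_x;

    cudaMalloc((void **) &d_x, n * sizeof(float));        // device allocation
    fill_ids<<<nblocks, nthreads>>>(d_x);                  // kernel launch
    cudaMemcpy(h_x, d_x, n * sizeof(float),
               cudaMemcpyDeviceToHost);                    // copy results back

    // Basic error checking: report any failure from the launch or the copy.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) printf("CUDA error: %s\n", cudaGetErrorString(err));

    printf("x[%d] = %f\n", n - 1, h_x[n - 1]);
    cudaFree(d_x);
    free(h_x);
    return 0;
}
```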
Building algorithmic expertise
- Practical 4: Reduction Algorithms - Master parallel reduction and shared memory (see the sketch after this list)
- Practical 5: Tensor Cores - Leverage mixed-precision for high-performance GEMM
- Practical 6: Memory Optimization - Advanced memory patterns and unified memory
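As an illustration of the shared-memory reduction idea behind Practical 4 (the kernel below is a generic textbook variant, not necessarily the one in the repository):

```cuda
// One block reduces blockDim.x elements to a single partial sum in shared memory.
__global__ void block_reduce(const float *in, float *partial)
{
    extern __shared__ float s[];                  // dynamic shared memory
    int tid = threadIdx.x;
    s[tid] = in[blockIdx.x * blockDim.x + tid];   // one element per thread
    __syncthreads();

    // Halve the number of active threads each step (sequential addressing).
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = s[0];     // one result per block
}
```

Launched as `block_reduce<<<nblocks, nthreads, nthreads * sizeof(float)>>>(d_in, d_partial)` with a power-of-two block size; the per-block partial sums are then combined on the host or by a second kernel.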
Specialized algorithms and applications
- Practical 7: Tridiagonal Solver - Implement linear system solvers
- Practical 8: Scan Algorithms - Work-efficient prefix sum operations (see the sketch after this list)
- Practical 9: Pattern Matching - Parallel string processing algorithms
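As an orientation point for Practical 8, the block-level scan below uses the simple Hillis-Steele scheme; the practical targets the work-efficient (Blelloch-style) formulation, but the shared-memory structure is similar. Names are illustrative.

```cuda
// Inclusive prefix sum of one block's worth of data (Hillis-Steele, O(n log n) work).
__global__ void block_scan(const int *in, int *out)
{
    extern __shared__ int s[];
    int tid = threadIdx.x;
    s[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();

    for (int offset = 1; offset < blockDim.x; offset *= 2) {
        int val = (tid >= offset) ? s[tid - offset] : 0;  // read before anyone overwrites
        __syncthreads();
        s[tid] += val;
        __syncthreads();
    }
    out[blockIdx.x * blockDim.x + tid] = s[tid];
}
```

Launched as `block_scan<<<nblocks, nthreads, nthreads * sizeof(int)>>>(d_in, d_out)`; a second pass propagates block sums for arrays larger than one block.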
Production-ready optimization techniques
- Practical 10: Autotuning System - Automated performance optimization framework
- Practical 11: Multithreading & Streams - Concurrency models and asynchronous execution (see the sketch after this list)
- Practical 12: Kernel Overlap - Advanced streaming and pipeline optimization
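The two-stream pipeline below is a generic illustration of the copy/compute overlap idea behind Practicals 11-12; the `process` kernel, buffer names, and sizes are placeholders rather than the repository's code.

```cuda
#include <cuda_runtime.h>

__global__ void process(float *x)                   // placeholder kernel
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    x[i] *= 2.0f;
}

int main()
{
    const int n = 1 << 20, nstreams = 2, chunk = n / nstreams;
    float *h_x, *d_x;
    cudaMallocHost((void **) &h_x, n * sizeof(float));  // pinned memory: required for true async overlap
    cudaMalloc((void **) &d_x, n * sizeof(float));
    for (int i = 0; i < n; i++) h_x[i] = 1.0f;

    cudaStream_t streams[nstreams];
    for (int i = 0; i < nstreams; i++) cudaStreamCreate(&streams[i]);

    // Each stream copies its chunk in, processes it, and copies it back;
    // the two streams' transfers and kernels can overlap on the hardware queues.
    for (int i = 0; i < nstreams; i++) {
        int off = i * chunk;
        cudaMemcpyAsync(d_x + off, h_x + off, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[i]);
        process<<<chunk / 128, 128, 0, streams[i]>>>(d_x + off);
        cudaMemcpyAsync(h_x + off, d_x + off, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[i]);
    }
    cudaDeviceSynchronize();    // wait for all streams to finish

    for (int i = 0; i < nstreams; i++) cudaStreamDestroy(streams[i]);
    cudaFree(d_x);
    cudaFreeHost(h_x);
    return 0;
}
```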
| Technology | Practicals | Description |
|---|---|---|
| CUDA Runtime API | 1-12 | Core CUDA programming interface |
| Memory Management | 1-3, 6 | Host-device transfers, unified memory |
| Shared Memory | 4, 7-8 | On-chip memory optimization |
| Tensor Cores | 5 | Mixed-precision matrix operations |
| CUDA Streams | 11-12 | Asynchronous execution and overlap |
| OpenMP | 11 | CPU multithreading integration |
| cuBLAS | 5 | Optimized linear algebra library (see the GEMM sketch after this table) |
| Autotuning | 10 | Performance optimization framework |
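To make the cuBLAS/Tensor Core rows concrete, the snippet below runs a half-precision GEMM with FP32 accumulation via `cublasGemmEx`; the matrix size and zero-filled inputs are placeholders, and the exact setup in Practical 5 may differ.

```cuda
#include <cublas_v2.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

int main()
{
    const int N = 1024;                          // illustrative square-matrix size
    half *d_A, *d_B;
    float *d_C;
    cudaMalloc((void **) &d_A, N * N * sizeof(half));
    cudaMalloc((void **) &d_B, N * N * sizeof(half));
    cudaMalloc((void **) &d_C, N * N * sizeof(float));
    cudaMemset(d_A, 0, N * N * sizeof(half));    // zero inputs keep the example simple
    cudaMemset(d_B, 0, N * N * sizeof(half));

    cublasHandle_t handle;
    cublasCreate(&handle);

    // C = alpha*A*B + beta*C with FP16 inputs and FP32 accumulation;
    // on compute capability 7.0+ this is eligible for Tensor Core execution.
    float alpha = 1.0f, beta = 0.0f;
    cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N,
                 &alpha, d_A, CUDA_R_16F, N,
                         d_B, CUDA_R_16F, N,
                 &beta,  d_C, CUDA_R_32F, N,
                 CUBLAS_COMPUTE_32F, CUBLAS_GEMM_DEFAULT);
    cudaDeviceSynchronize();

    cublasDestroy(handle);
    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
    return 0;
}
```

Link against cuBLAS when building, e.g. `nvcc -arch=sm_70 ... -lcublas`.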
Each practical emphasizes performance analysis:
- Timing measurements and profiling techniques (see the event-timing sketch after this list)
- CPU vs GPU performance comparisons
- Memory bandwidth utilization analysis
- Scalability testing across problem sizes
- Optimization strategies and bottleneck identification
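Kernel timings in the practicals are typically taken with CUDA events rather than host timers, since events are recorded on the GPU's own timeline. A minimal pattern, meant to wrap an existing launch (`my_kernel`, `nblocks`, `nthreads`, and `d_data` are placeholders), looks like this:

```cuda
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start);
my_kernel<<<nblocks, nthreads>>>(d_data);   // the work being measured
cudaEventRecord(stop);
cudaEventSynchronize(stop);                 // wait until 'stop' has actually been reached

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);     // elapsed time in milliseconds
printf("kernel time: %.3f ms\n", ms);

cudaEventDestroy(start);
cudaEventDestroy(stop);
```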
- Practicals Guide: Comprehensive overview of all exercises
- Headers Documentation: CUDA utility functions reference
- Course Materials: Original Oxford course documentation
- Individual README files: Detailed guides for each practical
Each practical includes:
- Makefile: Optimized compilation rules
- NVCC flags: Architecture-specific optimizations
- Dependencies: Library linking (cuBLAS, OpenMP)
- Error checking: Comprehensive debugging support (see the macro sketch below)
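The utilities in `headers/` wrap this kind of check; a representative macro (the exact helper names in the repository may differ) is:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every runtime API call so failures report file/line and abort immediately.
#define checkCuda(call)                                                        \
    do {                                                                       \
        cudaError_t err_ = (call);                                             \
        if (err_ != cudaSuccess) {                                             \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                        \
                    cudaGetErrorString(err_), __FILE__, __LINE__);             \
            exit(EXIT_FAILURE);                                                \
        }                                                                      \
    } while (0)

// Usage: checkCuda( cudaMalloc((void **) &d_x, n * sizeof(float)) );
```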
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request with detailed description
This educational material is provided for learning purposes. Original course content is attributed to Oxford University. See individual files for specific licensing terms.
- Oxford University for the original CUDA course materials
- NVIDIA for CUDA toolkit and documentation
- University of Leeds for educational support
After completing this course:
- Explore CUDA libraries (cuDNN, cuFFT, Thrust)
- Learn multi-GPU programming with NCCL
- Study advanced profiling with Nsight tools
- Implement domain-specific applications
- Contribute to open-source GPU projects
Happy GPU Programming! 🚀
Transform your computational challenges with the power of parallel processing.