🧠 Unikernels

Compare GPU kernels across CUDA, Metal, and CPU — one API, real numbers.

Unikernels is a lightweight cross-backend benchmarking toolkit for GPU developers, compiler engineers, and AI researchers. It lets you write a kernel once, run it on multiple backends, and measure how they really perform — with consistent APIs, timing, and disassembly.

Warning

This project is in early development. APIs and features may change without notice.
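To make the goal concrete, here is the kind of loop Unikernels is aiming to enable. This is a hypothetical sketch: the unikernels::Context type, its benchmark call, and the backend strings are illustrative assumptions, not the shipped API.

// Hypothetical usage sketch; these names are illustrative, not the real API.
#include <cstdio>

int main() {
    for (const char* backend : {"cuda", "metal", "cpu"}) {
        // Imagined calls (hypothetical):
        //   auto ctx = unikernels::Context::create(backend);
        //   double ms = ctx->benchmark("matmul", /*size=*/1024);
        std::printf("matmul @ 1024 on %s: <time in ms>\n", backend);
    }
    return 0;
}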


⚡️ Motivation

GPU compute is fragmented. CUDA, Metal, HIP, SYCL, oneAPI, Vulkan… every vendor has its own dialect.

Unikernels doesn’t try to replace them — it exposes them. You can write, benchmark, and compare kernels across devices with minimal friction.

Think:

  • tinygrad’s simplicity × Kokkos’ backend reach × Triton’s introspection tools.

🧩 Features (v0.1 Roadmap)

Feature                                                  Status
✅ Unified C++ API for kernels (CUDA, Metal, CPU)         done
🧪 CLI benchmark runner                                   in progress
📈 Cross-backend perf visualization                       planned
🧠 Python and Rust bindings                               planned
🔬 GEMM, conv2d, reduction, attention microbenchmarks     planned
🔍 Kernel disassembly viewer (PTX / Metal IR)             planned
🧰 Reproducibility metadata (compiler, driver, device)    planned

🚀 Quick Start

1️⃣ Build

git clone https://github.com/raishish/unikernels
cd unikernels
cmake -B build
cmake --build build -j

2️⃣ Run a Benchmark (coming soon)

./build/unikernels bench matmul --size 1024 --backend metal
./build/unikernels bench matmul --size 1024 --backend cuda

3️⃣ Compare (coming soon)

python3 scripts/plot_benchmarks.py results.json

📊 Example Output (coming soon)

Kernel   Backend           Size   Time (ms)   TFLOPS
matmul   CUDA (RTX 4090)   1024   0.42        5.1
matmul   Metal (M3 Max)    1024   0.75        2.8
matmul   CPU (i9)          1024   12.5        0.2
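For reference, the TFLOPS column for an N×N matmul follows from 2N³ FLOPs divided by runtime: at N = 1024 that is about 2.15 GFLOP, so a 0.42 ms run works out to roughly 5.1 TFLOPS. Here is a minimal standalone sketch of that measurement for the CPU case (plain C++, not the project's harness):

// Standalone illustration (not Unikernels code): time a naive CPU matmul
// and report ms and TFLOPS, matching the table's columns.
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    const std::size_t N = 256;  // keep small; this is a naive O(N^3) matmul
    std::vector<float> A(N * N, 1.0f), B(N * N, 2.0f), C(N * N, 0.0f);

    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t k = 0; k < N; ++k) {
            const float a = A[i * N + k];
            for (std::size_t j = 0; j < N; ++j)
                C[i * N + j] += a * B[k * N + j];
        }
    auto t1 = std::chrono::steady_clock::now();

    const double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    const double tflops = 2.0 * N * N * N / (ms * 1e-3) / 1e12;  // 2N^3 FLOPs
    std::printf("matmul  CPU  %zu  %.2f ms  %.3f TFLOPS\n", N, ms, tflops);
    return 0;
}

Compiled with optimizations (e.g. g++ -O2), this prints one row in the same format as the table above.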

🛠️ Backends

Backend   Supported   Notes
Metal     ✅           Metal 4 kernels (support for Metal 3.x coming soon)
CUDA
CPU       🔜

📚 Architecture

src/
 ├─ backends/
 │   ├─ cuda/
 │   ├─ metal/
 │   └─ cpu/
 ├─ core/
 │   ├─ context.cpp
 │   └─ tensor.cpp
 ├─ benchmarks/
 │   ├─ matmul.cpp
 │   └─ conv2d.cpp
 └─ cli/
     └─ main.cpp

Each backend implements a small, consistent interface for launching kernels and collecting timings. The CLI and Python bindings wrap these interfaces for easy experimentation.
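As a rough sketch of what such an interface could look like (the names and signatures below are assumptions for illustration, not the actual headers):

// Hypothetical per-backend interface; the shape the README describes,
// not Unikernels' actual API.
#include <chrono>
#include <cstddef>
#include <string>

struct LaunchParams {
    std::size_t size = 1024;  // e.g. N for an N x N matmul
};

class Backend {
public:
    virtual ~Backend() = default;
    virtual std::string name() const = 0;            // "cuda", "metal", "cpu"
    virtual void launch(const std::string& kernel,
                        const LaunchParams& p) = 0;  // enqueue one kernel run
    virtual void synchronize() = 0;                  // block until finished
};

// The CLI can then time any backend the same way:
double benchmark_ms(Backend& b, const std::string& kernel,
                    const LaunchParams& p, int iters = 10) {
    b.launch(kernel, p);  // warm-up run
    b.synchronize();
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) b.launch(kernel, p);
    b.synchronize();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count() / iters;
}

Keeping the timing logic in one shared helper, rather than inside each backend, is what makes the reported numbers comparable across devices.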


📅 Roadmap

v0.1 — MVP (Now)

  • Metal, CUDA, CPU backends
  • Vector add + matmul examples
  • CLI benchmarking tool
  • JSON/CSV output

v0.2 — Bench Suite

  • conv2d, reduction, attention kernels
  • perf charts + Python bindings
  • reproducibility metadata

v0.3 — Insights & Ecosystem

  • Disassembly viewer
  • Auto-report generator for perf comparisons

🤝 Contributing

Pull requests are welcome — especially new kernels or backends. See CONTRIBUTING.md for setup and testing guidelines.


📜 License

MIT — do whatever you want, just credit the project.


💡 Why It Exists

Because “write once, run anywhere” has always been a myth — and it’s time someone measured how mythical it actually is.


Citation

If you use UniKernels in your research, education, or production systems, please cite:

@software{unikernels2025,
  title={UniKernels: A Cross-Platform C++ GPU Computing Library for Deep Learning and HPC},
  author={Rai, Ashish},
  url={https://github.com/raishish/unikernels},
  version={0.1},
  year={2025},
  note={C++ library with Python and Rust bindings for CUDA, ROCm, and Metal GPU programming}
}
