infa
Rust + CUDA = a fast and simple inference library, from scratch
requirements
A Linux machine with CUDA 12 or later, cuBLAS, and Rust installed.
A GPU with at least the sm_80 microarchitecture is required. (This is hardcoded for now.)
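You can check whether your GPU meets the sm_80 requirement with a small standalone CUDA program (a sketch, not part of infa); sm_80 corresponds to compute capability 8.0 (Ampere):

```cuda
// check_sm.cu -- standalone sketch to verify the GPU meets the sm_80
// requirement before building infa. Compile with: nvcc check_sm.cu -o check_sm
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, /*device=*/0) != cudaSuccess) {
        fprintf(stderr, "No CUDA device found\n");
        return 1;
    }
    // sm_80 means compute capability major version 8 (Ampere) or newer.
    printf("GPU: %s, compute capability %d.%d\n",
           prop.name, prop.major, prop.minor);
    if (prop.major < 8) {
        printf("Warning: this GPU is below sm_80 and will not work.\n");
        return 1;
    }
    return 0;
}
```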
compared to PyTorch and llama.cpp
WIP
roadmap
Our first goal is to support bfloat16 Llama 3.2 1B inference.