- [x] use the faster SDPA (scaled dot-product attention) from LuxLib: https://github.com/LuxDL/Lux.jl/pull/1452
- [ ] compile the full generation loop instead of compiling only the single-token step
- [ ] implement KV caching