Tags: noxer/ollama
Use flash attention flag for now (ollama#4580)
* put flash attention behind flag for now
* add test
* remove print
* up timeout for scheduler tests
Merge pull request ollama#4543 from ollama/mxyng/simple-safetensors simplify safetensors reading
fix the cpu estimatedTotal memory + get the expiry time for loading models (ollama#4461)
Merge pull request ollama#4323 from dhiltgen/sort_by_free Always use the sorted list of GPUs
Merge pull request ollama#4231 from ollama/mxyng/parser types/model: fix parser for empty values
Merge pull request ollama#4188 from dhiltgen/use_our_lib Use our bundled libraries (cuda) instead of the host library