Labels: feature-request (for new feature suggestions)
Description
Problem:
When running a local dev instance on Apple silicon (M1 Max), I only get 0.15 tokens/second from Ollama with a quantized llama3.1:8b, because Docker cannot access the GPU on macOS (unlike on Linux).
In practice, this makes it impossible to run any tests with delphi locally.
Suggested solution:
Running a standalone native macOS Ollama instead of the Docker environment got me to 28.64 tokens/second:
```sh
brew install ollama
ollama serve
ollama pull llama3.1:8b
```
I can do that manually for my own tests and update .env to point at that Ollama instance.
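For reference, the local change is just pointing the Ollama URL at the native server's default port (11434). The variable name below is a guess; the real key is whatever delphi's .env.example uses:

```
# Hypothetical key name -- check delphi's .env.example for the real one.
OLLAMA_BASE_URL=http://localhost:11434
```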
But if other macOS-based devs want to try delphi, it might be worth incorporating architecture detection and the commands above into the Makefile, along the lines of the sketch below.
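A minimal sketch of what that could look like, assuming GNU Make and a docker compose service named `ollama`; the target name, service name, and model tag are placeholders, not taken from delphi's actual Makefile:

```makefile
# Sketch only: use native Ollama on Apple silicon, Docker everywhere else.
UNAME_S := $(shell uname -s)
UNAME_M := $(shell uname -m)

.PHONY: ollama-up
ollama-up:
ifeq ($(UNAME_S)-$(UNAME_M),Darwin-arm64)
	# Apple silicon: run Ollama natively so it can reach the GPU.
	command -v ollama >/dev/null || brew install ollama
	brew services start ollama
	# Wait until the API answers before pulling the model.
	until curl -sf http://localhost:11434/ >/dev/null; do sleep 1; done
	ollama pull llama3.1:8b
else
	# Other platforms: keep the Dockerized Ollama ("ollama" service is assumed).
	docker compose up -d ollama
endif
```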
Additional context:
References on Docker, Ollama, GPU access, and Apple silicon: