Speed-up ollama for local Apple Silicon dev

**Problem**:
When running a local dev instance on Apple Silicon (M1 Max), I only get 0.15 tokens/second from ollama using the quantized llama3.1:8b model. This is because Docker cannot access the GPU on macOS (unlike on Linux).

In practice, this prevents running any tests with delphi locally.

**Suggested solution**:
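Running a standalone, native macOS ollama instead of the Docker environment got me to 28.64 tokens/second.
```
# Install the native macOS build, which can use the Apple Silicon GPU
brew install ollama
# Start the server (blocks this terminal; `brew services start ollama` runs it in the background instead)
ollama serve
# In a second terminal, download the model
ollama pull llama3.1:8b
```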
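For anyone who wants to reproduce the tokens/second comparison, here is a minimal sketch. It assumes the figures are derived from the `eval_count` and `eval_duration` (nanoseconds) fields that ollama's `/api/generate` endpoint returns. `tokens_per_second` is a hypothetical helper name, and the exact way the numbers above were measured is not stated here.

```shell
#!/bin/sh
# tokens_per_second is a hypothetical helper, not part of ollama or delphi.
# $1 = eval_count (number of generated tokens)
# $2 = eval_duration (generation time in nanoseconds)
tokens_per_second() {
    awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.2f\n", n * 1e9 / ns }'
}

# Possible usage against a running server (default port 11434, requires jq):
#   resp=$(curl -s http://localhost:11434/api/generate \
#       -d '{"model":"llama3.1:8b","prompt":"Hello","stream":false}')
#   tokens_per_second "$(echo "$resp" | jq .eval_count)" "$(echo "$resp" | jq .eval_duration)"
```

Alternatively, `ollama run llama3.1:8b --verbose` prints timing statistics after each response.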

I can do that manually for my own tests and update `.env` to point to that ollama instance.
But if other macOS-based devs want to try delphi, it might be worth adding architecture detection and the above commands to the Makefile.
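The architecture detection could look roughly like the sketch below, which a Makefile target could call. `use_native_ollama` is a hypothetical helper name, and nothing here reflects how the delphi Makefile is actually structured. One caveat: inside a Rosetta-translated shell, `uname -m` reports `x86_64` even on Apple Silicon, so a real implementation may need a more robust check.

```shell
#!/bin/sh
# Sketch: decide whether ollama should run natively or inside Docker.
# use_native_ollama is a hypothetical helper, not part of the delphi repo.
use_native_ollama() {
    # Both arguments default to the current machine; passing them
    # explicitly makes the check easy to test.
    os="${1:-$(uname -s)}"
    arch="${2:-$(uname -m)}"
    # Succeeds only on macOS (Darwin) running on Apple Silicon (arm64).
    [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]
}

if use_native_ollama; then
    echo "Apple Silicon macOS detected: use a native ollama (brew install ollama)"
else
    echo "Other platform: keep using the Docker-based ollama"
fi
```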

**Additional context**:
References on Docker, ollama, GPUs, and Apple Silicon:
- https://ollama.com/blog/ollama-is-now-available-as-an-official-docker-image
- https://chariotsolutions.com/blog/post/apple-silicon-gpus-docker-and-ollama-pick-two/


Speed-up ollama for local Apple Silicon dev #2260
