Speed-up ollama for local Apple Silicon dev #2260

@jucor

Description

Problem:
When running a local dev instance on Apple Silicon (M1 Max), I only get 0.15 tokens/second from ollama with the quantized llama3.1:8b model. This is because Docker cannot access the GPU on macOS (unlike on Linux).

In practice, this makes it impossible to run any tests with delphi locally.

Suggested solution:
Running ollama natively on macOS, standalone rather than inside the Docker environment, got me to 28.64 tokens/second.

brew install ollama
ollama serve              # stays in the foreground; run the pull from a second terminal
ollama pull llama3.1:8b

I can do that manually for my own tests and update .env to point to that ollama instance.
But if other macOS-based devs want to try delphi, it might be worth adding architecture detection and the above commands to the Makefile.
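
To make the suggestion concrete, here is a rough sketch of what the architecture detection could look like as a shell helper that a Makefile target might call. The function names `ollama_mode` and `setup_ollama` are hypothetical and not part of the repo; the check assumes `uname -m` reports `arm64` on Apple Silicon, which will not hold in a shell running under Rosetta.

```shell
#!/bin/sh
# Hypothetical helper (not in the repo): choose how to run ollama.
# Prints "native" on Apple Silicon macOS and "docker" everywhere else.
# OS and architecture can be passed as arguments; they default to uname.
ollama_mode() {
    os="${1:-$(uname -s)}"
    arch="${2:-$(uname -m)}"
    if [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
        echo "native"
    else
        echo "docker"
    fi
}

# Hypothetical setup step that a Makefile target could invoke.
# It only installs ollama if the binary is missing, and leaves
# starting the server and pulling the model to the developer.
setup_ollama() {
    if [ "$(ollama_mode)" = "native" ]; then
        command -v ollama >/dev/null 2>&1 || brew install ollama
        echo "Run 'ollama serve' in one terminal, then 'ollama pull llama3.1:8b' in another."
    else
        echo "Using the Docker-based ollama service."
    fi
}
```

Keeping the detection in a function that accepts the OS and architecture as arguments makes it easy to check on any machine, and the Docker path stays the default so Linux setups would not change.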

Additional context:
References on docker and ollama and GPUs and Apple Silicon:
