Intel Mac (i9 - 5500M) (macOS 15.3.2) - ValueError: Failed to create llama_context - llama_init_from_model: failed to initialize Metal backend #1988

starkAhmed43 opened this issue Mar 29, 2025 · 1 comment

starkAhmed43 commented Mar 29, 2025

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I am trying to run a local LLM:

```python
from llama_cpp import Llama

llm = Llama(model_path="/Users/mfzainulabideen/Downloads/Llama/Llama-3.2-3B-Instruct/Llama-3.2-3B-Instruct-F16.gguf")
```

The code should run and the notebook cell should complete, so that I can move on to whatever I plan to do next with the LLM.
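
For reference, this is a minimal sketch of what a successful load should allow next (the prompt and max_tokens are illustrative placeholders, not part of the original cell):

```python
from llama_cpp import Llama

# Load the model exactly as in the failing cell.
llm = Llama(
    model_path="/Users/mfzainulabideen/Downloads/Llama/Llama-3.2-3B-Instruct/Llama-3.2-3B-Instruct-F16.gguf"
)

# Once the context is created, a plain completion call should work.
# Prompt text and max_tokens are illustrative only.
output = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(output["choices"][0]["text"])
```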

Current Behavior

The code doesn't run and throws an error.

Environment and Context

I am trying to execute this on a 2019 16" MBP with a 2.4GHz i9, 32GB RAM and a 1TB SSD running on macOS 15.3.2.

I am using the latest Miniconda with a Python 3.11.11 environment.
GNU Make 3.81, built for i386-apple-darwin11.3.0.

Apple clang version 16.0.0 (clang-1600.0.26.6)
Target: x86_64-apple-darwin24.4.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Failure Information (for bugs)

Python throws the following error:

ValueError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 llm = Llama(model_path="/Users/mfzainulabideen/Downloads/Llama/Llama-3.2-3B-Instruct/Llama-3.2-3B-Instruct-F16.gguf")

File /opt/miniconda3/envs/test/lib/python3.11/site-packages/llama_cpp/llama.py:393, in Llama.__init__(self, model_path, n_gpu_layers, split_mode, main_gpu, tensor_split, rpc_servers, vocab_only, use_mmap, use_mlock, kv_overrides, seed, n_ctx, n_batch, n_ubatch, n_threads, n_threads_batch, rope_scaling_type, pooling_type, rope_freq_base, rope_freq_scale, yarn_ext_factor, yarn_attn_factor, yarn_beta_fast, yarn_beta_slow, yarn_orig_ctx, logits_all, embedding, offload_kqv, flash_attn, no_perf, last_n_tokens_size, lora_base, lora_scale, lora_path, numa, chat_format, chat_handler, draft_model, tokenizer, type_k, type_v, spm_infill, verbose, **kwargs)
    388     self.context_params.n_batch = self.n_batch
    389     self.context_params.n_ubatch = min(self.n_batch, n_ubatch)
    391 self._ctx = self._stack.enter_context(
    392     contextlib.closing(
--> 393         internals.LlamaContext(
    394             model=self._model,
    395             params=self.context_params,
    396             verbose=self.verbose,
    397         )
    398     )
    399 )
    401 self._batch = self._stack.enter_context(
    402     contextlib.closing(
    403         internals.LlamaBatch(
   (...)    409     )
    410 )
    412 self._lora_adapter: Optional[llama_cpp.llama_adapter_lora_p] = None

File /opt/miniconda3/envs/test/lib/python3.11/site-packages/llama_cpp/_internals.py:255, in LlamaContext.__init__(self, model, params, verbose)
    252 ctx = llama_cpp.llama_new_context_with_model(self.model.model, self.params)
    254 if ctx is None:
--> 255     raise ValueError("Failed to create llama_context")
    257 self.ctx = ctx
    259 def free_ctx():

ValueError: Failed to create llama_context
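
As a quick check of which backend the installed wheel was built with, the low-level bindings can be queried (a sketch; it assumes the installed llama_cpp exposes llama_supports_gpu_offload, as recent releases do):

```python
import llama_cpp

print(llama_cpp.__version__)
# True if the wheel was compiled with a GPU backend (Metal/Vulkan/CUDA),
# False for a CPU-only build.
print(llama_cpp.llama_supports_gpu_offload())
```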

Steps to Reproduce

conda create -n llm python=3.11 -y && conda activate llm
pip install jupyter
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir --verbose

Then, in a notebook with the llm env as the kernel, run the following code:

```python
from llama_cpp import Llama
llm = Llama(model_path="/Users/mfzainulabideen/Downloads/Llama/Llama-3.2-3B-Instruct/Llama-3.2-3B-Instruct-F16.gguf")
```

Adjust model_path to wherever the model file is stored.
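
For completeness, a CPU-only build can presumably be forced at install time via CMAKE_ARGS, since llama-cpp-python compiles the vendored llama.cpp during pip install. The flag name below is an assumption based on current llama.cpp options (older versions used LLAMA_METAL), and I have not confirmed whether it avoids the error:

CMAKE_ARGS="-DGGML_METAL=OFF" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir --verbose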

Failure Logs

Attached is the verbose log up to the point of failure:
llama-cpp-python-verbose-log.txt

My environment info:

llama-cpp-python$ git log | head -1
commit 37eb5f0a4c2a8706b89ead1406b1577c4602cdec

llama-cpp-python$ python3 --version
Python 3.11.11

llama-cpp-python$ pip list | egrep "uvicorn|fastapi|sse-starlette|numpy"
numpy                     2.2.4

llama-cpp-python/vendor/llama.cpp$ git log | head -3
commit 37eb5f0a4c2a8706b89ead1406b1577c4602cdec
Author: Andrei Betlen <[email protected]>
Date:   Wed Mar 12 05:30:21 2025 -0400

starkAhmed43 (Author) commented:

I have separately cloned llama.cpp from its GitHub repo and built it following a post on the Ollama issues forum. That build successfully runs any LLM on my GPU.

However, when I build llama-cpp-python, it recognizes the Vulkan backend but still throws the same Metal error and the same ValueError: Failed to create llama_context.
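
For reference, a build along these lines is presumably what enables the Vulkan backend (GGML_VULKAN is assumed to be the relevant llama.cpp CMake option; the exact command is a sketch, not copied from my shell history):

CMAKE_ARGS="-DGGML_VULKAN=ON" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir --verbose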
