Intel Mac (i9 - 5500M) (macOS 15.3.2) - ValueError: Failed to create llama_context - llama_init_from_model: failed to initialize Metal backend #1988

starkAhmed43 opened this issue Mar 29, 2025 · 1 comment

starkAhmed43 commented Mar 29, 2025

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I am trying to run a local LLM:

```python
from llama_cpp import Llama

llm = Llama(model_path="/Users/mfzainulabideen/Downloads/Llama/Llama-3.2-3B-Instruct/Llama-3.2-3B-Instruct-F16.gguf")
```

The code should run and the notebook cell should complete, so that I can move on to whatever I plan to do next with the LLM.
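
For reference, this is a minimal sketch of what a successful load should allow next (the prompt and max_tokens are illustrative placeholders, not part of the original cell):

```python
from llama_cpp import Llama

# Load the model exactly as in the failing cell.
llm = Llama(
    model_path="/Users/mfzainulabideen/Downloads/Llama/Llama-3.2-3B-Instruct/Llama-3.2-3B-Instruct-F16.gguf"
)

# Once the context is created, a plain completion call should work.
# Prompt text and max_tokens are illustrative only.
output = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(output["choices"][0]["text"])
```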

Current Behavior

The code doesn't run and throws an error.

Environment and Context

I am trying to execute this on a 2019 16" MBP with a 2.4GHz i9, 32GB RAM and a 1TB SSD running on macOS 15.3.2.

I am using the latest Miniconda with a Python 3.11.11 environment.
GNU Make 3.81, built for i386-apple-darwin11.3.0.

Apple clang version 16.0.0 (clang-1600.0.26.6)
Target: x86_64-apple-darwin24.4.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Failure Information (for bugs)

Python throws the following error:

ValueError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 llm = Llama(model_path="/Users/mfzainulabideen/Downloads/Llama/Llama-3.2-3B-Instruct/Llama-3.2-3B-Instruct-F16.gguf")

File /opt/miniconda3/envs/test/lib/python3.11/site-packages/llama_cpp/llama.py:393, in Llama.__init__(self, model_path, n_gpu_layers, split_mode, main_gpu, tensor_split, rpc_servers, vocab_only, use_mmap, use_mlock, kv_overrides, seed, n_ctx, n_batch, n_ubatch, n_threads, n_threads_batch, rope_scaling_type, pooling_type, rope_freq_base, rope_freq_scale, yarn_ext_factor, yarn_attn_factor, yarn_beta_fast, yarn_beta_slow, yarn_orig_ctx, logits_all, embedding, offload_kqv, flash_attn, no_perf, last_n_tokens_size, lora_base, lora_scale, lora_path, numa, chat_format, chat_handler, draft_model, tokenizer, type_k, type_v, spm_infill, verbose, **kwargs)
    388     self.context_params.n_batch = self.n_batch
    389     self.context_params.n_ubatch = min(self.n_batch, n_ubatch)
    391 self._ctx = self._stack.enter_context(
    392     contextlib.closing(
--> 393         internals.LlamaContext(
    394             model=self._model,
    395             params=self.context_params,
    396             verbose=self.verbose,
    397         )
    398     )
    399 )
    401 self._batch = self._stack.enter_context(
    402     contextlib.closing(
    403         internals.LlamaBatch(
   (...)    409     )
    410 )
    412 self._lora_adapter: Optional[llama_cpp.llama_adapter_lora_p] = None

File /opt/miniconda3/envs/test/lib/python3.11/site-packages/llama_cpp/_internals.py:255, in LlamaContext.__init__(self, model, params, verbose)
    252 ctx = llama_cpp.llama_new_context_with_model(self.model.model, self.params)
    254 if ctx is None:
--> 255     raise ValueError("Failed to create llama_context")
    257 self.ctx = ctx
    259 def free_ctx():

ValueError: Failed to create llama_context
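
As a quick check of which backend the installed wheel was built with, the low-level bindings can be queried (a sketch; it assumes the installed llama_cpp exposes llama_supports_gpu_offload, as recent releases do):

```python
import llama_cpp

print(llama_cpp.__version__)
# True if the wheel was compiled with a GPU backend (Metal/Vulkan/CUDA),
# False for a CPU-only build.
print(llama_cpp.llama_supports_gpu_offload())
```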

Steps to Reproduce

conda create -n llm python=3.11 -y && conda activate llm
pip install jupyter
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir --verbose

Then, in a notebook with the llm env as the kernel, run the following code:

```python
from llama_cpp import Llama
llm = Llama(model_path="/Users/mfzainulabideen/Downloads/Llama/Llama-3.2-3B-Instruct/Llama-3.2-3B-Instruct-F16.gguf")
```

Adjust model_path to wherever the model file is stored.
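
For completeness, a CPU-only build can presumably be forced at install time via CMAKE_ARGS, since llama-cpp-python compiles the vendored llama.cpp during pip install. The flag name below is an assumption based on current llama.cpp options (older versions used LLAMA_METAL), and I have not confirmed whether it avoids the error:

CMAKE_ARGS="-DGGML_METAL=OFF" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir --verbose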

Failure Logs

Attached is the verbose log up to the point of failure:
llama-cpp-python-verbose-log.txt

My environment info:

llama-cpp-python$ git log | head -1
commit 37eb5f0a4c2a8706b89ead1406b1577c4602cdec

llama-cpp-python$ python3 --version
Python 3.11.11

llama-cpp-python$ pip list | egrep "uvicorn|fastapi|sse-starlette|numpy"
numpy                     2.2.4

llama-cpp-python/vendor/llama.cpp$ git log | head -3
commit 37eb5f0a4c2a8706b89ead1406b1577c4602cdec
Author: Andrei Betlen <[email protected]>
Date:   Wed Mar 12 05:30:21 2025 -0400

starkAhmed43 (Author) commented:

I have separately cloned llama.cpp from its GitHub repo and built it following a post on the Ollama issues forum. That build successfully runs any LLM on my GPU.

However, when I build llama-cpp-python, it recognizes the Vulkan backend but still throws the same Metal error and the same ValueError: Failed to create llama_context.
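
For reference, a build along these lines is presumably what enables the Vulkan backend (GGML_VULKAN is assumed to be the relevant llama.cpp CMake option; the exact command is a sketch, not copied from my shell history):

CMAKE_ARGS="-DGGML_VULKAN=ON" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir --verbose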
