When will a recent version of llama-cpp-python, working with CUDA, be available via pip?
It's a real nightmare to get it working any other way.
0.3.4 works with CUDA, but it doesn't support newer models such as quantized Qwen 3.
With kind regards
You can try compiling the new code I maintain here: https://github.com/JamePeng/llama-cpp-python, but I have only pre-compiled the Windows and Linux versions based on the recent code.
> When will a recent version of llama-cpp-python, working with CUDA, be available via pip?
I may be misunderstanding your question, but I am using 0.3.9 with CUDA via pip. I can use Qwen3-related models as well (with i-quants or flash attention, for example; is that what you are referring to?). Sorry if my comment is unhelpful.
I built using:
CMAKE_ARGS="-DGGML_CUDA=on -DLLAVA_BUILD=off -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade
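Once that build succeeds, a minimal sketch for sanity-checking it against a quantized Qwen3 GGUF might look like the following (the model path is hypothetical; point it at whatever GGUF file you downloaded):

```python
from llama_cpp import Llama

# Hypothetical local GGUF file; substitute your own quantized Qwen3 model.
llm = Llama(
    model_path="./Qwen3-8B-Q4_K_M.gguf",
    n_gpu_layers=-1,   # offload all layers to the GPU (requires a GGML_CUDA=on build)
    flash_attn=True,   # optional; mentioned above as working with Qwen3
    n_ctx=4096,
)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```

If CUDA offload is active, the startup log should report layers being assigned to the GPU rather than the CPU.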