
Llava clip not loading to GPU in version 0.2.58 (Downgrading to 0.2.55 works) #1324


Closed
4 tasks done
FYYHU opened this issue Apr 3, 2024 · 2 comments
Labels
bug (Something isn't working)

Comments


FYYHU commented Apr 3, 2024

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I wanted to use Llava as described in the README: I used the provided code and the linked GGUF files, and I installed the module with the cuBLAS flags mentioned in the documentation.

I expected both the CLIP vision tower and the LLM to be loaded on CUDA.
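For context, a minimal sketch of that setup, following the Llava example in the llama-cpp-python README; the GGUF file names below are placeholders rather than the exact files from this report:

# Installed with CUDA support via the documented flag, e.g.:
#   CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Placeholder paths for the CLIP projector and the Llava model GGUFs.
chat_handler = Llava15ChatHandler(clip_model_path="./mmproj-model-f16.gguf", verbose=True)
llm = Llama(
    model_path="./llava-v1.5-7b.Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,        # larger context to fit the image embedding
    n_gpu_layers=-1,   # offload all LLM layers to the GPU
    logits_all=True,   # required by the Llava chat handler
)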

Current Behavior

On the latest version (0.2.58) of llama-cpp-python, I observe that the CLIP model is forced onto the CPU backend, while the LLM part uses CUDA. Downgrading llama-cpp-python to version 0.2.55 fixes this issue.

Environment and Context

OS: Ubuntu 22.04 - X86
CUDA: 11.8
Python: 3.8 (in miniconda)
llama-cpp-python: 0.2.58


eisneim commented Apr 4, 2024

same here!
OS: Ubuntu 22.04 - X86
CUDA: 12
Python: 3.10.14
llama-cpp-python: 0.2.59

I'm using: CMAKE_ARGS="-DLLAMA_CUBLAS=on -DLLAVA_BUILD=on" pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python

but I still get:

clip_model_load: - type  f32:  235 tensors
clip_model_load: - type  f16:  142 tensors
clip_model_load: CLIP using CPU backend
clip_model_load: params backend buffer size =  615.49 MB (377 tensors)
clip_model_load: compute allocated memory: 32.89 MB

abetlen added the bug label on Apr 4, 2024

abetlen (Owner) commented May 10, 2024

@FYYHU @eisneim thanks for reporting this. The flag that enables CUDA support changed from GGML_USE_CUBLAS to the more appropriate GGML_USE_CUDA, but the CMakeLists.txt in this project was still using the old value, so the define was never set and CUDA support wasn't compiled in. It's fixed now and should be in the next release.
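Once a release with that change is out, one way to confirm the projector is back on the GPU is to construct the chat handler with verbose output and check the clip_model_load lines, which should report a CUDA backend instead of the "CLIP using CPU backend" line quoted above (a small sketch, reusing the placeholder path from earlier):

from llama_cpp.llama_chat_format import Llava15ChatHandler

# With verbose=True the clip_model_load messages are printed to stderr,
# including which backend (CPU or CUDA) the CLIP weights were loaded on.
chat_handler = Llava15ChatHandler(clip_model_path="./mmproj-model-f16.gguf", verbose=True)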
