CUDA support for pre-built library #74
Hey @shuttie, really nice! I thought about adding CUDA support before, but held back because of the complexity. It would be really valuable though, and I'd be happy to assist. I think the best option for a toggle between CPU and CUDA would be to use Maven classifiers, so users would get CPU by default but could access CUDA support with something like the following:
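For example, a dependency declaration along these lines (the coordinates and version are illustrative; only the classifier name is taken from this thread):

```xml
<dependency>
    <!-- groupId/artifactId/version are placeholders, not quoted from the thread -->
    <groupId>de.kherud</groupId>
    <artifactId>llama</artifactId>
    <version>3.3.0</version>
    <!-- the classifier selects the CUDA-enabled native build instead of the default CPU one -->
    <classifier>cuda12-linux-x86-64</classifier>
</dependency>
```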
This would solve two problems, I think.
Yes, a good idea with classifiers - the resulting
@kherud The PR is ready for review. I have a local fork to validate that the CI changes are green: https://github.com/shuttie/java-llama.cpp/actions/runs/10688928505/job/29629874992. I would like to have your support with future publishing, as I have no permissions to do it in an external repo and cannot test whether it works for multi-artifact builds. But I'm mostly sure it should be OK when doing the deploy:
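Presumably the standard Maven deploy is enough here, since artifacts attached with classifiers are published together with the main JAR in a single run, e.g.:

```sh
# Standard Maven deploy: attached/classified artifacts (cuda12-linux-x86-64,
# sources, javadoc) are uploaded alongside the main JAR in the same invocation.
mvn clean deploy
```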
Amazing work, thank you! I hope I'll find the time to review and merge it tomorrow, but can't promise it. Definitely by Thursday though. If you want, I can also add you as a collaborator.
I checked the code and it looks good to me. The CUDA build seems to work correctly. When I run the build, the target directory looks like this:

```
[target]$ ls -lh
total 3.2M
drwxr-xr-x 6 konstantin wheel 4.0K Sep 5 16:32 apidocs
drwxr-xr-x 3 konstantin wheel 4.0K Sep 5 16:32 classes
drwxr-xr-x 3 konstantin wheel 4.0K Sep 5 16:32 classes_cuda
drwxr-xr-x 3 konstantin wheel 4.0K Sep 5 16:32 generated-sources
drwxr-xr-x 3 konstantin wheel 4.0K Sep 5 16:32 generated-test-sources
drwxr-xr-x 2 konstantin wheel 4.0K Sep 5 16:32 javadoc-bundle-options
-rw-r--r-- 1 konstantin wheel 34K Sep 5 16:32 llama-3.3.0-cuda12-linux-x86-64.jar
-rw-r--r-- 1 konstantin wheel 176K Sep 5 16:32 llama-3.3.0-javadoc.jar
-rw-r--r-- 1 konstantin wheel 1.5M Sep 5 16:32 llama-3.3.0-sources.jar
-rw-r--r-- 1 konstantin wheel 1.5M Sep 5 16:32 llama-3.3.0.jar
-rw-r--r-- 1 konstantin wheel 5.5K Sep 5 16:32 llama-3.3.0.pom
drwxr-xr-x 2 konstantin wheel 4.0K Sep 5 16:32 maven-archiver
-rw-r--r-- 1 konstantin wheel 2.4K Sep 5 16:32 maven-javadoc-plugin-stale-data.txt
drwxr-xr-x 3 konstantin wheel 4.0K Sep 5 16:32 maven-status
drwxr-xr-x 4 konstantin wheel 4.0K Sep 5 16:32 test-classes
```

Note the 34K file size of `llama-3.3.0-cuda12-linux-x86-64.jar`: it looks like the CUDA native library did not make it into the classified JAR.
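A quick way to confirm whether the classified JAR actually contains the native library is to list its contents (a generic check, not a command from this thread; the path of the shared library inside the JAR depends on the project's packaging):

```sh
# List the classified JAR's entries and look for the bundled shared library.
jar tf target/llama-3.3.0-cuda12-linux-x86-64.jar | grep -i '\.so'
```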
So in the CI this is handled by separate build steps. To run the whole thing locally for Linux, you need to run the build process twice: once for the CPU library and the default JAR, and once for the CUDA library and the classified JAR, roughly as sketched below.
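A rough sketch of the two passes (the script names come from this PR; the exact Maven invocation used to produce the classified artifact is an assumption, not quoted from the thread):

```sh
# Pass 1: build the CPU-linked native library, then package the default JAR
bash build.sh
mvn package

# Pass 2: build the CUDA-linked native library, then package the cuda12-linux-x86-64 JAR
bash build_linux_cuda.sh
mvn package   # plus whatever property/profile the PR's pom uses to attach the classified JAR
```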
Ah yes, I reproduced your issue - there was a typo in
Works like a charm 👌
This PR adds support for a CUDA 12-linked ggml library, so you can choose between CPU and GPU inference without building it yourself.

The packaging is done via Maven classifiers: the default JAR artifact stays the same, but there is an extra one with the `cuda12-linux-x86-64` classifier, which contains the CUDA 12 native library.

For the build I created a separate step that does the extra setup: the `build_linux_cuda.sh` script piggybacks on top of the original `build.sh` - I would like to avoid using a custom Docker image with the dependencies pre-installed, to simplify future updates.
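The extra setup in such a CUDA build step typically boils down to installing the CUDA toolkit and enabling the upstream CUDA backend when compiling the native library; a hedged sketch (the actual build_linux_cuda.sh may do this differently, and the GGML_CUDA flag name is taken from upstream llama.cpp, not from this PR):

```sh
# Sketch only: compile the native library with the CUDA backend enabled.
# Assumes the CUDA 12 toolkit (nvcc) is already installed on the build machine.
cmake -B build -DGGML_CUDA=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
```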