
CUDA support for pre-built library #74


Merged
merged 18 commits into from
Sep 6, 2024

Conversation

shuttie
Contributor

@shuttie shuttie commented Sep 3, 2024

This PR adds support for a CUDA 12-linked ggml library, so you can choose between CPU and GPU inference without having to build the native library yourself.

The packaging is done via Maven classifiers: the default JAR artifact stays the same, but there is an extra one with the cuda12-linux-x86-64 classifier, which contains the CUDA 12 native library.
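
A consumer could then opt into the GPU build by requesting that classifier explicitly. A minimal sketch, assuming the artifact coordinates shown in the install log further down:

<dependency>
    <groupId>de.kherud</groupId>
    <artifactId>llama</artifactId>
    <version>3.3.0</version>
    <classifier>cuda12-linux-x86-64</classifier>
</dependency>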

For the build I created a separate CI step that does some extra setup:

  • manylinux2014 is too old for CUDA 12, so I had to upgrade to manylinux_2_28, which is CentOS 8 based.
  • the build_linux_cuda.sh script piggybacks on top of the original build.sh - I would like to avoid a custom Docker image with pre-installed dependencies, to simplify future updates.
  • the Maven setup is a bit hacky: I had to trigger compilation twice, once for the regular build and once for the CUDA-based one. I found no way to easily give a JAR with a classifier its own resource directory (a sketch of one possible wiring follows this list).
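
A rough sketch of what such a second packaging pass could look like, assuming an extra maven-jar-plugin execution that picks up the CUDA classes and resources compiled into target/classes_cuda (the actual pom.xml in this PR may be wired differently):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-jar-plugin</artifactId>
    <executions>
        <execution>
            <id>cuda-jar</id>
            <phase>package</phase>
            <goals>
                <goal>jar</goal>
            </goals>
            <configuration>
                <!-- attaches a second JAR with the CUDA classifier -->
                <classifier>cuda12-linux-x86-64</classifier>
                <!-- package the separately compiled CUDA output directory -->
                <classesDirectory>${project.build.directory}/classes_cuda</classesDirectory>
            </configuration>
        </execution>
    </executions>
</plugin>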

@shuttie shuttie marked this pull request as draft September 3, 2024 11:09
@shuttie shuttie changed the title CI: dockerized cuda12 build CUDA support for pre-built library Sep 3, 2024
@kherud
Owner

kherud commented Sep 3, 2024

Hey @shuttie, really nice! I thought about adding CUDA support before, but held back because of the complexity. It would be really valuable though, and I'd be happy to assist. I think the best option for a toggle between CPU and CUDA would be Maven classifiers, so users would get CPU by default but could access CUDA support with something like

<dependency>
    <groupId>de.kherud</groupId>
    <artifactId>llama</artifactId>
    <version>3.3.0</version>
    <classifier>linux-x64-cuda</classifier>
</dependency>

This would solve two problems I think:

  • The CUDA shared libraries can become quite large and users who don't use CUDA wouldn't have to download them
  • No changes to ModelLoader would be necessary; the CUDA libraries would simply take the place of the original CPU libraries within resources/Linux/x86_64

@shuttie
Contributor Author

shuttie commented Sep 3, 2024

Yes, good idea with the classifiers - the resulting libggml.so is already 300 MB in size :)

@shuttie
Contributor Author

shuttie commented Sep 3, 2024

@kherud The PR is ready for review. I have a local fork to validate that the CI changes are green: https://github.com/shuttie/java-llama.cpp/actions/runs/10688928505/job/29629874992

I would like your support with the future publishing, as I have no permissions to do it in the external repo and cannot test whether it works for multi-artifact builds. But I'm mostly sure it should be OK: running mvn clean install -DskipTests -Prelease correctly installs all artifacts into the local repo:

[INFO] --- install:3.1.2:install (default-install) @ llama ---
[INFO] Installing /home/shutty/private/code/java-llama.cpp/pom.xml to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0.pom
[INFO] Installing /home/shutty/private/code/java-llama.cpp/target/llama-3.3.0.jar to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0.jar
[INFO] Installing /home/shutty/private/code/java-llama.cpp/target/llama-3.3.0-sources.jar to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0-sources.jar
[INFO] Installing /home/shutty/private/code/java-llama.cpp/target/llama-3.3.0-javadoc.jar to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0-javadoc.jar
[INFO] Installing /home/shutty/private/code/java-llama.cpp/target/llama-3.3.0-cuda12-linux-x86-64.jar to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0-cuda12-linux-x86-64.jar
[INFO] Installing /home/shutty/private/code/java-llama.cpp/target/llama-3.3.0.jar.asc to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0.jar.asc
[INFO] Installing /home/shutty/private/code/java-llama.cpp/target/llama-3.3.0.pom.asc to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0.pom.asc
[INFO] Installing /home/shutty/private/code/java-llama.cpp/target/llama-3.3.0-sources.jar.asc to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0-sources.jar.asc
[INFO] Installing /home/shutty/private/code/java-llama.cpp/target/llama-3.3.0-javadoc.jar.asc to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0-javadoc.jar.asc
[INFO] Installing /home/shutty/private/code/java-llama.cpp/target/llama-3.3.0-cuda12-linux-x86-64.jar.asc to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0-cuda12-linux-x86-64.jar.asc
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  3.008 s
[INFO] Finished at: 2024-09-03T21:52:22+02:00
[INFO] ------------------------------------------------------------------------

@shuttie shuttie marked this pull request as ready for review September 3, 2024 19:53
@kherud
Owner

kherud commented Sep 3, 2024

Amazing work, thank you! I hope I'll find the time to review and merge it tomorrow, but can't promise it. Definitely by Thursday though. If you want, I can also add you as a collaborator.

@kherud
Owner

kherud commented Sep 5, 2024

I checked the code and it looks good to me. The CUDA build seems to work correctly. When I run mvn clean install -DskipTests -Prelease locally on my machine (after building the libraries), the two JARs are also built, but llama-3.3.0-cuda12-linux-x86-64.jar seems to be missing the required resources.

[target]$ ls -lh
total 3.2M
drwxr-xr-x 6 konstantin wheel 4.0K Sep  5 16:32 apidocs
drwxr-xr-x 3 konstantin wheel 4.0K Sep  5 16:32 classes
drwxr-xr-x 3 konstantin wheel 4.0K Sep  5 16:32 classes_cuda
drwxr-xr-x 3 konstantin wheel 4.0K Sep  5 16:32 generated-sources
drwxr-xr-x 3 konstantin wheel 4.0K Sep  5 16:32 generated-test-sources
drwxr-xr-x 2 konstantin wheel 4.0K Sep  5 16:32 javadoc-bundle-options
-rw-r--r-- 1 konstantin wheel  34K Sep  5 16:32 llama-3.3.0-cuda12-linux-x86-64.jar
-rw-r--r-- 1 konstantin wheel 176K Sep  5 16:32 llama-3.3.0-javadoc.jar
-rw-r--r-- 1 konstantin wheel 1.5M Sep  5 16:32 llama-3.3.0-sources.jar
-rw-r--r-- 1 konstantin wheel 1.5M Sep  5 16:32 llama-3.3.0.jar
-rw-r--r-- 1 konstantin wheel 5.5K Sep  5 16:32 llama-3.3.0.pom
drwxr-xr-x 2 konstantin wheel 4.0K Sep  5 16:32 maven-archiver
-rw-r--r-- 1 konstantin wheel 2.4K Sep  5 16:32 maven-javadoc-plugin-stale-data.txt
drwxr-xr-x 3 konstantin wheel 4.0K Sep  5 16:32 maven-status
drwxr-xr-x 4 konstantin wheel 4.0K Sep  5 16:32 test-classes

Note the 34K file size of llama-3.3.0-cuda12-linux-x86-64.jar. Does this work on your machine?

@shuttie
Contributor Author

shuttie commented Sep 6, 2024

So the CI release.yml process now has an extra step, build-linux-cuda, to build the CUDA artifacts.

To run the whole thing locally for Linux, you need to run the native build twice (the steps are consolidated into a script sketch after this list):

  1. .github/dockcross/dockcross-manylinux_2_28-x64 .github/build_cuda_linux.sh "-DOS_NAME=Linux -DOS_ARCH=x86_64", which will produce the src/main/resources_linux_cuda directory with the CUDA libs.
  2. rm -rf build to drop the CMake build cache, as it would otherwise be stale when you attempt another build without CUDA.
  3. .github/dockcross/dockcross-manylinux2014-x64 .github/build.sh "-DOS_NAME=Linux -DOS_ARCH=x86_64", which will build the CPU native libs in the src/main/resources directory.
  4. After these steps you can run mvn clean install -DskipTests -Prelease, which will build the two JARs for CPU and GPU.
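
Putting the steps together, a minimal local build script could look like the following. The commands are taken verbatim from the steps above; the dockcross wrapper scripts under .github/dockcross/ are assumed to already exist, as in CI.

# 1. CUDA build inside manylinux_2_28: produces src/main/resources_linux_cuda
.github/dockcross/dockcross-manylinux_2_28-x64 .github/build_cuda_linux.sh "-DOS_NAME=Linux -DOS_ARCH=x86_64"

# 2. Drop the CMake cache so the CPU-only build doesn't pick up stale CUDA settings
rm -rf build

# 3. CPU build inside manylinux2014: produces src/main/resources
.github/dockcross/dockcross-manylinux2014-x64 .github/build.sh "-DOS_NAME=Linux -DOS_ARCH=x86_64"

# 4. Package both JARs (default and cuda12-linux-x86-64 classifier)
mvn clean install -DskipTests -Prelease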

@shuttie
Contributor Author

shuttie commented Sep 6, 2024

Ah yes, I reproduced your issue - there was a typo in pom.xml. After the fix it seems to be working fine locally:

total 130192
drwxr-xr-x 6 shutty shutty      4096 Sep  6 17:05 apidocs
drwxr-xr-x 3 shutty shutty      4096 Sep  6 17:05 classes
drwxr-xr-x 3 shutty shutty      4096 Sep  6 17:05 classes_cuda
drwxr-xr-x 3 shutty shutty      4096 Sep  6 17:05 generated-sources
drwxr-xr-x 3 shutty shutty      4096 Sep  6 17:05 generated-test-sources
drwxr-xr-x 2 shutty shutty      4096 Sep  6 17:05 javadoc-bundle-options
-rw-r--r-- 1 shutty shutty 129890247 Sep  6 17:05 llama-3.3.0-cuda12-linux-x86-64.jar
-rw-r--r-- 1 shutty shutty       488 Sep  6 17:05 llama-3.3.0-cuda12-linux-x86-64.jar.asc
-rw-r--r-- 1 shutty shutty   1580101 Sep  6 17:05 llama-3.3.0.jar
-rw-r--r-- 1 shutty shutty       488 Sep  6 17:05 llama-3.3.0.jar.asc
-rw-r--r-- 1 shutty shutty    192864 Sep  6 17:05 llama-3.3.0-javadoc.jar
-rw-r--r-- 1 shutty shutty       488 Sep  6 17:05 llama-3.3.0-javadoc.jar.asc
-rw-r--r-- 1 shutty shutty      5564 Sep  6 17:05 llama-3.3.0.pom
-rw-r--r-- 1 shutty shutty       488 Sep  6 17:05 llama-3.3.0.pom.asc
-rw-r--r-- 1 shutty shutty   1568905 Sep  6 17:05 llama-3.3.0-sources.jar
-rw-r--r-- 1 shutty shutty       488 Sep  6 17:05 llama-3.3.0-sources.jar.asc
drwxr-xr-x 2 shutty shutty      4096 Sep  6 17:05 maven-archiver
-rw-r--r-- 1 shutty shutty      2618 Sep  6 17:05 maven-javadoc-plugin-stale-data.txt
drwxr-xr-x 3 shutty shutty      4096 Sep  6 17:05 maven-status
drwxr-xr-x 4 shutty shutty      4096 Sep  6 17:05 test-classes

@kherud
Owner

kherud commented Sep 6, 2024

Works like a charm 👌

@kherud kherud merged commit a14aa3f into kherud:master Sep 6, 2024