
CUDA support for pre-built library #74


Merged
merged 18 commits into from
Sep 6, 2024

Conversation

shuttie
Contributor

@shuttie shuttie commented Sep 3, 2024

This PR adds support for a CUDA 12-linked ggml library, so you can choose between CPU and GPU inference without having to build the native library yourself.

The packaging is done via Maven classifiers: the default JAR artifact stays the same, but there is an extra one with the cuda12-linux-x86-64 classifier, which contains the CUDA 12 native library.
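
A consumer could then opt into the GPU build by requesting that classifier explicitly. A minimal sketch, assuming the artifact coordinates shown in the install log further down:

<dependency>
    <groupId>de.kherud</groupId>
    <artifactId>llama</artifactId>
    <version>3.3.0</version>
    <classifier>cuda12-linux-x86-64</classifier>
</dependency>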

For the build I created a separate CI step that does some extra setup:

  • manylinux2014 is too old for CUDA 12, so I had to upgrade to manylinux_2_28, which is CentOS 8 based.
  • the build_linux_cuda.sh script piggybacks on top of the original build.sh - I would like to avoid a custom Docker image with pre-installed dependencies, to simplify future updates.
  • the Maven setup is a bit hacky: I had to trigger compilation twice, once for the regular build and once for the CUDA-based one. I found no way to easily give a JAR with a classifier its own resource directory (a sketch of one possible wiring follows this list).
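
A rough sketch of what such a second packaging pass could look like, assuming an extra maven-jar-plugin execution that picks up the CUDA classes and resources compiled into target/classes_cuda (the actual pom.xml in this PR may be wired differently):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-jar-plugin</artifactId>
    <executions>
        <execution>
            <id>cuda-jar</id>
            <phase>package</phase>
            <goals>
                <goal>jar</goal>
            </goals>
            <configuration>
                <!-- attaches a second JAR with the CUDA classifier -->
                <classifier>cuda12-linux-x86-64</classifier>
                <!-- package the separately compiled CUDA output directory -->
                <classesDirectory>${project.build.directory}/classes_cuda</classesDirectory>
            </configuration>
        </execution>
    </executions>
</plugin>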

@shuttie shuttie marked this pull request as draft September 3, 2024 11:09
@shuttie shuttie changed the title CI: dockerized cuda12 build CUDA support for pre-built library Sep 3, 2024
@kherud
Owner

kherud commented Sep 3, 2024

Hey @shuttie, really nice! I thought about adding CUDA support before, but held back because of the complexity. It would be really valuable though, and I'd be happy to assist. I think the best option for a toggle between CPU and CUDA would be Maven classifiers, so users would get CPU by default but could access CUDA support with something like

<dependency>
    <groupId>de.kherud</groupId>
    <artifactId>llama</artifactId>
    <version>3.3.0</version>
    <classifier>linux-x64-cuda</classifier>
</dependency>

This would solve two problems I think:

  • The CUDA shared libraries can become quite large and users who don't use CUDA wouldn't have to download them
  • No changes to ModelLoader would be necessary; the CUDA libraries would simply take the place of the original CPU libraries within resources/Linux/x86_64

@shuttie
Contributor Author

shuttie commented Sep 3, 2024

Yes, good idea with the classifiers - the resulting libggml.so is already 300 MB in size :)

@shuttie
Contributor Author

shuttie commented Sep 3, 2024

@kherud The PR is ready for review. I have a local fork to validate that the CI changes are green: https://github.com/shuttie/java-llama.cpp/actions/runs/10688928505/job/29629874992

I would like your support with the future publishing, as I have no permissions to do it in the external repo and cannot test whether it works for multi-artifact builds. But I'm mostly sure it should be OK: running mvn clean install -DskipTests -Prelease correctly installs all artifacts into the local repo:

[INFO] --- install:3.1.2:install (default-install) @ llama ---
[INFO] Installing /home/shutty/private/code/java-llama.cpp/pom.xml to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0.pom
[INFO] Installing /home/shutty/private/code/java-llama.cpp/target/llama-3.3.0.jar to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0.jar
[INFO] Installing /home/shutty/private/code/java-llama.cpp/target/llama-3.3.0-sources.jar to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0-sources.jar
[INFO] Installing /home/shutty/private/code/java-llama.cpp/target/llama-3.3.0-javadoc.jar to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0-javadoc.jar
[INFO] Installing /home/shutty/private/code/java-llama.cpp/target/llama-3.3.0-cuda12-linux-x86-64.jar to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0-cuda12-linux-x86-64.jar
[INFO] Installing /home/shutty/private/code/java-llama.cpp/target/llama-3.3.0.jar.asc to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0.jar.asc
[INFO] Installing /home/shutty/private/code/java-llama.cpp/target/llama-3.3.0.pom.asc to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0.pom.asc
[INFO] Installing /home/shutty/private/code/java-llama.cpp/target/llama-3.3.0-sources.jar.asc to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0-sources.jar.asc
[INFO] Installing /home/shutty/private/code/java-llama.cpp/target/llama-3.3.0-javadoc.jar.asc to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0-javadoc.jar.asc
[INFO] Installing /home/shutty/private/code/java-llama.cpp/target/llama-3.3.0-cuda12-linux-x86-64.jar.asc to /home/shutty/.m2/repository/de/kherud/llama/3.3.0/llama-3.3.0-cuda12-linux-x86-64.jar.asc
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  3.008 s
[INFO] Finished at: 2024-09-03T21:52:22+02:00
[INFO] ------------------------------------------------------------------------

@shuttie shuttie marked this pull request as ready for review September 3, 2024 19:53
@kherud
Owner

kherud commented Sep 3, 2024

Amazing work, thank you! I hope I'll find the time to review and merge it tomorrow, but can't promise it. Definitely by Thursday though. If you want, I can also add you as a collaborator.

@kherud
Owner

kherud commented Sep 5, 2024

I checked the code and it looks good to me. The CUDA build seems to work correctly. When I run mvn clean install -DskipTests -Prelease locally on my machine (after building the libraries), the two JARs are also built, but llama-3.3.0-cuda12-linux-x86-64.jar seems to be missing the required resources.

[target]$ ls -lh
total 3.2M
drwxr-xr-x 6 konstantin wheel 4.0K Sep  5 16:32 apidocs
drwxr-xr-x 3 konstantin wheel 4.0K Sep  5 16:32 classes
drwxr-xr-x 3 konstantin wheel 4.0K Sep  5 16:32 classes_cuda
drwxr-xr-x 3 konstantin wheel 4.0K Sep  5 16:32 generated-sources
drwxr-xr-x 3 konstantin wheel 4.0K Sep  5 16:32 generated-test-sources
drwxr-xr-x 2 konstantin wheel 4.0K Sep  5 16:32 javadoc-bundle-options
-rw-r--r-- 1 konstantin wheel  34K Sep  5 16:32 llama-3.3.0-cuda12-linux-x86-64.jar
-rw-r--r-- 1 konstantin wheel 176K Sep  5 16:32 llama-3.3.0-javadoc.jar
-rw-r--r-- 1 konstantin wheel 1.5M Sep  5 16:32 llama-3.3.0-sources.jar
-rw-r--r-- 1 konstantin wheel 1.5M Sep  5 16:32 llama-3.3.0.jar
-rw-r--r-- 1 konstantin wheel 5.5K Sep  5 16:32 llama-3.3.0.pom
drwxr-xr-x 2 konstantin wheel 4.0K Sep  5 16:32 maven-archiver
-rw-r--r-- 1 konstantin wheel 2.4K Sep  5 16:32 maven-javadoc-plugin-stale-data.txt
drwxr-xr-x 3 konstantin wheel 4.0K Sep  5 16:32 maven-status
drwxr-xr-x 4 konstantin wheel 4.0K Sep  5 16:32 test-classes

Note the 34K file size of llama-3.3.0-cuda12-linux-x86-64.jar. Does this work on your machine?

@shuttie
Contributor Author

shuttie commented Sep 6, 2024

So the CI release.yml process now has an extra step, build-linux-cuda, to build the CUDA artifacts.

To run the whole thing locally for Linux, you need to run the native build twice (the steps are consolidated into a script sketch after this list):

  1. .github/dockcross/dockcross-manylinux_2_28-x64 .github/build_cuda_linux.sh "-DOS_NAME=Linux -DOS_ARCH=x86_64", which will produce the src/main/resources_linux_cuda directory with the CUDA libs.
  2. rm -rf build to drop the CMake build cache, as it would otherwise be stale when you attempt another build without CUDA.
  3. .github/dockcross/dockcross-manylinux2014-x64 .github/build.sh "-DOS_NAME=Linux -DOS_ARCH=x86_64", which will build the CPU native libs in the src/main/resources directory.
  4. After these steps you can run mvn clean install -DskipTests -Prelease, which will build the two JARs for CPU and GPU.
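
Putting the steps together, a minimal local build script could look like the following. The commands are taken verbatim from the steps above; the dockcross wrapper scripts under .github/dockcross/ are assumed to already exist, as in CI.

# 1. CUDA build inside manylinux_2_28: produces src/main/resources_linux_cuda
.github/dockcross/dockcross-manylinux_2_28-x64 .github/build_cuda_linux.sh "-DOS_NAME=Linux -DOS_ARCH=x86_64"

# 2. Drop the CMake cache so the CPU-only build doesn't pick up stale CUDA settings
rm -rf build

# 3. CPU build inside manylinux2014: produces src/main/resources
.github/dockcross/dockcross-manylinux2014-x64 .github/build.sh "-DOS_NAME=Linux -DOS_ARCH=x86_64"

# 4. Package both JARs (default and cuda12-linux-x86-64 classifier)
mvn clean install -DskipTests -Prelease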

@shuttie
Contributor Author

shuttie commented Sep 6, 2024

Ah yes, I reproduced your issue - there was a typo in pom.xml. After the fix it seems to be working fine locally:

total 130192
drwxr-xr-x 6 shutty shutty      4096 Sep  6 17:05 apidocs
drwxr-xr-x 3 shutty shutty      4096 Sep  6 17:05 classes
drwxr-xr-x 3 shutty shutty      4096 Sep  6 17:05 classes_cuda
drwxr-xr-x 3 shutty shutty      4096 Sep  6 17:05 generated-sources
drwxr-xr-x 3 shutty shutty      4096 Sep  6 17:05 generated-test-sources
drwxr-xr-x 2 shutty shutty      4096 Sep  6 17:05 javadoc-bundle-options
-rw-r--r-- 1 shutty shutty 129890247 Sep  6 17:05 llama-3.3.0-cuda12-linux-x86-64.jar
-rw-r--r-- 1 shutty shutty       488 Sep  6 17:05 llama-3.3.0-cuda12-linux-x86-64.jar.asc
-rw-r--r-- 1 shutty shutty   1580101 Sep  6 17:05 llama-3.3.0.jar
-rw-r--r-- 1 shutty shutty       488 Sep  6 17:05 llama-3.3.0.jar.asc
-rw-r--r-- 1 shutty shutty    192864 Sep  6 17:05 llama-3.3.0-javadoc.jar
-rw-r--r-- 1 shutty shutty       488 Sep  6 17:05 llama-3.3.0-javadoc.jar.asc
-rw-r--r-- 1 shutty shutty      5564 Sep  6 17:05 llama-3.3.0.pom
-rw-r--r-- 1 shutty shutty       488 Sep  6 17:05 llama-3.3.0.pom.asc
-rw-r--r-- 1 shutty shutty   1568905 Sep  6 17:05 llama-3.3.0-sources.jar
-rw-r--r-- 1 shutty shutty       488 Sep  6 17:05 llama-3.3.0-sources.jar.asc
drwxr-xr-x 2 shutty shutty      4096 Sep  6 17:05 maven-archiver
-rw-r--r-- 1 shutty shutty      2618 Sep  6 17:05 maven-javadoc-plugin-stale-data.txt
drwxr-xr-x 3 shutty shutty      4096 Sep  6 17:05 maven-status
drwxr-xr-x 4 shutty shutty      4096 Sep  6 17:05 test-classes

@kherud
Owner

kherud commented Sep 6, 2024

Works like a charm 👌

@kherud kherud merged commit a14aa3f into kherud:master Sep 6, 2024