README.md (5 additions, 35 deletions)
@@ -16,9 +16,6 @@ Inference of Meta's LLaMA model (and others) in pure C/C++.
 2.3 [Infilling](#infilling)
 3. [Android](#importing-in-android)
 
-> [!NOTE]
-> Now with support for Llama 3, Phi-3, and flash attention
-
 ## Quick Start
 
 Access this library via Maven:
@@ -27,18 +24,7 @@ Access this library via Maven:
 <dependency>
     <groupId>de.kherud</groupId>
     <artifactId>llama</artifactId>
-    <version>3.4.1</version>
-</dependency>
-```
-
-Bu default the default library artifact is built only with CPU inference support. To enable CUDA, use a `cuda12-linux-x86-64` maven classifier:
-
-```xml
-<dependency>
-    <groupId>de.kherud</groupId>
-    <artifactId>llama</artifactId>
-    <version>3.4.1</version>
-    <classifier>cuda12-linux-x86-64</classifier>
+    <version>4.0.0</version>
 </dependency>
 ```
 
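As a quick way to check that the bumped 4.0.0 artifact actually resolves at runtime, here is a minimal sketch. The class name `de.kherud.llama.LlamaModel` is an assumption (it does not appear in this diff) and may need adjusting to whatever entry class the binding exposes.

```java
// Minimal classpath sanity check for the Maven dependency shown above.
// Assumption: the binding's entry class is de.kherud.llama.LlamaModel.
public class DependencyCheck {
    public static void main(String[] args) {
        try {
            Class.forName("de.kherud.llama.LlamaModel");
            System.out.println("java-llama.cpp binding found on the classpath");
        } catch (ClassNotFoundException e) {
            System.err.println("Binding missing - check the <dependency> block above");
        }
    }
}
```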
@@ -50,11 +36,7 @@ We support CPU inference for the following platforms out of the box:
 
 - Linux x86-64, aarch64
 - MacOS x86-64, aarch64 (M-series)
-- Windows x86-64, x64, arm (32 bit)
-
-For GPU inference, we support:
-
-- Linux x86-64 with CUDA 12.1+
+- Windows x86-64, x64
 
 If any of these match your platform, you can include the Maven dependency and get started.
 
@@ -88,13 +70,9 @@ All compiled libraries will be put in a resources directory matching your platform.
 
 #### Library Location
 
-This project has to load three shared libraries:
+This project has to load a single shared library `jllama`.
 
-- ggml
-- llama
-- jllama
-
-Note, that the file names vary between operating systems, e.g., `ggml.dll` on Windows, `libggml.so` on Linux, and `libggml.dylib` on macOS.
+Note that the file name varies between operating systems, e.g., `jllama.dll` on Windows, `jllama.so` on Linux, and `jllama.dylib` on macOS.
 
 The application will search in the following order in the following locations:
 
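For illustration only, the JVM's own library-name mapping produces similar platform-specific file names for `jllama`. This sketch uses standard `java.lang.System` calls and does not claim to mirror how the library resolves the file internally.

```java
// Show the platform-specific native library name the JVM would expect.
// Note: System.mapLibraryName prefixes "lib" on Linux/macOS, so the output is
// close to, but not necessarily identical to, the file names listed in the README.
public class LibraryNameDemo {
    public static void main(String[] args) {
        System.out.println(System.mapLibraryName("jllama")); // e.g. jllama.dll / libjllama.so / libjllama.dylib
        // System.loadLibrary("jllama"); // would trigger the usual java.library.path lookup
    }
}
```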
@@ -105,14 +83,6 @@ The application will search in the following order in the following locations:
 - From the **JAR**: If any of the libraries weren't found yet, the application will try to use a prebuilt shared library.
 This of course only works for the [supported platforms](#no-setup-required).
 
-Not all libraries have to be in the same location.
-For example, if you already have a llama.cpp and ggml version you can install them as a system library and rely on the jllama library from the JAR.
-This way, you don't have to compile anything.
-
-#### CUDA
-
-On Linux x86-64 with CUDA 12.1+, the library assumes that your CUDA libraries are findable in `java.library.path`. If you have CUDA installed in a non-standard location, then point the `java.library.path` to the directory containing the `libcudart.so.12` library.
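If the native library (or, in the older releases referenced by the removed CUDA section, the CUDA runtime) has to be picked up from a custom directory, the standard JVM property is the relevant knob. Below is a hedged sketch of inspecting it, with the launch flag as a usage example; the directory path is purely illustrative, not one this project prescribes.

```java
// Print where the JVM will search for native libraries.
// Launch with e.g.: java -Djava.library.path=/opt/myapp/native LibraryPathDemo
// (the directory is an example, not a path required by this library)
public class LibraryPathDemo {
    public static void main(String[] args) {
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
    }
}
```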