Documentation is available at [https://llama-cpp-python.readthedocs.io/en/latest](https://llama-cpp-python.readthedocs.io/en/latest).
## Installation

Requirements:

- Python 3.8+
- C compiler
  - Linux: gcc or clang
  - Windows: Visual Studio or MinGW
  - MacOS: Xcode

To install the package, run:

```bash
pip install llama-cpp-python
```

This will also build `llama.cpp` from source and install it alongside this python package.

If this fails, add `--verbose` to the `pip install` command to see the full cmake build log.
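
For example, to rerun the install with full build output:

```bash
pip install llama-cpp-python --verbose
```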

### Installation Configuration

By default, `pip install` builds `llama.cpp` for CPU only on Linux and Windows and uses Metal on MacOS. `llama.cpp` supports a number of hardware acceleration backends to speed up inference, including OpenBLAS, cuBLAS, CLBlast, hipBLAS, and Metal, as well as backend specific options. See the [llama.cpp README](https://github.com/ggerganov/llama.cpp#build) for a full list.

All `llama.cpp` cmake build options can be set via the `CMAKE_ARGS` environment variable or via the `--config-settings / -C` cli flag during installation.
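
For example, a minimal sketch of the `--config-settings` form; the OpenBLAS `-D` flags below are only illustrative, and the exact flag names depend on your `llama.cpp` version:

```bash
# Ensure pip is new enough to support --config-settings / -C
pip install --upgrade pip

# Pass cmake options through pip's config-settings interface
# (replace the OpenBLAS flags with the backend you actually want)
pip install llama-cpp-python -C cmake.args="-DLLAMA_BLAS=ON;-DLLAMA_BLAS_VENDOR=OpenBLAS"
```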

On Linux and Mac you set `CMAKE_ARGS` like this:
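
A minimal sketch, again with illustrative backend flags:

```bash
# Build and install with the OpenBLAS backend enabled
# (replace the flags with the backend you actually want)
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
```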

<details>
<summary>Error: Can't find 'nmake' or 'CMAKE_C_COMPILER'</summary>

If you run into issues where it complains it can't find `'nmake'` or `CMAKE_C_COMPILER`, you can extract w64devkit as [mentioned in the llama.cpp repo](https://github.com/ggerganov/llama.cpp#openblas) and add those manually to `CMAKE_ARGS` before running `pip install`:
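
A sketch assuming w64devkit was extracted to `C:\w64devkit` and you are running a POSIX-style shell such as Git Bash (in PowerShell, set the same variables with `$env:` instead):

```bash
# Point cmake at the w64devkit toolchain (paths are illustrative)
export CMAKE_GENERATOR="MinGW Makefiles"
export CMAKE_ARGS="-DCMAKE_C_COMPILER=C:/w64devkit/bin/gcc.exe -DCMAKE_CXX_COMPILER=C:/w64devkit/bin/g++.exe"
pip install llama-cpp-python --verbose
```
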
See the above instructions and set `CMAKE_ARGS` to the BLAS backend you want to use.

</details>

### MacOS Notes

Detailed MacOS Metal GPU install documentation is available at [docs/install/macos.md](https://llama-cpp-python.readthedocs.io/en/latest/install/macos/)

<details>
<summary>M1 Mac Performance Issue</summary>

Note: If you are using an Apple Silicon (M1) Mac, make sure you have installed a version of Python that supports the arm64 architecture. For example:
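
One common approach is to install an arm64 build of Python via Miniforge; a sketch, assuming the latest conda-forge installer:

```bash
# Download and run the Apple Silicon (arm64) Miniforge installer
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
```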

</details>

### Upgrading and Reinstalling

To upgrade and rebuild `llama-cpp-python`, add the `--upgrade --force-reinstall --no-cache-dir` flags to the `pip install` command to ensure the package is rebuilt from source. This will ensure that all source files are re-built with the most recently set `CMAKE_ARGS` flags.
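
Putting that together:

```bash
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
```
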
## High-level API

You can pull `Llama` models from Hugging Face using the `from_pretrained` method.

You'll need to install the `huggingface-hub` package to use this feature (`pip install huggingface-hub`).

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen1.5-0.5B-Chat-GGUF",
    filename="*q8_0.gguf",
    verbose=False
)
```
By default the `from_pretrained` method will download the model to the huggingface cache directory so you can manage installed model files with the `huggingface-cli` tool.
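
For example, to inspect or clean up that cache (assuming `huggingface-hub` is installed, which provides the `huggingface-cli` command):

```bash
# Show models stored in the Hugging Face cache
huggingface-cli scan-cache

# Interactively remove cached models you no longer need
huggingface-cli delete-cache
```
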
### Chat Completion
The high-level API also provides a simple interface for chat completion.