diff --git a/README.md b/README.md
index f251f45f9..c4e194bfa 100644
--- a/README.md
+++ b/README.md
@@ -44,6 +44,15 @@ This will also build `llama.cpp` from source and install it alongside this pytho
 
-If this fails, add `--verbose` to the `pip install` see the full cmake build log.
+If this fails, add `--verbose` to the `pip install` command to see the full cmake build log.
 
+**Pre-built Wheel (New)**
+
+It is also possible to install a pre-built wheel with basic CPU support.
+
+```bash
+pip install llama-cpp-python \
+  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
+```
+
 ### Installation Configuration
 
 `llama.cpp` supports a number of hardware acceleration backends to speed up inference as well as backend specific options. See the [llama.cpp README](https://github.com/ggerganov/llama.cpp#build) for a full list.
@@ -100,12 +109,36 @@ CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-
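A quick way to confirm the CPU wheel installed correctly is to import the package and print its version (a minimal smoke test, assuming a working `python3` on `PATH`):

```bash
# Minimal post-install check: import the bindings and report the installed version.
python3 -c "import llama_cpp; print(llama_cpp.__version__)"
```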
-cuBLAS (CUDA)
+CUDA
+
+To install with CUDA support, set the `LLAMA_CUDA=on` environment variable before installing:
+
+```bash
+CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
+```
+
+**Pre-built Wheel (New)**
+
+It is also possible to install a pre-built wheel with CUDA support, as long as your system meets some requirements:
+
+- CUDA Version is 12.1, 12.2 or 12.3
+- Python Version is 3.10, 3.11 or 3.12
 
-To install with cuBLAS, set the `LLAMA_CUBLAS=on` environment variable before installing:
+```bash
+pip install llama-cpp-python \
+  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/<cuda-version>
+```
+
+Where `<cuda-version>` is one of the following:
+- `cu121`: CUDA 12.1
+- `cu122`: CUDA 12.2
+- `cu123`: CUDA 12.3
+
+For example, to install the CUDA 12.1 wheel:
 
 ```bash
-CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
+pip install llama-cpp-python \
+  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
 ```
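Since the wheel index must match the local CUDA toolkit, a quick check before picking `<cuda-version>` might look like this (a sketch, assuming the CUDA toolkit's `nvcc` is installed):

```bash
# Report the toolkit version; look for "release 12.1", "12.2", or "12.3" in the output.
nvcc --version

# Then install from the matching index, e.g. for a CUDA 12.2 toolkit:
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu122
```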
@@ -119,6 +152,18 @@ To install with Metal (MPS), set the `LLAMA_METAL=on` environment variable befor
 CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
 ```
 
+**Pre-built Wheel (New)**
+
+It is also possible to install a pre-built wheel with Metal support, as long as your system meets some requirements:
+
+- macOS Version is 11.0 or later
+- Python Version is 3.10, 3.11 or 3.12
+
+```bash
+pip install llama-cpp-python \
+  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
+```
+
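To verify a machine meets the Metal wheel requirements before installing, the two version checks could be scripted like this (a sketch using the standard macOS `sw_vers` tool):

```bash
# Both values must satisfy the requirements listed above.
sw_vers -productVersion    # expect 11.0 or later
python3 --version          # expect 3.10, 3.11 or 3.12
```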
@@ -569,7 +614,7 @@ python3 -m llama_cpp.server --model models/7B/llama-model.gguf
 
-Similar to Hardware Acceleration section above, you can also install with GPU (cuBLAS) support like this:
+Similar to the Hardware Acceleration section above, you can also install with GPU (CUDA) support like this:
 
 ```bash
-CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install 'llama-cpp-python[server]'
+CMAKE_ARGS="-DLLAMA_CUDA=on" FORCE_CMAKE=1 pip install 'llama-cpp-python[server]'
 python3 -m llama_cpp.server --model models/7B/llama-model.gguf --n_gpu_layers 35
 ```
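Once the server is up, a quick request against its OpenAI-compatible API confirms the GPU build is serving; this assumes the server's default host and port (`localhost:8000`):

```bash
# List the models the running server exposes (OpenAI-compatible endpoint).
curl http://localhost:8000/v1/models
```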