From c50309e52ab90ffc71ebfc083f7397d8d79851eb Mon Sep 17 00:00:00 2001
From: Andrei Betlen
Date: Thu, 4 Apr 2024 02:49:19 -0400
Subject: [PATCH 1/3] docs: LLAMA_CUBLAS -> LLAMA_CUDA

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index f251f45f9..5e16614f2 100644
--- a/README.md
+++ b/README.md
@@ -102,10 +102,10 @@ CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-
 <details>
 <summary>cuBLAS (CUDA)</summary>
 
-To install with cuBLAS, set the `LLAMA_CUBLAS=on` environment variable before installing:
+To install with cuBLAS, set the `LLAMA_CUDA=on` environment variable before installing:
 
 ```bash
-CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
+CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
 ```
 
 </details>
@@ -569,7 +569,7 @@ python3 -m llama_cpp.server --model models/7B/llama-model.gguf
 Similar to Hardware Acceleration section above, you can also install with GPU (cuBLAS) support like this:
 
 ```bash
-CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install 'llama-cpp-python[server]'
+CMAKE_ARGS="-DLLAMA_CUDA=on" FORCE_CMAKE=1 pip install 'llama-cpp-python[server]'
 python3 -m llama_cpp.server --model models/7B/llama-model.gguf --n_gpu_layers 35
 ```
 

From 1db3b58fdc7c216749401a2901fc64495032f1ff Mon Sep 17 00:00:00 2001
From: Andrei Betlen
Date: Thu, 4 Apr 2024 02:57:06 -0400
Subject: [PATCH 2/3] docs: Add docs explaining how to install pre-built
 wheels.

---
 README.md | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/README.md b/README.md
index 5e16614f2..a0ef83ccf 100644
--- a/README.md
+++ b/README.md
@@ -44,6 +44,15 @@ This will also build `llama.cpp` from source and install it alongside this pytho
 
 If this fails, add `--verbose` to the `pip install` see the full cmake build log.
 
+**Pre-built Wheel (New)**
+
+It is also possible to install a pre-built wheel with basic CPU support.
+
+```bash
+pip install llama-cpp-python \
+  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
+```
+
 ### Installation Configuration
 
 `llama.cpp` supports a number of hardware acceleration backends to speed up inference as well as backend specific options. See the [llama.cpp README](https://github.com/ggerganov/llama.cpp#build) for a full list.
@@ -108,6 +117,30 @@ To install with cuBLAS, set the `LLAMA_CUDA=on` environment variable before inst
 CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
 ```
 
+**Pre-built Wheel (New)**
+
+It is also possible to install a pre-built wheel with CUDA support, as long as your system meets the following requirements:
+
+- CUDA Version is 12.1, 12.2 or 12.3
+- Python Version is 3.10, 3.11 or 3.12
+
+```bash
+pip install llama-cpp-python \
+  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/<cuda-version>
+```
+
+Where `<cuda-version>` is one of the following:
+- `cu121`: CUDA 12.1
+- `cu122`: CUDA 12.2
+- `cu123`: CUDA 12.3
+
+For example, to install the CUDA 12.1 wheel:
+
+```bash
+pip install llama-cpp-python \
+  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
+```
+
 </details>
 
 <details>
@@ -119,6 +152,18 @@ To install with Metal (MPS), set the `LLAMA_METAL=on` environment variable befor
 CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
 ```
 
+**Pre-built Wheel (New)**
+
+It is also possible to install a pre-built wheel with Metal support, as long as your system meets the following requirements:
+
+- macOS Version is 11.0 or later
+- Python Version is 3.10, 3.11 or 3.12
+
+```bash
+pip install llama-cpp-python \
+  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
+```
+
 </details>
 
 <details>
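A brief usage sketch for the wheel indexes this patch documents: the `cu12x` tag must match the CUDA runtime on the target machine. Below is a minimal, illustrative shell snippet; only the index URLs and tag names come from the docs above, while the detection logic (assuming `nvcc` is on `PATH`) is a sketch, not part of the patch:

```bash
# Pick a wheel index tag (cu121/cu122/cu123) from the local CUDA toolkit;
# fall back to the CPU-only index when no toolkit is found.
TAG="cpu"
if command -v nvcc >/dev/null 2>&1; then
  # "release 12.1" in `nvcc --version` output becomes "cu121".
  TAG="cu$(nvcc --version | sed -n 's/.*release \([0-9]*\)\.\([0-9]*\).*/\1\2/p')"
fi
pip install llama-cpp-python \
  --extra-index-url "https://abetlen.github.io/llama-cpp-python/whl/${TAG}"
```

If the derived tag is not one of the published ones, pip simply finds no matching wheel on the extra index and falls back to building the PyPI source distribution.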
From 909ef66951fb9f3f5c724841b0c98a598a2e0802 Mon Sep 17 00:00:00 2001
From: Andrei Betlen
Date: Thu, 4 Apr 2024 03:08:47 -0400
Subject: [PATCH 3/3] docs: Rename cuBLAS section to CUDA

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index a0ef83ccf..c4e194bfa 100644
--- a/README.md
+++ b/README.md
@@ -109,9 +109,9 @@ CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-
 </details>
 
 <details>
-<summary>cuBLAS (CUDA)</summary>
+<summary>CUDA</summary>
 
-To install with cuBLAS, set the `LLAMA_CUDA=on` environment variable before installing:
+To install with CUDA support, set the `LLAMA_CUDA=on` environment variable before installing:
 
 ```bash
 CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
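With the series applied end to end, a quick smoke test of the CUDA path, sketched from commands already in the README (the model path and layer count are the same illustrative values used in the examples above):

```bash
# Confirm the package installed and report its version.
python3 -c "import llama_cpp; print(llama_cpp.__version__)"
# Start the server with GPU offload; a CUDA-enabled build reports
# offloaded layers in its model-load log.
python3 -m llama_cpp.server --model models/7B/llama-model.gguf --n_gpu_layers 35
```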