From c50309e52ab90ffc71ebfc083f7397d8d79851eb Mon Sep 17 00:00:00 2001
From: Andrei Betlen
Date: Thu, 4 Apr 2024 02:49:19 -0400
Subject: [PATCH 1/3] docs: LLAMA_CUBLAS -> LLAMA_CUDA

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index f251f45f9..5e16614f2 100644
--- a/README.md
+++ b/README.md
@@ -102,10 +102,10 @@ CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-
 <details>
 <summary>cuBLAS (CUDA)</summary>
 
-To install with cuBLAS, set the `LLAMA_CUBLAS=on` environment variable before installing:
+To install with cuBLAS, set the `LLAMA_CUDA=on` environment variable before installing:
 
 ```bash
-CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
+CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
 ```
 
 </details>
@@ -569,7 +569,7 @@ python3 -m llama_cpp.server --model models/7B/llama-model.gguf
 Similar to Hardware Acceleration section above, you can also install with GPU (cuBLAS) support like this:
 
 ```bash
-CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install 'llama-cpp-python[server]'
+CMAKE_ARGS="-DLLAMA_CUDA=on" FORCE_CMAKE=1 pip install 'llama-cpp-python[server]'
 python3 -m llama_cpp.server --model models/7B/llama-model.gguf --n_gpu_layers 35
 ```
 

From 1db3b58fdc7c216749401a2901fc64495032f1ff Mon Sep 17 00:00:00 2001
From: Andrei Betlen
Date: Thu, 4 Apr 2024 02:57:06 -0400
Subject: [PATCH 2/3] docs: Add docs explaining how to install pre-built
 wheels.

---
 README.md | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/README.md b/README.md
index 5e16614f2..a0ef83ccf 100644
--- a/README.md
+++ b/README.md
@@ -44,6 +44,15 @@ This will also build `llama.cpp` from source and install it alongside this pytho
 
 If this fails, add `--verbose` to the `pip install` see the full cmake build log.
 
+**Pre-built Wheel (New)**
+
+It is also possible to install a pre-built wheel with basic CPU support.
+
+```bash
+pip install llama-cpp-python \
+  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
+```
+
 ### Installation Configuration
 
 `llama.cpp` supports a number of hardware acceleration backends to speed up inference as well as backend specific options. See the [llama.cpp README](https://github.com/ggerganov/llama.cpp#build) for a full list.
@@ -108,6 +117,30 @@ To install with cuBLAS, set the `LLAMA_CUDA=on` environment variable before inst
 CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
 ```
 
+**Pre-built Wheel (New)**
+
+It is also possible to install a pre-built wheel with CUDA support, as long as your system meets the following requirements:
+
+- CUDA Version is 12.1, 12.2 or 12.3
+- Python Version is 3.10, 3.11 or 3.12
+
+```bash
+pip install llama-cpp-python \
+  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/<cuda-version>
+```
+
+Where `<cuda-version>` is one of the following:
+- `cu121`: CUDA 12.1
+- `cu122`: CUDA 12.2
+- `cu123`: CUDA 12.3
+
+For example, to install the CUDA 12.1 wheel:
+
+```bash
+pip install llama-cpp-python \
+  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
+```
+
 </details>
 
 <details>
@@ -119,6 +152,18 @@ To install with Metal (MPS), set the `LLAMA_METAL=on` environment variable befor
 CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
 ```
 
+**Pre-built Wheel (New)**
+
+It is also possible to install a pre-built wheel with Metal support, as long as your system meets the following requirements:
+
+- macOS Version is 11.0 or later
+- Python Version is 3.10, 3.11 or 3.12
+
+```bash
+pip install llama-cpp-python \
+  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
+```
+
 </details>
 
 <details>
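A brief usage sketch for the wheel indexes this patch documents: the `cu12x` tag must match the CUDA runtime on the target machine. Below is a minimal, illustrative shell snippet; only the index URLs and tag names come from the docs above, while the detection logic (assuming `nvcc` is on `PATH`) is a sketch, not part of the patch:

```bash
# Pick a wheel index tag (cu121/cu122/cu123) from the local CUDA toolkit;
# fall back to the CPU-only index when no toolkit is found.
TAG="cpu"
if command -v nvcc >/dev/null 2>&1; then
  # "release 12.1" in `nvcc --version` output becomes "cu121".
  TAG="cu$(nvcc --version | sed -n 's/.*release \([0-9]*\)\.\([0-9]*\).*/\1\2/p')"
fi
pip install llama-cpp-python \
  --extra-index-url "https://abetlen.github.io/llama-cpp-python/whl/${TAG}"
```

If the derived tag is not one of the published ones, pip simply finds no matching wheel on the extra index and falls back to building the PyPI source distribution.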
From 909ef66951fb9f3f5c724841b0c98a598a2e0802 Mon Sep 17 00:00:00 2001
From: Andrei Betlen
Date: Thu, 4 Apr 2024 03:08:47 -0400
Subject: [PATCH 3/3] docs: Rename cuBLAS section to CUDA

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index a0ef83ccf..c4e194bfa 100644
--- a/README.md
+++ b/README.md
@@ -109,9 +109,9 @@ CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-
 </details>
 
 <details>
-<summary>cuBLAS (CUDA)</summary>
+<summary>CUDA</summary>
 
-To install with cuBLAS, set the `LLAMA_CUDA=on` environment variable before installing:
+To install with CUDA support, set the `LLAMA_CUDA=on` environment variable before installing:
 
 ```bash
 CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
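With the series applied end to end, a quick smoke test of the CUDA path, sketched from commands already in the README (the model path and layer count are the same illustrative values used in the examples above):

```bash
# Confirm the package installed and report its version.
python3 -c "import llama_cpp; print(llama_cpp.__version__)"
# Start the server with GPU offload; a CUDA-enabled build reports
# offloaded layers in its model-load log.
python3 -m llama_cpp.server --model models/7B/llama-model.gguf --n_gpu_layers 35
```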