[pull] main from abetlen:main #60

Merged: 3 commits, Apr 4, 2024

Changes from all commits
README.md: 53 changes (49 additions & 4 deletions)
@@ -44,6 +44,15 @@ This will also build `llama.cpp` from source and install it alongside this python package.

If this fails, add `--verbose` to the `pip install` command to see the full cmake build log.

**Pre-built Wheel (New)**

It is also possible to install a pre-built wheel with basic CPU support.

```bash
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
```
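
To confirm the wheel is installed and importable, a quick sanity check from the shell (this assumes the package exposes `llama_cpp.__version__`, which current releases do):

```bash
# Import the freshly installed package and print its version
python3 -c "import llama_cpp; print(llama_cpp.__version__)"
```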

### Installation Configuration

`llama.cpp` supports a number of hardware acceleration backends to speed up inference as well as backend specific options. See the [llama.cpp README](https://github.com/ggerganov/llama.cpp#build) for a full list.
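
For example, a backend is typically enabled by passing flags through the `CMAKE_ARGS` environment variable at install time. The sketch below is illustrative only; `FORCE_CMAKE=1` (also used later in this README) forces a rebuild when the package is already installed:

```bash
# Illustrative only: enable a backend via CMAKE_ARGS and force a rebuild
# of an existing install. See the llama.cpp README for the full flag list.
CMAKE_ARGS="-DLLAMA_CUDA=on" FORCE_CMAKE=1 \
  pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python
```
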
@@ -100,12 +109,36 @@ CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
</details>

<details>
<summary>CUDA</summary>

To install with CUDA support, set the `LLAMA_CUDA=on` environment variable before installing:

```bash
CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
```

**Pre-built Wheel (New)**

It is also possible to install a pre-built wheel with CUDA support, as long as your system meets the following requirements (a quick version check is sketched after the list):

- CUDA Version is 12.1, 12.2 or 12.3
- Python Version is 3.10, 3.11 or 3.12

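You can confirm both from a terminal (a rough check: `nvidia-smi` reports the highest CUDA version supported by the installed driver, which may differ from the toolkit version reported by `nvcc`):

```bash
# Check the driver-supported CUDA version and the active Python version
nvidia-smi
python3 --version
```

Then install the wheel matching your CUDA version:
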
```bash
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/<cuda-version>
```

Where `<cuda-version>` is one of the following:
- `cu121`: CUDA 12.1
- `cu122`: CUDA 12.2
- `cu123`: CUDA 12.3

For example, to install the CUDA 12.1 wheel:

```bash
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
```

</details>
@@ -119,6 +152,18 @@ To install with Metal (MPS), set the `LLAMA_METAL=on` environment variable before installing:
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
```

**Pre-built Wheel (New)**

It is also possible to install a pre-built wheel with Metal support, as long as your system meets the following requirements (a quick version check is sketched after the list):

- macOS Version is 11.0 or later
- Python Version is 3.10, 3.11 or 3.12

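Both can be confirmed from a terminal (a quick sketch using standard macOS tooling):

```bash
# Check the macOS product version and the active Python version
sw_vers -productVersion
python3 --version
```

Then install the Metal wheel:
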
```bash
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
```

</details>
<details>

@@ -569,7 +614,7 @@ python3 -m llama_cpp.server --model models/7B/llama-model.gguf
Similar to the Hardware Acceleration section above, you can also install with GPU (CUDA) support like this:

```bash
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install 'llama-cpp-python[server]'
CMAKE_ARGS="-DLLAMA_CUDA=on" FORCE_CMAKE=1 pip install 'llama-cpp-python[server]'
python3 -m llama_cpp.server --model models/7B/llama-model.gguf --n_gpu_layers 35
```

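Once the server is up, it exposes an OpenAI-compatible HTTP API. Assuming the default host and port (`localhost:8000`), a chat completion request can be sent with `curl`; the prompt and `max_tokens` value below are placeholders:

```bash
# Query the locally running, OpenAI-compatible server
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Name the planets in the solar system."}],
        "max_tokens": 64
      }'
```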