
[pull] main from abetlen:main #61


Merged: 361 commits from abetlen:main, merged Feb 28, 2025.

Changes from all commits — 361 commits:
7316502
chore: Bump version
abetlen May 10, 2024
7f59856
fix: Enable CUDA backend for llava. Closes #1324
abetlen May 10, 2024
1547202
docs: Fix typo in README.md (#1444)
yupbank May 10, 2024
9dc5e20
feat: Update llama.cpp
abetlen May 12, 2024
3fe8e9a
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python in…
abetlen May 12, 2024
3c19faa
chore: Bump version
abetlen May 12, 2024
3f8e17a
fix(ci): Use version without extra platform tag in pep503 index
abetlen May 12, 2024
43ba152
feat: Update llama.cpp
abetlen May 13, 2024
50f5c74
Update llama.cpp
abetlen May 14, 2024
4b54f79
chore(deps): bump pypa/cibuildwheel from 2.17.0 to 2.18.0 (#1453)
dependabot[bot] May 14, 2024
389e09c
misc: Remove unnecessary metadata lookups (#1448)
CISC May 14, 2024
5212fb0
feat: add MinTokensLogitProcessor and min_tokens argument to server (…
twaka May 14, 2024
ca8e3c9
feat: Update llama.cpp
abetlen May 16, 2024
e811a81
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python in…
abetlen May 16, 2024
d99a6ba
fix: segfault for models without eos / bos tokens. Closes #1463
abetlen May 16, 2024
b564d05
chore: Bump version
abetlen May 16, 2024
03f171e
example: LLM inference with Ray Serve (#1465)
rgerganov May 17, 2024
d8a3b01
feat: Update llama.cpp
abetlen May 18, 2024
3dbfec7
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python in…
abetlen May 18, 2024
5a595f0
feat: Update llama.cpp
abetlen May 22, 2024
087cc0b
feat: Update llama.cpp
abetlen May 24, 2024
5cae104
feat: Improve Llama.eval performance by avoiding list conversion (#1476)
thoughtp0lice May 24, 2024
a4c9ab8
chore: Bump version
abetlen May 24, 2024
ec43e89
docs: Update multi-modal model section
abetlen May 24, 2024
9e8d7d5
fix(docs): Fix link typo
abetlen May 24, 2024
2d89964
docs: Fix table formatting
abetlen May 24, 2024
454c9bb
feat: Update llama.cpp
abetlen May 27, 2024
c564007
chore(deps): bump pypa/cibuildwheel from 2.18.0 to 2.18.1 (#1472)
dependabot[bot] May 27, 2024
c26004b
feat: Update llama.cpp
abetlen May 29, 2024
2907c26
misc: Update debug build to keep all debug symbols for easier gdb deb…
abetlen May 29, 2024
10b7c50
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python in…
abetlen May 29, 2024
df45a4b
fix: fix string value kv_overrides. Closes #1487
abetlen May 29, 2024
91d05ab
fix: adjust kv_override member names to match llama.cpp
abetlen May 29, 2024
165b4dc
fix: Fix typo in Llama3VisionAlphaChatHandler. Closes #1488
abetlen May 29, 2024
af3ed50
fix: Use numpy recarray for candidates data, fixes bug with temp < 0
abetlen Jun 1, 2024
a6457ba
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python in…
abetlen Jun 1, 2024
6b018e0
misc: Improve llava error messages
abetlen Jun 3, 2024
cd3f1bb
feat: Update llama.cpp
abetlen Jun 4, 2024
ae5682f
fix: Disable Windows+CUDA workaround when compiling for HIPBLAS (#1493)
Engininja2 Jun 4, 2024
c3ef41b
chore: Bump version
abetlen Jun 4, 2024
951e39c
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python in…
abetlen Jun 4, 2024
027f7bc
fix: Avoid duplicate special tokens in chat formats (#1439)
CISC Jun 4, 2024
6e0642c
fix: fix logprobs when BOS is not present (#1471)
a-ghorbani Jun 4, 2024
d634efc
feat: adding `rpc_servers` parameter to `Llama` class (#1477)
chraac Jun 4, 2024
255e1b4
feat: Update llama.cpp
abetlen Jun 7, 2024
83d6b26
feat: Update llama.cpp
abetlen Jun 9, 2024
1615eb9
feat: Update llama.cpp
abetlen Jun 10, 2024
86a38ad
chore: Bump version
abetlen Jun 10, 2024
e342161
feat: Update llama.cpp
abetlen Jun 13, 2024
dbcf64c
feat: Support SPM infill (#1492)
CISC Jun 13, 2024
320a5d7
feat: Add `.close()` method to `Llama` class to explicitly free model…
jkawamoto Jun 13, 2024
5af8163
chore(deps): bump pypa/cibuildwheel from 2.18.1 to 2.19.0 (#1522)
dependabot[bot] Jun 13, 2024
9e396b3
feat: Update workflows and pre-built wheels (#1416)
Smartappli Jun 13, 2024
8401c6f
feat: Update llama.cpp
abetlen Jun 13, 2024
f4491c4
feat: Update llama.cpp
abetlen Jun 17, 2024
4c1d74c
fix: Make destructor to automatically call .close() method on Llama c…
abetlen Jun 19, 2024
554fd08
feat: Update llama.cpp
abetlen Jun 19, 2024
6c33190
chore: Bump version
abetlen Jun 19, 2024
d98a24a
docs: Remove references to deprecated opencl backend. Closes #1512
abetlen Jun 20, 2024
5beec1a
feat: Update llama.cpp
abetlen Jun 21, 2024
27d5358
docs: Update readme examples to use newer Qwen2 model (#1544)
jncraton Jun 21, 2024
398fe81
chore(deps): bump docker/build-push-action from 5 to 6 (#1539)
dependabot[bot] Jun 21, 2024
35c980e
chore(deps): bump pypa/cibuildwheel from 2.18.1 to 2.19.1 (#1527)
dependabot[bot] Jun 21, 2024
04959f1
feat: Update llama_cpp.py bindings
abetlen Jun 21, 2024
117cbb2
feat: Update llama.cpp
abetlen Jul 2, 2024
bf5e0bb
fix(server): Update `embeddings=False` by default. Embeddings should …
abetlen Jul 2, 2024
73ddf29
fix(ci): Fix the CUDA workflow (#1551)
oobabooga Jul 2, 2024
c546c94
misc: Install shared libraries to lib subdirectory
abetlen Jul 2, 2024
92bad6e
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python in…
abetlen Jul 2, 2024
139774b
fix: Update shared library rpath
abetlen Jul 2, 2024
d5f6a15
fix: force $ORIGIN rpath for shared library files
abetlen Jul 2, 2024
e51f200
fix: Fix installation location for shared libraries
abetlen Jul 2, 2024
73fe013
fix: Fix RPATH so it works on macos
abetlen Jul 2, 2024
dc20e8c
fix: Copy dependencies for windows
abetlen Jul 2, 2024
296304b
fix(server): Fix bug in FastAPI streaming response where dependency w…
abetlen Jul 2, 2024
bd5d17b
feat: Update llama.cpp
abetlen Jul 2, 2024
b4cc923
chore: Bump version
abetlen Jul 2, 2024
4fb6fc1
fix(ci): Use LLAMA_CUDA for cuda wheels
abetlen Jul 2, 2024
387d01d
fix(misc): Fix type errors
abetlen Jul 2, 2024
8992a1a
feat: Update llama.cpp
abetlen Jul 2, 2024
3a551eb
fix(ci): Update macos image (macos-11 is removed)
abetlen Jul 2, 2024
01bddd6
chore: Bump version
abetlen Jul 2, 2024
7e20e34
feat: Update llama.cpp
abetlen Jul 4, 2024
62804ee
feat: Update llama.cpp
abetlen Jul 6, 2024
157d913
fix: update token_to_piece
abetlen Jul 6, 2024
218d361
feat: Update llama.cpp
abetlen Jul 9, 2024
1a55417
fix: Update LLAMA_ flags to GGML_ flags
abetlen Jul 9, 2024
09a4f78
fix(ci): Update LLAMA_ flags to GGML_
abetlen Jul 9, 2024
0481a3a
fix(docs): Update LLAMA_ flags to GGML_ flags
abetlen Jul 9, 2024
fccff80
fix(docs): Remove kompute backend references
abetlen Jul 9, 2024
276ea28
fix(misc): Update LLAMA_ flags to GGML_
abetlen Jul 9, 2024
aaf4cbe
chore: Bump version
abetlen Jul 9, 2024
14760c6
chore(deps): bump pypa/cibuildwheel from 2.19.1 to 2.19.2 (#1568)
dependabot[bot] Jul 9, 2024
e31f096
chore(deps): bump microsoft/setup-msbuild from 1.1 to 1.3 (#1569)
dependabot[bot] Jul 9, 2024
b77e507
feat(ci): Dockerfile update base images and post-install cleanup (#1530)
Smartappli Jul 9, 2024
c1ae815
fix(misc): Format
abetlen Jul 9, 2024
08f2bb3
fix(minor): Minor ruff fixes
abetlen Jul 9, 2024
f7f4fa8
feat(ci): Update simple Dockerfile (#1459)
yentur Jul 9, 2024
7613d23
feat: Update llama.cpp
abetlen Jul 17, 2024
66d5cdd
fix(server): Use split_mode from model settings (#1594)
grider-withourai Jul 17, 2024
797f54c
fix(docs): Update README.md typo (#1589)
ericcurtin Jul 17, 2024
0700476
fix: Change repeat_penalty to 1.0 to match llama.cpp defaults (#1590)
ddh0 Jul 18, 2024
3638f73
feat: Add 'required' literal to ChatCompletionToolChoiceOption (#1597)
mjschock Jul 18, 2024
f95057a
chore(deps): bump microsoft/setup-msbuild from 1.3 to 2 (#1585)
dependabot[bot] Jul 20, 2024
5105f40
feat: Update llama.cpp
abetlen Jul 22, 2024
816d491
chore: Bump version
abetlen Jul 22, 2024
a14b49d
feat: Update llama.cpp
abetlen Jul 24, 2024
dccb148
feat: Update llama.cpp
abetlen Jul 28, 2024
9ed6b27
fix: Correcting run.sh filepath in Simple Docker implementation (#1626)
mashuk999 Jul 28, 2024
4bf3b43
chore: Bump version
abetlen Jul 28, 2024
cffb4ec
feat: Update llama.cpp
abetlen Jul 31, 2024
53c6f32
feat: Update llama.cpp
abetlen Jul 31, 2024
0b1a8d8
feat: FreeBSD compatibility (#1635)
yurivict Jul 31, 2024
8297a0d
fix(docker): Update Dockerfile build options from `LLAMA_` to `GGML_`…
Smartappli Jul 31, 2024
ac02174
fix(docker): Fix GGML_CUDA param (#1633)
Smartappli Jul 31, 2024
8a12c9f
fix(docker): Update Dockerfile BLAS options (#1632)
Smartappli Jul 31, 2024
1f0b9a2
fix : Missing LoRA adapter after API change (#1630)
shamitv Jul 31, 2024
f7b9e6d
chore: Bump version
abetlen Jul 31, 2024
5575fed
fix: llama_grammar_accept_token arg order (#1649)
tc-wolf Aug 4, 2024
dff186c
feat: Ported back new grammar changes from C++ to Python implementati…
ExtReMLapin Aug 7, 2024
18f58fe
feat: Update llama.cpp
abetlen Aug 7, 2024
ce6466f
chore: Bump version
abetlen Aug 7, 2024
198f47d
feat(ci): Re-build wheel index automatically when releases are created
abetlen Aug 7, 2024
a07b337
feat: Update llama.cpp
abetlen Aug 7, 2024
9cad571
fix: Include all llama.cpp source files and subdirectories
abetlen Aug 7, 2024
8432116
chore: Bump version
abetlen Aug 7, 2024
e966f3b
feat: Add more detailed log for prefix-match (#1659)
xu-song Aug 7, 2024
131db40
chore(deps): bump pypa/cibuildwheel from 2.19.2 to 2.20.0 (#1657)
dependabot[bot] Aug 7, 2024
5e39a85
feat: Enable recursive search of HFFS.ls when using `from_pretrained`…
benHeid Aug 7, 2024
c5de5d3
feat: Update llama.cpp
abetlen Aug 8, 2024
bfb42b7
Merge branch 'main' of github.com:abetlen/llama-cpp-python into main
abetlen Aug 8, 2024
0998ea0
fix: grammar prints on each call. Closes #1666
abetlen Aug 8, 2024
7aaf701
fix: typo
abetlen Aug 8, 2024
45de9d5
feat: Update llama.cpp
abetlen Aug 10, 2024
4244151
feat: Update llama.cpp
abetlen Aug 12, 2024
95a1533
fix: Added back from_file method to LlamaGrammar (#1673)
ExtReMLapin Aug 12, 2024
9bab46f
fix: only print 'cache saved' in verbose mode (#1668)
lsorber Aug 12, 2024
8ed663b
feat: Update llama.cpp
abetlen Aug 12, 2024
fc19cc7
chore: Bump version
abetlen Aug 13, 2024
63d65ac
feat: Update llama.cpp
abetlen Aug 15, 2024
78e35c4
fix: missing dependencies for test (#1680)
jkawamoto Aug 15, 2024
3c7501b
fix: Llama.close didn't free lora adapter (#1679)
jkawamoto Aug 15, 2024
7bf07ec
feat: Update llama.cpp
abetlen Aug 16, 2024
658b244
Merge branch 'main' of github.com:abetlen/llama-cpp-python into main
abetlen Aug 16, 2024
a2ba731
feat: Update llama.cpp
abetlen Aug 19, 2024
d7328ef
chore: Bump version
abetlen Aug 19, 2024
a20f13f
feat: Update llama.cpp
abetlen Aug 21, 2024
259ee15
feat: Update llama.cpp
abetlen Aug 22, 2024
82ae7f9
feat: Update llama.cpp
abetlen Aug 28, 2024
f70df82
feat: Add MiniCPMv26 chat handler.
abetlen Aug 29, 2024
e251a0b
fix: Update name to MiniCPMv26ChatHandler
abetlen Aug 29, 2024
c68e7fb
fix: pull all gh releases for self-hosted python index
abetlen Aug 29, 2024
97d527e
feat: Add server chat_format minicpm-v-2.6 for MiniCPMv26ChatHandler
abetlen Aug 29, 2024
b570fd3
docs: Add project icon courtesy of πŸ€—
abetlen Aug 29, 2024
cbbfad4
docs: center icon and resize
abetlen Aug 29, 2024
ad2deaf
docs: Add MiniCPM-V-2.6 to multi-modal model list
abetlen Aug 29, 2024
332720d
feat: Update llama.cpp
abetlen Aug 29, 2024
077ecb6
chore: Bump version
abetlen Aug 29, 2024
45001ac
misc(fix): Update CHANGELOG
abetlen Aug 29, 2024
4b1e364
docs: Update README
abetlen Aug 29, 2024
8b853c0
docs: Update README
abetlen Aug 29, 2024
9cba3b8
docs: Update README
abetlen Aug 29, 2024
d981d32
feat: Enable detokenizing special tokens with `special=True` (#1596)
benniekiss Aug 29, 2024
98eb092
fix: Use system message in og qwen format. Closes #1697
abetlen Aug 30, 2024
dcb0d0c
feat: Update llama.cpp
abetlen Aug 30, 2024
9769e57
feat: Update llama.cpp
abetlen Aug 31, 2024
c3fc80a
feat: Update llama.cpp
abetlen Sep 2, 2024
9497bcd
feat: Update llama.cpp
abetlen Sep 5, 2024
c032fc6
feat: Update llama.cpp
abetlen Sep 6, 2024
e529940
feat(ci): Speed up CI workflows using `uv`, add support for CUDA 12.5…
Smartappli Sep 18, 2024
a4e1451
chore(deps): bump pypa/cibuildwheel from 2.20.0 to 2.21.1 (#1743)
dependabot[bot] Sep 18, 2024
f8fcb3e
feat: Update sampling API for llama.cpp (#1742)
abetlen Sep 19, 2024
1e64664
feat: Update llama.cpp
abetlen Sep 19, 2024
9b64bb5
misc: Format
abetlen Sep 19, 2024
22cedad
fix: Fix memory allocation of ndarray (#1704)
xu-song Sep 19, 2024
29afcfd
fix: Don't store scores internally unless logits_all=True. Reduces me…
abetlen Sep 19, 2024
84c0920
feat: Add loading sharded GGUF files from HuggingFace with Llama.from…
Gnurro Sep 19, 2024
47d7a62
feat: Update llama.cpp
abetlen Sep 20, 2024
6c44a3f
feat: Add option to configure n_ubatch
abetlen Sep 20, 2024
49b1e73
docs: Add cuda 12.5 to README.md (#1750)
Smartappli Sep 20, 2024
1324c0c
chore(deps): bump actions/cache from 3 to 4 (#1751)
dependabot[bot] Sep 20, 2024
4744551
feat: Update llama.cpp
abetlen Sep 22, 2024
926b414
feat: Update llama.cpp
abetlen Sep 25, 2024
b3dfb42
chore: Bump version
abetlen Sep 25, 2024
8e07db0
fix: install build dependency
abetlen Sep 25, 2024
65222bc
fix: install build dependency
abetlen Sep 25, 2024
9992c50
fix: Fix speculative decoding
abetlen Sep 26, 2024
11d9562
misc: Rename all_text to remaining_text (#1658)
xu-song Sep 26, 2024
e975dab
fix: Additional fixes for speculative decoding
abetlen Sep 26, 2024
dca0c9a
feat: Update llama.cpp
abetlen Sep 26, 2024
01c7607
feat: Expose libggml in internal APIs (#1761)
abetlen Sep 26, 2024
57e70bb
feat: Update llama.cpp
abetlen Sep 29, 2024
7c4aead
chore: Bump version
abetlen Sep 29, 2024
7403e00
feat: Update llama.cpp
abetlen Oct 22, 2024
e712cff
feat: Update llama.cpp
abetlen Oct 31, 2024
cafa33e
feat: Update llama.cpp
abetlen Nov 15, 2024
d1cb50b
Add missing ggml dependency
abetlen Nov 16, 2024
2796f4e
Add all missing ggml dependencies
abetlen Nov 16, 2024
7ecdd94
chore: Bump version
abetlen Nov 16, 2024
f3fb90b
feat: Update llama.cpp
abetlen Nov 28, 2024
7ba257e
feat: Update llama.cpp
abetlen Dec 6, 2024
9d06e36
fix(ci): Explicitly install arm64 python version
abetlen Dec 6, 2024
fb0b8fe
fix(ci): Explicitly set cmake osx architecture
abetlen Dec 6, 2024
72ed7b8
fix(ci): Explicitly test on arm64 macos runner
abetlen Dec 6, 2024
8988aaf
fix(ci): Use macos-14 runner
abetlen Dec 6, 2024
f11a781
fix(ci): Use macos-13 runner
abetlen Dec 6, 2024
9a09fc7
fix(ci): Debug print python system architecture
abetlen Dec 6, 2024
a412ba5
fix(ci): Update config
abetlen Dec 6, 2024
df05096
fix(ci): Install with regular pip
abetlen Dec 6, 2024
1cd3f2c
fix(ci): gg
abetlen Dec 6, 2024
b34f200
fix(ci): Use python3
abetlen Dec 6, 2024
d8cc231
fix(ci): Use default architecture chosen by action
abetlen Dec 6, 2024
d5d5099
fix(ci): Update CMakeLists.txt for macos
abetlen Dec 6, 2024
4f17ae5
fix(ci): Remove cuda version 12.5.0 incompatibility with VS (#1838)
pabl-o-ce Dec 6, 2024
991d9cd
fix(ci): Remove CUDA 12.5 from index
abetlen Dec 6, 2024
2795303
chore(deps): bump pypa/cibuildwheel from 2.21.1 to 2.22.0 (#1844)
dependabot[bot] Dec 6, 2024
2523472
fix: Fix pickling of Llama class by setting seed from _seed member. C…
abetlen Dec 6, 2024
d553a54
Merge branch 'main' of github.com:abetlen/llama-cpp-python into main
abetlen Dec 6, 2024
ddac04c
chore(deps): bump conda-incubator/setup-miniconda from 3.0.4 to 3.1.0…
dependabot[bot] Dec 6, 2024
fa04cdc
fix logit-bias type hint (#1802)
ddh0 Dec 6, 2024
38fbd29
docs: Remove ref to llama_eval in llama_cpp.py docs (#1819)
richdougherty Dec 6, 2024
4192210
fix: make content not required in ChatCompletionRequestAssistantMessa…
feloy Dec 6, 2024
77a12a3
fix: Re-add suport for CUDA 12.5, add CUDA 12.6 (#1775)
Smartappli Dec 6, 2024
073b7e4
fix: added missing exit_stack.close() to /v1/chat/completions (#1796)
Ian321 Dec 6, 2024
9bd0c95
fix: Avoid thread starvation on many concurrent requests by making us…
gjpower Dec 6, 2024
1ea6154
fix(docs): Update development instructions (#1833)
Florents-Tselai Dec 6, 2024
d610477
fix(examples): Refactor Batching notebook to use new sampler chain AP…
lukestanley Dec 6, 2024
4f0ec65
fix: chat API logprobs format (#1788)
domdomegg Dec 6, 2024
df136cb
misc: Update development Makefile
abetlen Dec 6, 2024
6889429
Merge branch 'main' of github.com:abetlen/llama-cpp-python into main
abetlen Dec 6, 2024
b9b50e5
misc: Update run server command
abetlen Dec 6, 2024
5585f8a
feat: Update llama.cpp
abetlen Dec 9, 2024
61508c2
Add CUDA 12.5 and 12.6 to generated output wheels
abetlen Dec 9, 2024
a9fe0f8
chore: Bump version
abetlen Dec 9, 2024
ca80802
fix(ci): hotfix for wheels
abetlen Dec 9, 2024
002f583
chore: Bump version
abetlen Dec 9, 2024
ea4d86a
fix(ci): update macos runner image to non-deprecated version
abetlen Dec 9, 2024
afedfc8
fix: add missing await statements for async exit_stack handling (#1858)
gjpower Dec 9, 2024
801a73a
feat: Update llama.cpp
abetlen Dec 9, 2024
803924b
chore: Bump version
abetlen Dec 9, 2024
2bc1d97
feat: Update llama.cpp
abetlen Dec 19, 2024
c9dfad4
feat: Update llama.cpp
abetlen Dec 30, 2024
1d5f534
feat: Update llama.cpp
abetlen Jan 8, 2025
e8f14ce
fix: streaming resource lock (#1879)
gjpower Jan 8, 2025
0580cf2
chore: Bump version
abetlen Jan 8, 2025
80be68a
feat: Update llama.cpp
abetlen Jan 29, 2025
0b89fe4
feat: Update llama.cpp
abetlen Jan 29, 2025
14879c7
fix(ci): Fix the CUDA workflow (#1894)
oobabooga Jan 29, 2025
4442ff8
fix: error showing time spent in llama perf context print (#1898)
shakalaca Jan 29, 2025
710e19a
chore: Bump version
abetlen Jan 29, 2025
10 changes: 9 additions & 1 deletion .github/dependabot.yml
@@ -8,4 +8,12 @@ updates:
- package-ecosystem: "pip" # See documentation for possible values
directory: "/" # Location of package manifests
schedule:
interval: "weekly"
interval: "daily"
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "daily"
- package-ecosystem: "docker"
directory: "/"
schedule:
interval: "daily"
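Putting the hunk above together, the updated `.github/dependabot.yml` switches the pip ecosystem from weekly to daily checks and adds daily checks for GitHub Actions and Docker. A sketch of the resulting file follows; the header (the `version` key and anything above line 8) is not shown in the diff and is assumed here:

```yaml
# Assumed file header; only the "updates" entries below appear in the diff.
version: 2
updates:
  - package-ecosystem: "pip"        # Python dependency manifests
    directory: "/"
    schedule:
      interval: "daily"             # changed from "weekly"
  - package-ecosystem: "github-actions"   # added: keep workflow actions current
    directory: "/"
    schedule:
      interval: "daily"
  - package-ecosystem: "docker"     # added: keep Dockerfile base images current
    directory: "/"
    schedule:
      interval: "daily"
```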
104 changes: 84 additions & 20 deletions .github/workflows/build-and-release.yaml
@@ -11,70 +11,134 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-20.04, windows-2019, macos-11]
os: [ubuntu-20.04, windows-2019, macos-13]

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
submodules: "recursive"

# Used to host cibuildwheel
- uses: actions/setup-python@v3
- uses: actions/setup-python@v5
with:
python-version: "3.8"
python-version: "3.9"

- name: Install dependencies
- name: Install dependencies (Linux/MacOS)
if: runner.os != 'Windows'
run: |
python -m pip install --upgrade pip
python -m pip install -e .[all]
python -m pip install uv
RUST_LOG=trace python -m uv pip install -e .[all] --verbose
shell: bash

- name: Install dependencies (Windows)
if: runner.os == 'Windows'
env:
RUST_LOG: trace
run: |
python -m pip install --upgrade pip
python -m pip install uv
python -m uv pip install -e .[all] --verbose
shell: cmd

- name: Build wheels
uses: pypa/cibuildwheel@v2.16.5
uses: pypa/cibuildwheel@v2.22.0
env:
# disable repair
CIBW_REPAIR_WHEEL_COMMAND: ""
with:
package-dir: .
output-dir: wheelhouse

- uses: actions/upload-artifact@v3
- uses: actions/upload-artifact@v4
with:
name: wheels-${{ matrix.os }}
path: ./wheelhouse/*.whl

build_wheels_arm64:
name: Build arm64 wheels
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
submodules: "recursive"

- name: Set up QEMU
uses: docker/setup-qemu-action@v3
with:
platforms: linux/arm64

- name: Build wheels
uses: pypa/[email protected]
env:
CIBW_SKIP: "*musllinux* pp*"
CIBW_REPAIR_WHEEL_COMMAND: ""
CIBW_ARCHS: "aarch64"
CIBW_BUILD: "cp38-* cp39-* cp310-* cp311-* cp312-*"
with:
output-dir: wheelhouse

- name: Upload wheels as artifacts
uses: actions/upload-artifact@v4
with:
name: wheels_arm64
path: ./wheelhouse/*.whl
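The `build_wheels_arm64` job above cross-builds aarch64 wheels on an x86 runner by registering QEMU emulation and then letting cibuildwheel drive the builds. A rough local equivalent is sketched below; it assumes Docker with binfmt/QEMU already registered, and the actual (slow) build command is left commented out:

```shell
# Mirror the workflow's cibuildwheel environment for arm64 wheels.
export CIBW_ARCHS="aarch64"                               # target architecture
export CIBW_SKIP="*musllinux* pp*"                        # skip musl and PyPy builds
export CIBW_BUILD="cp38-* cp39-* cp310-* cp311-* cp312-*" # CPython 3.8-3.12 only
export CIBW_REPAIR_WHEEL_COMMAND=""                       # disable wheel repair, as in the workflow

echo "cibuildwheel targets: $CIBW_BUILD on $CIBW_ARCHS"
# Actual build step (requires Docker + binfmt; takes a long time under emulation):
# pipx run cibuildwheel==2.22.0 --output-dir wheelhouse
```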

build_sdist:
name: Build source distribution
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
submodules: "recursive"
- uses: actions/setup-python@v3

- uses: actions/setup-python@v5
with:
python-version: "3.8"
- name: Install dependencies
python-version: "3.9"

- name: Install dependencies (Linux/MacOS)
if: runner.os != 'Windows'
run: |
python -m pip install --upgrade pip build
python -m pip install -e .[all]
python -m pip install --upgrade pip
python -m pip install uv
RUST_LOG=trace python -m uv pip install -e .[all] --verbose
python -m uv pip install build
shell: bash

- name: Install dependencies (Windows)
if: runner.os == 'Windows'
env:
RUST_LOG: trace
run: |
python -m pip install --upgrade pip
python -m pip install uv
python -m uv pip install -e .[all] --verbose
python -m uv pip install build
shell: cmd

- name: Build source distribution
run: |
python -m build --sdist
- uses: actions/upload-artifact@v3

- uses: actions/upload-artifact@v4
with:
name: sdist
path: ./dist/*.tar.gz

release:
name: Release
needs: [build_wheels, build_sdist]
needs: [build_wheels, build_wheels_arm64, build_sdist]
runs-on: ubuntu-latest

steps:
- uses: actions/download-artifact@v3
- uses: actions/download-artifact@v4
with:
name: artifact
merge-multiple: true
path: dist
- uses: softprops/action-gh-release@v1

- uses: softprops/action-gh-release@v2
with:
files: dist/*
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
10 changes: 5 additions & 5 deletions .github/workflows/build-docker.yaml
@@ -12,26 +12,26 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
submodules: "recursive"

- name: Set up QEMU
uses: docker/setup-qemu-action@v2
uses: docker/setup-qemu-action@v3

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
uses: docker/setup-buildx-action@v3

- name: Login to GitHub Container Registry
uses: docker/login-action@v2
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.repository_owner }}
password: ${{ secrets.GITHUB_TOKEN }}

- name: Build and push
id: docker_build
uses: docker/build-push-action@v4
uses: docker/build-push-action@v6
with:
context: .
file: "docker/simple/Dockerfile"
51 changes: 28 additions & 23 deletions .github/workflows/build-wheels-cuda.yaml
@@ -20,9 +20,9 @@ jobs:
id: set-matrix
run: |
$matrix = @{
'os' = @('ubuntu-20.04', 'windows-latest')
'pyver' = @("3.10", "3.11", "3.12")
'cuda' = @("12.1.1", "12.2.2", "12.3.2")
'os' = @('ubuntu-latest', 'windows-2019')
'pyver' = @("3.9", "3.10", "3.11", "3.12")
'cuda' = @("12.1.1", "12.2.2", "12.3.2", "12.4.1") #, "12.5.1", "12.6.1")
'releasetag' = @("basic")
}

@@ -43,29 +43,34 @@ jobs:
AVXVER: ${{ matrix.releasetag }}

steps:
- name: Add MSBuild to PATH
if: runner.os == 'Windows'
uses: microsoft/setup-msbuild@v2
with:
vs-version: '[16.11,16.12)'

- uses: actions/checkout@v4
with:
submodules: "recursive"

- uses: actions/setup-python@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.pyver }}
cache: 'pip'

- name: Setup Mamba
uses: conda-incubator/setup-miniconda@v2.2.0
uses: conda-incubator/setup-miniconda@v3.1.0
with:
activate-environment: "build"
activate-environment: "llamacpp"
python-version: ${{ matrix.pyver }}
miniforge-variant: Mambaforge
miniforge-version: latest
use-mamba: true
add-pip-as-python-dependency: true
auto-activate-base: false

- name: VS Integration Cache
id: vs-integration-cache
if: runner.os == 'Windows'
uses: actions/cache@v3.3.2
uses: actions/cache@v4
with:
path: ./MSBuildExtensions
key: cuda-${{ matrix.cuda }}-vs-integration
@@ -74,7 +79,7 @@ jobs:
if: runner.os == 'Windows' && steps.vs-integration-cache.outputs.cache-hit != 'true'
run: |
if ($env:CUDAVER -eq '12.1.1') {$x = '12.1.0'} else {$x = $env:CUDAVER}
$links = (Invoke-RestMethod 'https://github.com/Jimver/cuda-toolkit/raw/dc0ca7bb29c5a92f7a963d3d5c93f8d59765136a/src/links/windows-links.ts').Trim().split().where({$_ -ne ''})
$links = (Invoke-RestMethod 'https://raw.githubusercontent.com/Jimver/cuda-toolkit/master/src/links/windows-links.ts').Trim().split().where({$_ -ne ''})
for ($i=$q=0;$i -lt $links.count -and $q -lt 2;$i++) {if ($links[$i] -eq "'$x',") {$q++}}
Invoke-RestMethod $links[$i].Trim("'") -OutFile 'cudainstaller.zip'
& 'C:\Program Files\7-Zip\7z.exe' e cudainstaller.zip -oMSBuildExtensions -r *\MSBuildExtensions\* > $null
@@ -84,7 +89,7 @@ jobs:
if: runner.os == 'Windows'
run: |
$y = (gi '.\MSBuildExtensions').fullname + '\*'
(gi 'C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Microsoft\VC\*\BuildCustomizations').fullname.foreach({cp $y $_})
(gi 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\MSBuild\Microsoft\VC\*\BuildCustomizations').fullname.foreach({cp $y $_})
$cupath = 'CUDA_PATH_V' + $env:CUDAVER.Remove($env:CUDAVER.LastIndexOf('.')).Replace('.','_')
echo "$cupath=$env:CONDA_PREFIX" >> $env:GITHUB_ENV

@@ -107,22 +112,22 @@ jobs:
$env:LD_LIBRARY_PATH = $env:CONDA_PREFIX + '/lib:' + $env:LD_LIBRARY_PATH
}
$env:VERBOSE = '1'
$env:CMAKE_ARGS = '-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=all'
$env:CMAKE_ARGS = "-DLLAMA_CUDA_FORCE_MMQ=ON $env:CMAKE_ARGS"
if ($env:AVXVER -eq 'AVX') {
$env:CMAKE_ARGS = $env:CMAKE_ARGS + ' -DLLAMA_AVX2=off -DLLAMA_FMA=off -DLLAMA_F16C=off'
}
if ($env:AVXVER -eq 'AVX512') {
$env:CMAKE_ARGS = $env:CMAKE_ARGS + ' -DLLAMA_AVX512=on'
}
if ($env:AVXVER -eq 'basic') {
$env:CMAKE_ARGS = $env:CMAKE_ARGS + ' -DLLAMA_AVX=off -DLLAMA_AVX2=off -DLLAMA_FMA=off -DLLAMA_F16C=off'
}
$env:CMAKE_ARGS = '-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all'
$env:CMAKE_ARGS = "-DGGML_CUDA_FORCE_MMQ=ON $env:CMAKE_ARGS"
# if ($env:AVXVER -eq 'AVX') {
$env:CMAKE_ARGS = $env:CMAKE_ARGS + ' -DGGML_AVX2=off -DGGML_FMA=off -DGGML_F16C=off'
# }
# if ($env:AVXVER -eq 'AVX512') {
# $env:CMAKE_ARGS = $env:CMAKE_ARGS + ' -DGGML_AVX512=on'
# }
# if ($env:AVXVER -eq 'basic') {
# $env:CMAKE_ARGS = $env:CMAKE_ARGS + ' -DGGML_AVX=off -DGGML_AVX2=off -DGGML_FMA=off -DGGML_F16C=off'
# }
python -m build --wheel
# write the build tag to the output
Write-Output "CUDA_VERSION=$cudaVersion" >> $env:GITHUB_ENV

- uses: softprops/action-gh-release@v1
- uses: softprops/action-gh-release@v2
with:
files: dist/*
# Set tag_name to <tag>-cu<cuda_version>
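The hunk above renames the build flags from the old `LLAMA_` prefix to the new `GGML_` prefix and forces MMQ kernels for the CUDA wheels. A minimal local sketch of the same flag setup is below; it assumes a CUDA toolkit and CMake toolchain are installed, and the heavyweight build command itself is left commented out:

```shell
# Reproduce the workflow's CMake flags after the LLAMA_ -> GGML_ rename.
export CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=all"
export CMAKE_ARGS="-DGGML_CUDA_FORCE_MMQ=ON $CMAKE_ARGS"   # prepend MMQ flag, as the workflow does
export VERBOSE=1

echo "$CMAKE_ARGS"
# Actual wheel build (requires CUDA toolchain and `pip install build`):
# python -m build --wheel
```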