forked from abetlen/llama-cpp-python
[pull] main from abetlen:main #61
Merged
Conversation
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.17.0 to 2.18.0 ([release notes](https://github.com/pypa/cibuildwheel/releases), [changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md), [commits](pypa/cibuildwheel@v2.17.0...v2.18.0)). updated-dependencies: pypa/cibuildwheel (direct:production, version-update:semver-minor). Signed-off-by: dependabot[bot] <[email protected]>. Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Special tokens are already mapped from metadata by llama.cpp
… (#1333): implement min_tokens; set default to 0; pass min_tokens; fix; remove copy; implement MinTokensLogitsProcessor; format; fix condition.
Co-authored-by: Andrei <[email protected]>
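The min_tokens commit adds a logits processor that keeps the end-of-sequence token unsamplable until a minimum number of new tokens have been generated. A minimal sketch of that idea, assuming a plain-list logits interface and an illustrative class name (the real MinTokensLogitsProcessor in llama-cpp-python operates on the library's own logits arrays):

```python
import math


class MinTokensLogitsProcessor:
    """Suppress EOS until at least `min_tokens` tokens have been generated.

    Illustrative sketch only; field and method names are assumptions,
    not the binding's actual API.
    """

    def __init__(self, min_tokens: int, eos_token_id: int):
        self.min_tokens = min_tokens
        self.eos_token_id = eos_token_id
        self.prompt_len: int | None = None

    def __call__(self, input_ids: list[int], logits: list[float]) -> list[float]:
        # Remember the prompt length on the first call so that only
        # newly generated tokens count toward min_tokens.
        if self.prompt_len is None:
            self.prompt_len = len(input_ids)
        generated = len(input_ids) - self.prompt_len
        if generated < self.min_tokens:
            logits = list(logits)
            logits[self.eos_token_id] = -math.inf  # EOS cannot be sampled
        return logits


proc = MinTokensLogitsProcessor(min_tokens=2, eos_token_id=0)
masked = proc([1, 2, 3], [0.5, 0.1])        # 0 tokens generated: EOS masked
later = proc([1, 2, 3, 4, 5], [0.5, 0.1])   # 2 tokens generated: EOS allowed
```

Setting `min_tokens=0` (the commit's default) makes the processor a no-op, which is why it can be enabled unconditionally.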
updated-dependencies: pypa/cibuildwheel (direct:production, version-update:semver-patch). Signed-off-by: dependabot[bot] <[email protected]>. Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>; Andrei <[email protected]>
Add support for CUDA 12.6.1; update CUDA from 12.5.0 to 12.5.1. Co-authored-by: Andrei <[email protected]>
fix: added missing exit_stack.close() to /v1/chat/completions and /v1/completions.
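The exit-stack fixes close a resource leak: contexts entered through an AsyncExitStack are only cleaned up when the stack itself is closed. A stdlib-only sketch of the pattern, with a hypothetical `llama_proxy` stand-in for the server's real dependency:

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

released = []


@asynccontextmanager
async def llama_proxy():
    # Hypothetical stand-in for the server's llama_proxy dependency.
    try:
        yield "model-handle"
    finally:
        released.append(True)  # runs only when the exit stack is closed


async def handle_request():
    exit_stack = AsyncExitStack()
    handle = await exit_stack.enter_async_context(llama_proxy())
    try:
        return f"completion from {handle}"
    finally:
        # The missing piece the fix adds: without closing the stack,
        # llama_proxy's cleanup never runs and the handle leaks.
        await exit_stack.aclose()


result = asyncio.run(handle_request())
```

Note the later commit's point that `aclose()` (not the synchronous `close()`) is the correct call on an AsyncExitStack.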
fix: make use of asyncio to lock llama_proxy context (#1798): use aclose instead of close for AsyncExitStack; don't call exit-stack close in the stream iterator, since it is called by the finally block in on_complete anyway; use anyio.Lock instead of asyncio.Lock. Co-authored-by: Andrei <[email protected]>
(#1793): replaced deprecated llama_sample_token with llama_sampler_sample; updated the llama_token_to_piece signature to include the lstrip and special arguments; changed llama_load_model_from_file to pass params as a positional argument (it currently does not accept a keyword argument for it).
fix: chat API logprobs format; fix optional properties.
fix: correct issue with handling the lock during streaming: move locking for streaming into the get_event_publisher call so it is locked and unlocked in the correct task for the streaming response; simplify exit-stack management for create_chat_completion and create_completion; add missing `async with` and format code; remove unnecessary explicit use of AsyncExitStack; correct type hints for body_model. Co-authored-by: Andrei <[email protected]>
feat: Sync with llama.cpp: add `no_perf` field to `llama_context_params` to optionally disable performance timing measurements; fix: display performance metrics by default. Co-authored-by: Andrei <[email protected]>
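Since llama-cpp-python mirrors llama.cpp's C structs through ctypes, adding a field like `no_perf` means extending the binding's Structure definition. A simplified, hypothetical mirror (field names and layout here are illustrative, not the actual `llama_context_params` definition):

```python
import ctypes


class ContextParams(ctypes.Structure):
    """Hypothetical, heavily trimmed mirror of a llama.cpp params struct."""

    _fields_ = [
        ("n_ctx", ctypes.c_uint32),   # context window size
        ("no_perf", ctypes.c_bool),   # disable performance timing when True
    ]


# Metrics stay on by default, matching the commit's follow-up fix.
params = ContextParams(n_ctx=2048, no_perf=False)
```

The field order and types in `_fields_` must match the C struct exactly, which is why binding updates like this one track llama.cpp releases closely.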
See Commits and Changes for more details.
Created by pull[bot]. Can you help keep this open source service alive? 💖 Please sponsor : )