Tags · orca-zhang/llama.cpp

b4908

fixed compilation warnings in ggml-sycl (ggml-org#12424)

b4859

llava : fix bug in minicpm-v code (ggml-org#11513)

* fix bug in minicpm-v code

* update readme of minicpm-v

b4778

vulkan: fix assertion when qy_needs_dequant (ggml-org#12068)

Looks like a copy/paste bug from qx_needs_dequant.
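
A hypothetical condensed illustration of the bug class (the condition names come from the commit title; the assertion body is an assumption, not the actual Vulkan backend code):

    // x-side check
    if (qx_needs_dequant) {
        GGML_ASSERT(qx_sz == x_sz);
    }
    // pasted from above unchanged: before the fix this still tested the
    // qx/x sizes instead of the qy/y sizes
    if (qy_needs_dequant) {
        GGML_ASSERT(qx_sz == x_sz);
    }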

b4773

server: support add_generation_prompt query param (ggml-org#12062)

b4771

llama : expose llama_model_n_head_kv in the API (ggml-org#11997)

It's useful to be able to have this from the library layer as it's a key
parameter of the model (e.g. to figure out how much KV cache memory is
needed).
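
As a sketch of that use case, the new accessor can drive a rough KV cache size estimate. This assumes the companion getters llama_model_n_layer, llama_model_n_head, and llama_model_n_embd, approximates the per-head dimension as n_embd / n_head, and assumes an f16 cache; it is an illustration, not library-provided code.

    #include "llama.h"

    // Rough f16 KV cache size: 2 tensors (K and V) * 2 bytes per element,
    // per layer, per context position, per KV head, per head dimension.
    size_t estimate_kv_cache_bytes(const struct llama_model * model, int n_ctx) {
        const int n_layer   = llama_model_n_layer(model);
        const int n_head_kv = llama_model_n_head_kv(model);
        const int head_dim  = llama_model_n_embd(model) / llama_model_n_head(model);
        return 2ull * 2ull * (size_t) n_layer * n_ctx * n_head_kv * head_dim;
    }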

b4770

metal : copy kernels for quant to F32/F16 conversions (ggml-org#12017)

metal: use dequantize_q templates

---------

Co-authored-by: Georgi Gerganov <[email protected]>
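
For context, this is the conversion such copy kernels implement. A minimal CPU-side sketch for Q4_0 → F32, using the standard ggml Q4_0 block layout (one f16 scale plus 16 bytes packing 32 4-bit quants); illustrative only, not the Metal kernel:

    #include <stdint.h>
    #include "ggml.h"

    #define QK4_0 32
    typedef struct {
        ggml_fp16_t d;              // per-block scale
        uint8_t     qs[QK4_0 / 2];  // 32 quants, two per byte
    } block_q4_0;

    // Dequantize one Q4_0 block into 32 floats: each nibble is an unsigned
    // 4-bit value with an implicit offset of 8, scaled by d.
    static void dequantize_block_q4_0(const block_q4_0 * b, float * y) {
        const float d = ggml_fp16_to_fp32(b->d);
        for (int j = 0; j < QK4_0 / 2; ++j) {
            y[j]             = ((b->qs[j] & 0x0F) - 8) * d; // low nibble
            y[j + QK4_0 / 2] = ((b->qs[j] >>   4) - 8) * d; // high nibble
        }
    }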

b4769

opencl: fix for small models (ggml-org#11950)

* opencl: fix small shape gemv, remove unused extensions

* opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size

* opencl: fix for token length < 4

* opencl: use wave size of 64 for all Adreno GPUs

---------

Co-authored-by: Shawn Gu <[email protected]>
Co-authored-by: Skyler Szot <[email protected]>

b4768

llava : Add Granite Vision Support (ggml-org#11794)

* Add super wip scripts for multimodal granite gguf

Signed-off-by: Alex-Brooks <[email protected]>

* Add example for converting mmgranite to gguf

Signed-off-by: Alex-Brooks <[email protected]>

* remove hardcoded path

Signed-off-by: Alex-Brooks <[email protected]>

* Add vision feature layer to gguf params

Signed-off-by: Alex-Brooks <[email protected]>

* Clean up llava surgery and remove name substitution hacks

Signed-off-by: Alex-Brooks <[email protected]>

* Add transformers llava next tensor name mapping

Signed-off-by: Alex-Brooks <[email protected]>

* Make siglip / openclip mutually exclusive

Signed-off-by: Alex-Brooks <[email protected]>

* Fix projector linear substitution

Signed-off-by: Alex-Brooks <[email protected]>

* Fix linear 2 substitution index

Signed-off-by: Alex-Brooks <[email protected]>

* Increase max flattened gridpoints to 64

Signed-off-by: Alex-Brooks <[email protected]>

* Fix hardcoded concat for multiple feature layers

Signed-off-by: Alex-Brooks <[email protected]>

* Pull vision feature layers out of gguf keys

Signed-off-by: Alex-Brooks <[email protected]>

* fix num gridpoints and use all layers

Signed-off-by: Alex-Brooks <[email protected]>

* Avoid dropping last image encoder layer in llava models

Signed-off-by: Alex-Brooks <[email protected]>

* Use 10 for max number of patches

Signed-off-by: Alex-Brooks <[email protected]>

* Standardize vision feature layers

Signed-off-by: Alex-Brooks <[email protected]>

* Cleanup logs

Signed-off-by: Alex-Brooks <[email protected]>

* Update comment for vision feature layer init

Signed-off-by: Alex-Brooks <[email protected]>

* Update notes for alternative to legacy llm conversion script

Signed-off-by: Alex-Brooks <[email protected]>

* Fix notes rendering

Signed-off-by: Alex-Brooks <[email protected]>

* Add v prefix to vision feature layer log

Signed-off-by: Alex-Brooks <[email protected]>

* Use current defaults for feature layer

Signed-off-by: Alex-Brooks <[email protected]>

* Use constant for max gridpoints / feat layers, style fixes

Signed-off-by: Alex-Brooks <[email protected]>

* clarify non-negative feature layers

Signed-off-by: Alex-Brooks <[email protected]>

* Remove CLIP_API from func signature

Signed-off-by: Alex-Brooks <[email protected]>

* Use MAX_IMAGE_FEATURE_LAYERS const in layer calc

Signed-off-by: Alex-Brooks <[email protected]>

* Clarify feature layers are non-negative ints and not uint

Signed-off-by: Alex-Brooks <[email protected]>

* Fix condition for reading feature layers

Signed-off-by: Alex-Brooks <[email protected]>

* pop last llava layer when feature layers are unset

Signed-off-by: Alex-Brooks <[email protected]>

* Fix unset vision layer 0

Signed-off-by: Alex-Brooks <[email protected]>

* Update examples/llava/clip.cpp

Co-authored-by: Xuan-Son Nguyen <[email protected]>

* Reenable assertion for out of bounds get_rows

Signed-off-by: Alex-Brooks <[email protected]>

* Use std vector for gridpoints and feature layers

Signed-off-by: Alex-Brooks <[email protected]>

* Calculate max feature layer at load time (see the sketch after this entry)

Signed-off-by: Alex-Brooks <[email protected]>

* Include base patch for granite vision allocation

Signed-off-by: Alex-Brooks <[email protected]>

* Fix trailing whitespace

Signed-off-by: Alex-Brooks <[email protected]>

* Add max num patches = 10 back for minicpmv

Signed-off-by: Alex-Brooks <[email protected]>

* Use unordered set to store feature layers

Co-authored-by: Xuan-Son Nguyen <[email protected]>
Signed-off-by: Alex-Brooks <[email protected]>

* Use max feature layer for postnorm

Signed-off-by: Alex-Brooks <[email protected]>

* Apply suggestions from code review

---------

Signed-off-by: Alex-Brooks <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>
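
The "use unordered set" and "calculate max feature layer at load time" steps above suggest a small pattern worth sketching; hypothetical condensed form with assumed names, not the actual clip.cpp code:

    #include <algorithm>
    #include <unordered_set>

    // Assumed shape: feature layers live in a set for O(1) membership checks,
    // and the max layer is computed once when the model is loaded.
    struct vision_hparams {
        std::unordered_set<int> feature_layers;
        int max_feature_layer = -1;
    };

    void finalize_feature_layers(vision_hparams & hp) {
        if (!hp.feature_layers.empty()) {
            hp.max_feature_layer = *std::max_element(
                hp.feature_layers.begin(), hp.feature_layers.end());
        }
    }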

b4767

[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (ggml-org#12035)

* optimize performance by reordering Q4_0 data for Intel GPUs

* detect the hardware type, save the opt feature, and print it

* correct name

* optimize the graph once at graph-compute time, recording the opt status in tensor->extra; make CI pass

* add env variable GGML_SYCL_DISABLE_OPT for debug (see the sketch after this entry)

* use syclex::architecture to replace the custom hw define; update the guide for GGML_SYCL_DISABLE_OPT

* add performance data

* move getrows functions to separate files

* fix global variables

---------

Co-authored-by: arthw <[email protected]>
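
A hedged sketch of how a switch like GGML_SYCL_DISABLE_OPT is typically consumed (the helper name and exact semantics are assumptions, not the real ggml-sycl code):

    #include <cstdlib>
    #include <cstring>

    // Treat any value other than unset or "0" as "disable the reorder opt".
    static bool sycl_reorder_opt_disabled() {
        const char * v = std::getenv("GGML_SYCL_DISABLE_OPT");
        return v != nullptr && std::strcmp(v, "0") != 0;
    }

Running with GGML_SYCL_DISABLE_OPT=1 would then skip the reordered Q4_0 path, which is the debugging fallback the commit describes.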

b4765

SYCL: Fix GGML_SYCL_DEBUG macro (ggml-org#11995)
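
Debug macros of this shape are usually printf-style wrappers gated on a runtime flag; an assumed form for illustration (the gating variable g_ggml_sycl_debug is an assumption, not necessarily the macro this commit fixed):

    #include <cstdio>

    extern int g_ggml_sycl_debug;

    // Variadic forwarding to fprintf; reduces to a cheap branch when
    // debugging is off.
    #define GGML_SYCL_DEBUG(...) \
        do { if (g_ggml_sycl_debug) fprintf(stderr, __VA_ARGS__); } while (0)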