ggml : move CPU backend to a separate file #10144

Merged 3 commits on Nov 3, 2024

slaren (Member) commented Nov 2, 2024

Moves the ggml code specific to the CPU backend to a separate file.

This is an initial step to separate the core ggml library from the CPU backend. In the future, this will allow:

  • Building other backends as a shared library, without having to link them to the CPU backend
  • Building the core ggml library with only the base instruction set for the ABI, and loading an optimized version of the CPU backend dynamically

Additionally:

  • Removes the optimization interface, since it depends on the CPU backend and would be removed in "ggml: new optimization interface" (ggml#988) regardless
  • Removes the baby-llama example since it depends on the opt interface
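
The second goal listed above (a core built for the base instruction set that picks up an optimized CPU backend at run time) can be illustrated with a small dispatch sketch. This is not the ggml API: cpu_backend_impl, select_cpu_backend, dot_baseline, dot_unrolled and the cpu_has_fast_path flag are hypothetical names, and the "optimized" variant is only a stand-in for code that would be compiled with extra instruction-set flags and possibly loaded from a separate shared library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical interface, NOT the ggml API: one table of entry points per CPU variant.
struct cpu_backend_impl {
    std::string name;
    float (*dot)(const float * a, const float * b, std::size_t n);
};

// Portable baseline, as it would be built with only the base instruction set.
static float dot_baseline(const float * a, const float * b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Stand-in for an optimized variant (assumption: a real one would use SIMD
// and live in a separately built library).
static float dot_unrolled(const float * a, const float * b, std::size_t n) {
    float s0 = 0.0f;
    float s1 = 0.0f;
    std::size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
    }
    if (i < n) {
        s0 += a[i] * b[i]; // odd-length tail
    }
    return s0 + s1;
}

// Hypothetical selector: cpu_has_fast_path is assumed to come from
// run-time feature detection done once at startup.
static cpu_backend_impl select_cpu_backend(bool cpu_has_fast_path) {
    if (cpu_has_fast_path) {
        return {"optimized", dot_unrolled};
    }
    return {"baseline", dot_baseline};
}
```

Callers would hold on to the returned table and call impl.dot(a, b, n), so the rest of the library never needs to know which variant was chosen.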

github-actions bot added the testing, examples, and ggml labels Nov 2, 2024
slaren force-pushed the sl/ggml-cpu-backend branch 6 times, most recently from 8515cb9 to a73ca12, November 3, 2024 00:00
slaren force-pushed the sl/ggml-cpu-backend branch from a73ca12 to bf95fff, November 3, 2024 00:30
JohannesGaessler (Collaborator) left a comment

Are there also plans to split ggml-cpu.c into multiple smaller files, as was done for CUDA?

(I did not really look at ggml.c and ggml-cpu.c since I think it's not feasible.)

@@ -1951,6 +1951,8 @@ void yaml_dump_string_multiline(FILE * stream, const char * prop_name, const cha

void yaml_dump_non_result_info(FILE * stream, const common_params & params, const llama_context * lctx,
                               const std::string & timestamp, const std::vector<int> & prompt_tokens, const char * model_desc) {
    ggml_cpu_init(); // some ARM features are detected at runtime

I didn't get around to it, but this PR reminds me that I also want to at some point remove the YAML log code again. It has become pretty outdated and nowadays there are better solutions for the things that I was originally using it for.

slaren (Member, Author) commented Nov 3, 2024

Are there also plans to split ggml-cpu.c into multiple smaller files, as was done for CUDA?

Yes, I think that would be great. We should also adapt it to C++ and use templates to avoid duplicating the code of the operations for each type.
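
As a rough illustration of the template idea mentioned here, a single C++ function template can replace one hand-written copy of an operation per element type. The names add_elementwise and add_vectors are hypothetical and do not come from ggml; this is only a sketch of the general technique.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical element-wise add, written once and instantiated per element type
// instead of maintaining separate f32, i32, ... copies of the same loop.
template <typename T>
void add_elementwise(const T * a, const T * b, T * dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}

// Convenience wrapper for the sketch; both inputs must have the same length.
template <typename T>
std::vector<T> add_vectors(const std::vector<T> & a, const std::vector<T> & b) {
    assert(a.size() == b.size());
    std::vector<T> dst(a.size());
    add_elementwise(a.data(), b.data(), dst.data(), a.size());
    return dst;
}
```

The compiler generates one specialization for each type actually used, for example add_vectors<float> and add_vectors<std::int32_t>, from the same source.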

ggerganov (Member) commented

Looking into this now.

@ggerganov ggerganov self-requested a review November 3, 2024 13:32
ggerganov (Member) commented

Isn't this going to produce thread sanitizer data race warnings on the is_first_call var?

https://github.com/ggerganov/llama.cpp/blob/bf95fffc6fa7a257c43aeb7b6ff47d78af9c9225/ggml/src/ggml.c#L1424-L1443
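
For context, one common way to keep a "first call" flag free of data races is to read and write it only while holding a lock. The sketch below is a generic pattern, not the code at the linked lines: library_init, g_init_mutex and g_init_runs are hypothetical names, and the counter merely stands in for real one-time setup work.

```cpp
#include <mutex>

static std::mutex g_init_mutex;     // hypothetical global lock
static int        g_init_runs = 0;  // counts how often the one-time setup ran

// Safe to call from several threads: the flag is only touched under the lock,
// so a race detector has no unsynchronized access to report.
int library_init() {
    std::lock_guard<std::mutex> lock(g_init_mutex);

    // The flag must start as true so that the first caller performs the setup.
    static bool is_first_call = true;
    if (is_first_call) {
        ++g_init_runs;          // stand-in for one-time setup work
        is_first_call = false;
    }
    return g_init_runs;
}
```

However many times library_init() is called, the setup branch executes once; the return value is only there to make that observable.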

@slaren slaren merged commit 9f40989 into master Nov 3, 2024
54 checks passed
@slaren slaren deleted the sl/ggml-cpu-backend branch November 3, 2024 18:34
zhiyuan1i pushed a commit to zhiyuan1i/llama.cpp that referenced this pull request Nov 4, 2024
snadampal added a commit to snadampal/llama.cpp that referenced this pull request Nov 4, 2024
This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org#10144
@snadampal snadampal mentioned this pull request Nov 4, 2024
4 tasks
chaxu01 (Collaborator) commented Nov 4, 2024

@slaren this commit 9f40989 breaks q4_0_4_8 on Arm CPUs, likely related to #10165.

The following command triggers the issue:
./bin/llama-cli -m llama-2-7b-chat.Q4_0_4_8.gguf -p "Write a code in C for bubble sorting" -n 32 -t 4 -ngl 0

The error output is:
Assertion failed: (!isnan(wp[i])Assertion failed: (!isnan(wp[i])Assertion failed: (!isnan(wp[i])Assertion failed: (!isnan(wp[i])), function ggml_compute_forward), function ggml_compute_forward), function ggml_compute_forward_soft_max_f32, file ggml-cpu.c, _soft_max_f32, file ggml-cpu.c, _soft_max_f32, file ggml-cpu.c, ), function ggml_compute_forwardline 8904.

This issue does not occur on commit 08828a6.
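
The assertion text suggests that a NaN value reached the softmax routine. As a generic illustration (this is not the ggml implementation, and softmax and has_nan are hypothetical helpers), the sketch below shows that a single NaN in the input turns every softmax output into NaN, which is why such a check tends to surface errors made by an earlier operation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Returns true if any element of v is NaN.
bool has_nan(const std::vector<float> & v) {
    return std::any_of(v.begin(), v.end(), [](float x) { return std::isnan(x); });
}

// Numerically stable softmax: subtract the maximum before exponentiating.
std::vector<float> softmax(const std::vector<float> & x) {
    std::vector<float> y(x.size());
    if (x.empty()) {
        return y;
    }
    float max_val = -std::numeric_limits<float>::infinity();
    for (float v : x) {
        max_val = std::max(max_val, v);
    }
    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = std::exp(x[i] - max_val);
        sum += y[i]; // a NaN term makes the whole sum NaN
    }
    for (float & v : y) {
        v /= sum;    // dividing by a NaN sum makes every output NaN
    }
    return y;
}
```

With finite inputs the outputs are finite and sum to one; with one NaN input, has_nan(softmax(x)) is true for the whole row.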

slaren pushed a commit that referenced this pull request Nov 4, 2024
This fixes the build break from the recent changes
to move the CPU backend to separate files
#10144
ggerganov pushed a commit to ggml-org/ggml that referenced this pull request Nov 4, 2024
This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org/llama.cpp#10144
QingtaoLi1 commented Nov 6, 2024

@slaren Why is the variable is_first_call in ggml_init() in ggml.c inverted, i.e. false on the first call and true afterwards?

apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request Nov 7, 2024
* metal : fix minor string leaks (ggml/1004)

* cmake : make it possible linking ggml as external lib (ggml/1003)

* sync : ggml

* CANN: adjust backend registry refactor. (ggml-org#10158)

remove buffer->iface.get_name that used in cann as it was removed in backend registry refactor PR.

* metal : move dequantize templates to beginning of MSL source (#0)

* metal : simplify f16 and f32 dequant kernels (#0)

* cuda : clear error after changing peer access (ggml-org#10153)

* fix build break on arm64 linux (ggml-org#10166)

This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org#10144

* server : clarify /slots endpoint, add is_processing (ggml-org#10162)

* server : clarify /slots endpoint, add is_processing

* fix tests

* ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (ggml-org#10167)

* ggml : fix gelu tables initialization (ggml-org#10172)

* Q6_K AVX improvements (ggml-org#10118)

* q6_k instruction reordering attempt

* better subtract method

* should be theoretically faster

small improvement with shuffle lut, likely because all loads are already done at that stage

* optimize bit fiddling

* handle -32 offset separately. bsums exists for a reason!

* use shift

* Update ggml-quants.c

* have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86

* ggml : fix arch check in bf16_to_fp32 (ggml-org#10164)

* llama : add <|tool_call|> formatting to Granite template (ggml-org#10177)

Branch: GraniteToolCallTemplate

Signed-off-by: Gabe Goodhart <[email protected]>

* metal : add quantized FA support (ggml-org#10149)

* metal : add quantized FA (vec) support

ggml-ci

* metal : add quantized FA (non-vec) support

* metal : fix support check

ggml-ci

* metal : clean-up

* metal : clean-up (cont)

* metal : fix shared memory calc + reduce smem + comments

* metal : float-correctness

* metal : minor [no ci]

* ggml : adjust is_first_call init value (ggml-org#10193)

ggml-ci

* metal : fix from ptr buffer name (ggml-org#10189)

* server : remove hack for extra parallel slot (ggml-org#10187)

ggml-ci

* metal : add BF16 support (ggml-org#8439)

* ggml : add initial BF16 support

ggml-ci

* metal : add mul_mat_id BF16 support

ggml-ci

* metal : check for bfloat support on the Metal device

ggml-ci

* metal : better var names [no ci]

* metal : do not build bfloat kernels when not supported

ggml-ci

* metal : try to fix BF16 support check

ggml-ci

* metal : this should correctly check bfloat support

---------

Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: Plamen Minev <[email protected]>
Co-authored-by: Yuri Khrustalev <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Eve <[email protected]>
Co-authored-by: Gabe Goodhart <[email protected]>
apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request Nov 8, 2024
* Merge PR (#10) (#11) (#13)

Merge

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dennyxbox890 <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump requests from 2.31.0 to 2.32.2 in the pip group across 1 directory

Bumps the pip group with 1 update in the / directory: [requests](https://github.com/psf/requests).


Updates `requests` from 2.31.0 to 2.32.2
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](psf/requests@v2.31.0...v2.32.2)

---
updated-dependencies:
- dependency-name: requests
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <[email protected]>

* Temp (#15)

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: dennyxbox890 <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Plamen Minev <[email protected]>
Co-authored-by: Yuri Khrustalev <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Eve <[email protected]>
Co-authored-by: Gabe Goodhart <[email protected]>
ggerganov pushed a commit to ggml-org/whisper.cpp that referenced this pull request Nov 15, 2024
This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org/llama.cpp#10144
apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request Nov 15, 2024
* Master1 (#17)

* Rename build.yml to build-ci.yml

* build.yml

* Update build-ci.yml

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Delete ggml/src/vulkan-shaders/CMakeLists.txt

* Update build.yml

* Update build-ci.yml

* Update build-ci.yml

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: dennyxbox890 <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Plamen Minev <[email protected]>
Co-authored-by: Yuri Khrustalev <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Eve <[email protected]>
Co-authored-by: Gabe Goodhart <[email protected]>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org#10144
apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request Nov 16, 2024
* merge (#20)

* Update build-ci.yml

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: dennyxbox890 <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Plamen Minev <[email protected]>
Co-authored-by: Yuri Khrustalev <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Eve <[email protected]>
Co-authored-by: Gabe Goodhart <[email protected]>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org#10144
adutilleul pushed a commit to adutilleul/whisper.cpp that referenced this pull request Nov 19, 2024
This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org/llama.cpp#10144
lyapple2008 pushed a commit to lyapple2008/ggml_mars that referenced this pull request Nov 20, 2024
This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org/llama.cpp#10144
apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request Nov 22, 2024
* Merge (#21)

* merge (#20)

* Master1 (#17)

* Merge PR (#10) (#11) (#13)

Merge

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dennyxbox890 <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump requests from 2.31.0 to 2.32.2 in the pip group across 1 directory

Bumps the pip group with 1 update in the / directory: [requests](https://github.com/psf/requests).


Updates `requests` from 2.31.0 to 2.32.2
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](psf/requests@v2.31.0...v2.32.2)

---
updated-dependencies:
- dependency-name: requests
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <[email protected]>

* Temp (#15)

* metal : fix minor string leaks (ggml/1004)

* cmake : make it possible linking ggml as external lib (ggml/1003)

* sync : ggml

* CANN: adjust backend registry refactor. (ggml-org#10158)

remove buffer->iface.get_name that used in cann as it was removed in backend registry refactor PR.

* metal : move dequantize templates to beginning of MSL source (#0)

* metal : simplify f16 and f32 dequant kernels (#0)

* cuda : clear error after changing peer access (ggml-org#10153)

* fix build break on arm64 linux (ggml-org#10166)

This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org#10144

* server : clarify /slots endpoint, add is_processing (ggml-org#10162)

* server : clarify /slots endpoint, add is_processing

* fix tests

* ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (ggml-org#10167)

* ggml : fix gelu tables initialization (ggml-org#10172)

* Q6_K AVX improvements (ggml-org#10118)

* q6_k instruction reordering attempt

* better subtract method

* should be theoretically faster

small improvement with shuffle lut, likely because all loads are already done at that stage

* optimize bit fiddling

* handle -32 offset separately. bsums exists for a reason!

* use shift

* Update ggml-quants.c

* have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86

* ggml : fix arch check in bf16_to_fp32 (ggml-org#10164)

* llama : add <|tool_call|> formatting to Granite template (ggml-org#10177)

Branch: GraniteToolCallTemplate

Signed-off-by: Gabe Goodhart <[email protected]>

* metal : add quantized FA support (ggml-org#10149)

* metal : add quantized FA (vec) support

ggml-ci

* metal : add quantized FA (non-vec) support

* metal : fix support check

ggml-ci

* metal : clean-up

* metal : clean-up (cont)

* metal : fix shared memory calc + reduce smem + comments

* metal : float-correctness

* metal : minor [no ci]

* ggml : adjust is_first_call init value (ggml-org#10193)

ggml-ci

* metal : fix from ptr buffer name (ggml-org#10189)

* server : remove hack for extra parallel slot (ggml-org#10187)

ggml-ci

* metal : add BF16 support (ggml-org#8439)

* ggml : add initial BF16 support

ggml-ci

* metal : add mul_mat_id BF16 support

ggml-ci

* metal : check for bfloat support on the Metal device

ggml-ci

* metal : better var names [no ci]

* metal : do not build bfloat kernels when not supported

ggml-ci

* metal : try to fix BF16 support check

ggml-ci

* metal : this should correctly check bfloat support

---------

Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: Plamen Minev <[email protected]>
Co-authored-by: Yuri Khrustalev <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Eve <[email protected]>
Co-authored-by: Gabe Goodhart <[email protected]>

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: dennyxbox890 <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Plamen Minev <[email protected]>
Co-authored-by: Yuri Khrustalev <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Eve <[email protected]>
Co-authored-by: Gabe Goodhart <[email protected]>

* Rename build.yml to build-ci.yml

* build.yml

* Update build-ci.yml

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Delete ggml/src/vulkan-shaders/CMakeLists.txt

* Update build.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: dennyxbox890 <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Plamen Minev <[email protected]>
Co-authored-by: Yuri Khrustalev <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Eve <[email protected]>
Co-authored-by: Gabe Goodhart <[email protected]>
apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request Dec 1, 2024
* Temp (#23)

* Merge (#21)

* merge (#20)

* Master1 (#17)

* Merge PR (#10) (#11) (#13)

Merge

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dennyxbox890 <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump requests from 2.31.0 to 2.32.2 in the pip group across 1 directory

Bumps the pip group with 1 update in the / directory: [requests](https://github.com/psf/requests).


Updates `requests` from 2.31.0 to 2.32.2
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](psf/requests@v2.31.0...v2.32.2)

---
updated-dependencies:
- dependency-name: requests
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <[email protected]>

* Temp (#15)

* metal : fix minor string leaks (ggml/1004)

* cmake : make it possible linking ggml as external lib (ggml/1003)

* sync : ggml

* CANN: adjust backend registry refactor. (ggml-org#10158)

Remove buffer->iface.get_name, which was used in CANN, since it was removed in the backend registry refactor PR.

* metal : move dequantize templates to beginning of MSL source (#0)

* metal : simplify f16 and f32 dequant kernels (#0)

* cuda : clear error after changing peer access (ggml-org#10153)

* fix build break on arm64 linux (ggml-org#10166)

This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org#10144

* server : clarify /slots endpoint, add is_processing (ggml-org#10162)

* server : clarify /slots endpoint, add is_processing

* fix tests

* ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (ggml-org#10167)

* ggml : fix gelu tables initialization (ggml-org#10172)

* Q6_K AVX improvements (ggml-org#10118)

* q6_k instruction reordering attempt

* better subtract method

* should be theoretically faster

small improvement with shuffle lut, likely because all loads are already done at that stage

* optimize bit fiddling

* handle -32 offset separately. bsums exists for a reason!

* use shift

* Update ggml-quants.c

* have to update ci macos version to 13 as 12 doesn't work now. 13 is still x86

* ggml : fix arch check in bf16_to_fp32 (ggml-org#10164)

* llama : add <|tool_call|> formatting to Granite template (ggml-org#10177)

Branch: GraniteToolCallTemplate

Signed-off-by: Gabe Goodhart <[email protected]>

* metal : add quantized FA support (ggml-org#10149)

* metal : add quantized FA (vec) support

ggml-ci

* metal : add quantized FA (non-vec) support

* metal : fix support check

ggml-ci

* metal : clean-up

* metal : clean-up (cont)

* metal : fix shared memory calc + reduce smem + comments

* metal : float-correctness

* metal : minor [no ci]

* ggml : adjust is_first_call init value (ggml-org#10193)

ggml-ci

* metal : fix from ptr buffer name (ggml-org#10189)

* server : remove hack for extra parallel slot (ggml-org#10187)

ggml-ci

* metal : add BF16 support (ggml-org#8439)

* ggml : add initial BF16 support

ggml-ci

* metal : add mul_mat_id BF16 support

ggml-ci

* metal : check for bfloat support on the Metal device

ggml-ci

* metal : better var names [no ci]

* metal : do not build bfloat kernels when not supported

ggml-ci

* metal : try to fix BF16 support check

ggml-ci

* metal : this should correctly check bfloat support

---------

Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: Plamen Minev <[email protected]>
Co-authored-by: Yuri Khrustalev <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Eve <[email protected]>
Co-authored-by: Gabe Goodhart <[email protected]>

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: dennyxbox890 <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Plamen Minev <[email protected]>
Co-authored-by: Yuri Khrustalev <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Eve <[email protected]>
Co-authored-by: Gabe Goodhart <[email protected]>

* Rename build.yml to build-ci.yml

* build.yml

* Update build-ci.yml

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Delete ggml/src/vulkan-shaders/CMakeLists.txt

* Update build.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Bump the pip group across 2 directories with 2 updates (#24)

Updates the requirements on [pillow](https://github.com/python-pillow/Pillow) and [aiohttp](https://github.com/aio-libs/aiohttp) to permit the latest version.

Updates `pillow` to 11.0.0
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](python-pillow/Pillow@10.2.0...11.0.0)

Updates `aiohttp` to 3.11.7
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](aio-libs/aiohttp@v3.9.3...v3.11.7)

---
updated-dependencies:
- dependency-name: pillow
  dependency-type: direct:production
  dependency-group: pip
- dependency-name: aiohttp
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: apicalshark <[email protected]>

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Create docker.yml

* Create python-lint.yml

* Create server.yml

* Update requirements.txt

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: dennyxbox890 <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Plamen Minev <[email protected]>
Co-authored-by: Yuri Khrustalev <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Eve <[email protected]>
Co-authored-by: Gabe Goodhart <[email protected]>
github-actions bot pushed a commit to martin-steinegger/ProstT5-llama that referenced this pull request Dec 30, 2024
This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org/llama.cpp#10144
lyapple2008 pushed a commit to lyapple2008/whisper.cpp.mars that referenced this pull request Feb 4, 2025
This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org/llama.cpp#10144
Labels: examples; ggml (changes relating to the ggml tensor library for machine learning); testing (everything test related)
5 participants