
ggml : move CPU backend to a separate file #10144

Merged

slaren merged 3 commits into master from sl/ggml-cpu-backend

Nov 3, 2024

Member

slaren commented Nov 2, 2024 (edited)

Moves the ggml code specific to the CPU backend to a separate file.

This is an initial step towards separating the core ggml library from the CPU backend. In the future, this will allow:

- Building other backends as shared libraries, without having to link them to the CPU backend
- Building the core ggml library with only the base instruction set for the ABI, and loading an optimized version of the CPU backend dynamically

Additionally:

- Removes the optimization interface, since it depends on the CPU backend and would be removed in "ggml: new optimization interface" (ggml#988) regardless
- Removes the baby-llama example, since it depends on the optimization interface
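To illustrate the second goal above (a baseline core plus an optimized CPU backend chosen at run time), here is a minimal C++ sketch of the selection step only. It is not the mechanism used in ggml: `cpu_feature`, `backend_variant`, `pick_variant`, and the variant names are all hypothetical, and the actual feature detection and library loading are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bits; a real build would query the CPU at run time.
enum cpu_feature : uint32_t {
    FEAT_BASE   = 0,
    FEAT_AVX2   = 1u << 0,
    FEAT_AVX512 = 1u << 1,
};

// Hypothetical description of one CPU backend build (e.g. one shared library).
struct backend_variant {
    std::string name;     // illustrative name only
    uint32_t    required; // features this variant was compiled for
    int         priority; // higher is preferred when several variants are usable
};

// Returns the usable variant with the highest priority, or nullptr if none is usable.
const backend_variant * pick_variant(const std::vector<backend_variant> & variants,
                                     uint32_t available) {
    const backend_variant * best = nullptr;
    for (const auto & v : variants) {
        // A variant is usable only if every feature it requires is available.
        const bool usable = (v.required & available) == v.required;
        if (usable && (best == nullptr || v.priority > best->priority)) {
            best = &v;
        }
    }
    return best;
}
```

With a list containing a baseline variant and progressively more specialized ones, a caller would pass the detected feature mask and then load whichever variant is returned; the baseline variant (no required features) is always usable, so the function only returns `nullptr` for an empty list.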

github-actions bot added the testing, examples, and ggml labels

slaren force-pushed the sl/ggml-cpu-backend branch 6 times, most recently from 8515cb9 to a73ca12

November 3, 2024 00:00


          ggml : move CPU backend to a separate file

bf95fff

ggml-ci

slaren force-pushed the sl/ggml-cpu-backend branch from a73ca12 to bf95fff

November 3, 2024 00:30

JohannesGaessler reviewed

Collaborator

JohannesGaessler left a comment

Are there also plans to split ggml-cpu.c into multiple smaller files like was done for CUDA?

(I did not really look at ggml.c and ggml-cpu.c since I think it's not feasible.)

common/common.cpp

               void yaml_dump_non_result_info(FILE * stream, const common_params & params, const llama_context * lctx,
                                              const std::string & timestamp, const std::vector<int> & prompt_tokens, const char * model_desc) {
+                  ggml_cpu_init(); // some ARM features are detected at runtime

Collaborator

JohannesGaessler Nov 3, 2024

I didn't get around to it, but this PR reminds me that I also want to at some point remove the YAML log code again. It has become pretty outdated and nowadays there are better solutions for the things that I was originally using it for.

ggml/include/ggml.h (resolved)

ggml/src/ggml-rpc.cpp (resolved)

Member Author

slaren commented Nov 3, 2024

> Are there also plans to split ggml-cpu.c into multiple smaller files like was done for CUDA?

Yes, I think that would be great. We should also adapt it to C++ and use templates to avoid duplicating the code of the operations for each type.
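As a rough illustration of the template idea, the sketch below writes one element-wise operation once and instantiates it per element type, with only the conversions to and from `float` differing per type. This is an assumption-laden example, not ggml code: `bf16_demo`, `to_f32`, `from_f32`, and `add_rows` are hypothetical names, and the 16-bit type simply keeps the upper half of an IEEE-754 float with truncation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 16-bit storage type (upper 16 bits of an IEEE-754 float).
// It stands in for a reduced-precision format; it is not ggml's actual type.
struct bf16_demo { uint16_t bits; };

// Per-type conversions to float: the only code that differs between types.
inline float to_f32(float v) { return v; }
inline float to_f32(bf16_demo v) {
    uint32_t u = static_cast<uint32_t>(v.bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Per-type conversions from float, selected by explicit template argument.
template <typename T> T from_f32(float f);
template <> inline float from_f32<float>(float f) { return f; }
template <> inline bf16_demo from_f32<bf16_demo>(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return bf16_demo{ static_cast<uint16_t>(u >> 16) }; // truncates, no rounding
}

// One element-wise add, written once and instantiated for each element type.
template <typename T>
void add_rows(const T * a, const T * b, T * dst, size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && dst != nullptr));
    for (size_t i = 0; i < n; ++i) {
        dst[i] = from_f32<T>(to_f32(a[i]) + to_f32(b[i]));
    }
}
```

Calling `add_rows` on `float` arrays and on `bf16_demo` arrays uses the same loop body; supporting another type would only require adding its two conversion functions.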

JohannesGaessler approved these changes

Member

ggerganov commented Nov 3, 2024

Looking into this now.

ggerganov self-requested a review

November 3, 2024 13:32

Member

ggerganov commented Nov 3, 2024

Isn't this going to produce thread sanitizer data race warnings on the is_first_call var?

https://github.com/ggerganov/llama.cpp/blob/bf95fffc6fa7a257c43aeb7b6ff47d78af9c9225/ggml/src/ggml.c#L1424-L1443
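The general concern is that a plain static flag read and written by several threads without synchronization is a data race. The following C++ sketch shows one common way to run one-time setup safely; it is not the code in ggml.c (which is C), and `setup_once_body`, `demo_init`, and `setup_runs` are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cassert>

// Counts how many times the setup body ran (for demonstration only).
static std::atomic<int> g_setup_runs{0};

// Hypothetical one-time setup, standing in for something like table initialization.
static bool setup_once_body() {
    g_setup_runs.fetch_add(1, std::memory_order_relaxed);
    return true;
}

// Since C++11, initialization of a function-local static is thread-safe:
// concurrent callers block until the first one finishes, so the body runs once
// and no caller observes a half-initialized state.
void demo_init() {
    static const bool initialized = setup_once_body();
    assert(initialized);
    (void) initialized; // avoids an unused-variable warning when NDEBUG is defined
}

// Returns how many times the setup body has run so far.
int setup_runs() {
    return g_setup_runs.load();
}
```

However many times and from however many threads `demo_init()` is called, `setup_runs()` stays at 1 after the first call. In C, similar guarantees would typically come from a lock, `pthread_once`, or C11 atomics rather than from the language itself.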

slaren added 2 commits

November 3, 2024 16:33


          restore use of GGML_PRINT_DEBUG in ggml-cpu.c

673f95b


          revert synchronization change to ggml_init

0825ba2

ggerganov approved these changes

slaren merged commit 9f40989 into master

54 checks passed

slaren deleted the sl/ggml-cpu-backend branch

November 3, 2024 18:34

zhiyuan1i pushed a commit to zhiyuan1i/llama.cpp that referenced this pull request


          ggml : move CPU backend to a separate file (ggml-org#10144)

89812b1

FanShupei mentioned this pull request

Bug: Recent llama.cpp breaks q4_0_4_4 on Arm CPU #10165

Closed

snadampal added a commit to snadampal/llama.cpp that referenced this pull request


          fix build break on arm64 linux

04b464e

This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org#10144

snadampal mentioned this pull request

fix build break on arm64 linux #10166

Merged

4 tasks

Collaborator

chaxu01 commented Nov 4, 2024 (edited)

@slaren this commit 9f40989 breaks q4_0_4_8 on Arm CPUs, likely related to #10165.

The following command triggers the issue:
./bin/llama-cli -m llama-2-7b-chat.Q4_0_4_8.gguf -p "Write a code in C for bubble sorting" -n 32 -t 4 -ngl 0

The error output is:
Assertion failed: (!isnan(wp[i])Assertion failed: (!isnan(wp[i])Assertion failed: (!isnan(wp[i])Assertion failed: (!isnan(wp[i])), function ggml_compute_forward), function ggml_compute_forward), function ggml_compute_forward_soft_max_f32, file ggml-cpu.c, _soft_max_f32, file ggml-cpu.c, _soft_max_f32, file ggml-cpu.c, ), function ggml_compute_forwardline 8904.

This issue does not occur on commit 08828a6.

slaren pushed a commit that referenced this pull request


          fix build break on arm64 linux (#10166)

6a066b9

This fixes the build break from the recent changes
to move the CPU backend to separate files
#10144

ggerganov pushed a commit to ggml-org/ggml that referenced this pull request


          fix build break on arm64 linux (llama/10166)

f53388d

This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org/llama.cpp#10144

ggerganov pushed a commit to ggml-org/ggml that referenced this pull request


          fix build break on arm64 linux (llama/10166)

4a372f3

This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org/llama.cpp#10144

slaren mentioned this pull request

ggml : do not abort when ggml_aligned_malloc fails #10130

Closed

ggerganov mentioned this pull request

ggml : refactor ggml-cpu.c into multiple C++ source files #10180

Open

QingtaoLi1 commented Nov 6, 2024 (edited)

@slaren Why is the variable is_first_call in ggml_init() in ggml.c inverted, i.e. false on the first call and true on later calls?

ggerganov mentioned this pull request

ggml : adjust is_first_call init value #10193

Merged

apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request


          Temp (#15)

c9f3add

* metal : fix minor string leaks (ggml/1004)

* cmake : make it possible linking ggml as external lib (ggml/1003)

* sync : ggml

* CANN: adjust backend registry refactor. (ggml-org#10158)

remove buffer->iface.get_name that used in cann as it was removed in backend registry refactor PR.

* metal : move dequantize templates to beginning of MSL source (#0)

* metal : simplify f16 and f32 dequant kernels (#0)

* cuda : clear error after changing peer access (ggml-org#10153)

* fix build break on arm64 linux (ggml-org#10166)

This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org#10144

* server : clarify /slots endpoint, add is_processing (ggml-org#10162)

* server : clarify /slots endpoint, add is_processing

* fix tests

* ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (ggml-org#10167)

* ggml : fix gelu tables initialization (ggml-org#10172)

* Q6_K AVX improvements (ggml-org#10118)

* q6_k instruction reordering attempt

* better subtract method

* should be theoretically faster

small improvement with shuffle lut, likely because all loads are already done at that stage

* optimize bit fiddling

* handle -32 offset separately. bsums exists for a reason!

* use shift

* Update ggml-quants.c

* have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86

* ggml : fix arch check in bf16_to_fp32 (ggml-org#10164)

* llama : add <|tool_call|> formatting to Granite template (ggml-org#10177)

Branch: GraniteToolCallTemplate

Signed-off-by: Gabe Goodhart <[email protected]>

* metal : add quantized FA support (ggml-org#10149)

* metal : add quantized FA (vec) support

ggml-ci

* metal : add quantized FA (non-vec) support

* metal : fix support check

ggml-ci

* metal : clean-up

* metal : clean-up (cont)

* metal : fix shared memory calc + reduce smem + comments

* metal : float-correctness

* metal : minor [no ci]

* ggml : adjust is_first_call init value (ggml-org#10193)

ggml-ci

* metal : fix from ptr buffer name (ggml-org#10189)

* server : remove hack for extra parallel slot (ggml-org#10187)

ggml-ci

* metal : add BF16 support (ggml-org#8439)

* ggml : add initial BF16 support

ggml-ci

* metal : add mul_mat_id BF16 support

ggml-ci

* metal : check for bfloat support on the Metal device

ggml-ci

* metal : better var names [no ci]

* metal : do not build bfloat kernels when not supported

ggml-ci

* metal : try to fix BF16 support check

ggml-ci

* metal : this should correctly check bfloat support

---------

Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: Plamen Minev <[email protected]>
Co-authored-by: Yuri Khrustalev <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Eve <[email protected]>
Co-authored-by: Gabe Goodhart <[email protected]>

apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request


          Master1 (#17)

91a01ce

* Merge PR (#10) (#11) (#13)

Merge

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dennyxbox890 <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump requests from 2.31.0 to 2.32.2 in the pip group across 1 directory

Bumps the pip group with 1 update in the / directory: [requests](https://github.com/psf/requests).


Updates `requests` from 2.31.0 to 2.32.2
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](psf/requests@v2.31.0...v2.32.2)

---
updated-dependencies:
- dependency-name: requests
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <[email protected]>

* Temp (#15)

ggerganov pushed a commit to ggml-org/whisper.cpp that referenced this pull request


          fix build break on arm64 linux (llama/10166)

03b75f4

This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org/llama.cpp#10144

apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request


          merge (#20)

98a70f0

* Master1 (#17)

* Rename build.yml to build-ci.yml

* build.yml

* Update build-ci.yml

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Delete ggml/src/vulkan-shaders/CMakeLists.txt

* Update build.yml

* Update build-ci.yml

* Update build-ci.yml

ggerganov pushed a commit to ggml-org/whisper.cpp that referenced this pull request


          fix build break on arm64 linux (llama/10166)

This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org/llama.cpp#10144

arthw pushed a commit to arthw/llama.cpp that referenced this pull request


          ggml : move CPU backend to a separate file (ggml-org#10144)

612599f

arthw pushed a commit to arthw/llama.cpp that referenced this pull request


          fix build break on arm64 linux (ggml-org#10166)

This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org#10144

apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request


          Merge (#21)

c0b609c

* merge (#20)

* Update build-ci.yml

arthw pushed a commit to arthw/llama.cpp that referenced this pull request


          ggml : move CPU backend to a separate file (ggml-org#10144)

303f773

arthw pushed a commit to arthw/llama.cpp that referenced this pull request


          fix build break on arm64 linux (ggml-org#10166)

e4a831f

This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org#10144

adutilleul pushed a commit to adutilleul/whisper.cpp that referenced this pull request


          fix build break on arm64 linux (llama/10166)

fd710d9

This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org/llama.cpp#10144

lyapple2008 pushed a commit to lyapple2008/ggml_mars that referenced this pull request


          fix build break on arm64 linux (llama/10166)

11b442c

This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org/llama.cpp#10144

apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request


          Temp (#23)

21aa12f

* Merge (#21)

* merge (#20)

* Master1 (#17)

* Merge PR (#10) (#11) (#13)

Merge

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dennyxbox890 <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump requests from 2.31.0 to 2.32.2 in the pip group across 1 directory

Bumps the pip group with 1 update in the / directory: [requests](https://github.com/psf/requests).


Updates `requests` from 2.31.0 to 2.32.2
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](psf/requests@v2.31.0...v2.32.2)

---
updated-dependencies:
- dependency-name: requests
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <[email protected]>

* Temp (#15)

* metal : fix minor string leaks (ggml/1004)

* cmake : make it possible linking ggml as external lib (ggml/1003)

* sync : ggml

* CANN: adjust backend registry refactor. (ggml-org#10158)

remove buffer->iface.get_name that used in cann as it was removed in backend registry refactor PR.

* metal : move dequantize templates to beginning of MSL source (#0)

* metal : simplify f16 and f32 dequant kernels (#0)

* cuda : clear error after changing peer access (ggml-org#10153)

* fix build break on arm64 linux (ggml-org#10166)

This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org#10144

* server : clarify /slots endpoint, add is_processing (ggml-org#10162)

* server : clarify /slots endpoint, add is_processing

* fix tests

* ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (ggml-org#10167)

* ggml : fix gelu tables initialization (ggml-org#10172)

* Q6_K AVX improvements (ggml-org#10118)

* q6_k instruction reordering attempt

* better subtract method

* should be theoretically faster

small improvement with shuffle lut, likely because all loads are already done at that stage

* optimize bit fiddling

* handle -32 offset separately. bsums exists for a reason!

* use shift

* Update ggml-quants.c

* have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86

* ggml : fix arch check in bf16_to_fp32 (ggml-org#10164)

* llama : add <|tool_call|> formatting to Granite template (ggml-org#10177)

Branch: GraniteToolCallTemplate

Signed-off-by: Gabe Goodhart <[email protected]>

* metal : add quantized FA support (ggml-org#10149)

* metal : add quantized FA (vec) support

ggml-ci

* metal : add quantized FA (non-vec) support

* metal : fix support check

ggml-ci

* metal : clean-up

* metal : clean-up (cont)

* metal : fix shared memory calc + reduce smem + comments

* metal : float-correctness

* metal : minor [no ci]

* ggml : adjust is_first_call init value (ggml-org#10193)

ggml-ci

* metal : fix from ptr buffer name (ggml-org#10189)

* server : remove hack for extra parallel slot (ggml-org#10187)

ggml-ci

* metal : add BF16 support (ggml-org#8439)

* ggml : add initial BF16 support

ggml-ci

* metal : add mul_mat_id BF16 support

ggml-ci

* metal : check for bfloat support on the Metal device

ggml-ci

* metal : better var names [no ci]

* metal : do not build bfloat kernels when not supported

ggml-ci

* metal : try to fix BF16 support check

ggml-ci

* metal : this should correctly check bfloat support

---------

Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: Plamen Minev <[email protected]>
Co-authored-by: Yuri Khrustalev <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Eve <[email protected]>
Co-authored-by: Gabe Goodhart <[email protected]>

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: dennyxbox890 <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Plamen Minev <[email protected]>
Co-authored-by: Yuri Khrustalev <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Eve <[email protected]>
Co-authored-by: Gabe Goodhart <[email protected]>

* Rename build.yml to build-ci.yml

* build.yml

* Update build-ci.yml

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Delete ggml/src/vulkan-shaders/CMakeLists.txt

* Update build.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

apicalshark added a commit to apicalshark/llama.cpp that referenced this pull request


          a (#28)

d217eb7

* Temp (#23)

* Merge (#21)

* merge (#20)

* Master1 (#17)

* Merge PR (#10) (#11) (#13)

Merge

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dennyxbox890 <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump requests from 2.31.0 to 2.32.2 in the pip group across 1 directory

Bumps the pip group with 1 update in the / directory: [requests](https://github.com/psf/requests).


Updates `requests` from 2.31.0 to 2.32.2
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](psf/requests@v2.31.0...v2.32.2)

---
updated-dependencies:
- dependency-name: requests
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <[email protected]>

* Temp (#15)

* metal : fix minor string leaks (ggml/1004)

* cmake : make it possible linking ggml as external lib (ggml/1003)

* sync : ggml

* CANN: adjust backend registry refactor. (ggml-org#10158)

Remove buffer->iface.get_name, which was used in CANN, as it was removed in the backend registry refactor PR.

* metal : move dequantize templates to beginning of MSL source (#0)

* metal : simplify f16 and f32 dequant kernels (#0)

* cuda : clear error after changing peer access (ggml-org#10153)

* fix build break on arm64 linux (ggml-org#10166)

This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org#10144

* server : clarify /slots endpoint, add is_processing (ggml-org#10162)

* server : clarify /slots endpoint, add is_processing

* fix tests

* ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (ggml-org#10167)

* ggml : fix gelu tables initialization (ggml-org#10172)

* Q6_K AVX improvements (ggml-org#10118)

* q6_k instruction reordering attempt

* better subtract method

* should be theoretically faster

small improvement with shuffle lut, likely because all loads are already done at that stage

* optimize bit fiddling

* handle -32 offset separately. bsums exists for a reason!

* use shift

* Update ggml-quants.c

* have to update ci macos version to 13 as 12 doesn't work now. 13 is still x86

* ggml : fix arch check in bf16_to_fp32 (ggml-org#10164)

* llama : add <|tool_call|> formatting to Granite template (ggml-org#10177)

Branch: GraniteToolCallTemplate

Signed-off-by: Gabe Goodhart <[email protected]>

* metal : add quantized FA support (ggml-org#10149)

* metal : add quantized FA (vec) support

ggml-ci

* metal : add quantized FA (non-vec) support

* metal : fix support check

ggml-ci

* metal : clean-up

* metal : clean-up (cont)

* metal : fix shared memory calc + reduce smem + comments

* metal : float-correctness

* metal : minor [no ci]

* ggml : adjust is_first_call init value (ggml-org#10193)

ggml-ci

* metal : fix from ptr buffer name (ggml-org#10189)

* server : remove hack for extra parallel slot (ggml-org#10187)

ggml-ci

* metal : add BF16 support (ggml-org#8439)

* ggml : add initial BF16 support

ggml-ci

* metal : add mul_mat_id BF16 support

ggml-ci

* metal : check for bfloat support on the Metal device

ggml-ci

* metal : better var names [no ci]

* metal : do not build bfloat kernels when not supported

ggml-ci

* metal : try to fix BF16 support check

ggml-ci

* metal : this should correctly check bfloat support

---------

Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: Plamen Minev <[email protected]>
Co-authored-by: Yuri Khrustalev <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Eve <[email protected]>
Co-authored-by: Gabe Goodhart <[email protected]>

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: dennyxbox890 <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Plamen Minev <[email protected]>
Co-authored-by: Yuri Khrustalev <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Eve <[email protected]>
Co-authored-by: Gabe Goodhart <[email protected]>

* Rename build.yml to build-ci.yml

* build.yml

* Update build-ci.yml

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Delete ggml/src/vulkan-shaders/CMakeLists.txt

* Update build.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Bump the pip group across 2 directories with 2 updates (#24)

Updates the requirements on [pillow](https://github.com/python-pillow/Pillow) and [aiohttp](https://github.com/aio-libs/aiohttp) to permit the latest version.

Updates `pillow` to 11.0.0
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](python-pillow/Pillow@10.2.0...11.0.0)

Updates `aiohttp` to 3.11.7
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](aio-libs/aiohttp@v3.9.3...v3.11.7)

---
updated-dependencies:
- dependency-name: pillow
  dependency-type: direct:production
  dependency-group: pip
- dependency-name: aiohttp
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: apicalshark <[email protected]>

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Update build-ci.yml

* Create docker.yml

* Create python-lint.yml

* Create server.yml

* Update requirements.txt

github-actions bot pushed a commit to martin-steinegger/ProstT5-llama that referenced this pull request


          fix build break on arm64 linux (#10166)

6a2abf0

This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org/llama.cpp#10144

lyapple2008 pushed a commit to lyapple2008/whisper.cpp.mars that referenced this pull request


          fix build break on arm64 linux (llama/10166)

330cf8b

This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org/llama.cpp#10144

Labels

examples ggml testing