ggerganov commented on Jul 9, 2025

target #14629

Fuse GGML_OP_ADD and GGML_OP_MUL

LLAMA_SET_ROWS=1 ./scripts/compare-commits.sh master gg/metal-fuse-add -m ./models/qwen3-30b-a3b/ggml-model-q8_0.gguf -m models/gemma-3-4b/ggml-model-q8_0.gguf -fa 1 -t 1
| Model | Test | t/s master | t/s gg/metal-fuse-add | Speedup |
| --- | --- | ---: | ---: | ---: |
| gemma3 4B Q8_0 | pp512 | 2444.84 | 2494.63 | 1.02 |
| gemma3 4B Q8_0 | tg128 | 90.39 | 96.76 | 1.07 |
| qwen3moe 30B.A3B Q8_0 | pp512 | 1362.92 | 1420.74 | 1.04 |
| qwen3moe 30B.A3B Q8_0 | tg128 | 70.12 | 76.68 | 1.09 |
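
The Speedup figures are consistent with a plain throughput ratio (branch t/s divided by master t/s, rounded to two decimals). The sketch below recomputes them from the four benchmark rows. The `speedup` helper and the `rows` layout are illustrative names, not part of `compare-commits.sh`.

```python
# Benchmark rows copied from the results above:
# (model, test, t/s on master, t/s on gg/metal-fuse-add)
rows = [
    ("gemma3 4B Q8_0", "pp512", 2444.84, 2494.63),
    ("gemma3 4B Q8_0", "tg128", 90.39, 96.76),
    ("qwen3moe 30B.A3B Q8_0", "pp512", 1362.92, 1420.74),
    ("qwen3moe 30B.A3B Q8_0", "tg128", 70.12, 76.68),
]


def speedup(master_tps, branch_tps):
    """Assumed definition: ratio of branch throughput to master throughput."""
    return branch_tps / master_tps


# Round to two decimals, matching the precision of the reported column.
speedups = [round(speedup(m, b), 2) for _, _, m, b in rows]

for (model, test, _, _), s in zip(rows, speedups):
    print(f"{model:24s} {test}  {s:.2f}")
```

Under this definition, the token-generation runs (tg128) gain more than the prompt-processing runs (pp512) on both models.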

Testing

make -j && GGML_METAL_FUSION_DEBUG=2 ./bin/test-backend-ops -o RMS_NORM_MUL_ADD -b Metal
Backend 1/3: Metal
  Device description: Apple M4 Max
  Device memory: 28753 MB (28747 MB free)

ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.000000): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.000001): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.000100): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.100000): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=1.000000): OK
  6543/6543 tests passed
  Backend Metal: OK
ggml_backend_metal_device_rel: fused ADD: 5
ggml_backend_metal_device_rel: fused MUL: 5
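
As a rough reference for what the `RMS_NORM + MUL + ADD` fusion in the log computes, here is a minimal pure-Python sketch that compares three separate passes with a single fused pass over one row. It makes several assumptions: normalization runs over the first dimension (64 elements in the test shape), the MUL and ADD operands have the same length as the row, and all function names are hypothetical. It is not the Metal kernel from this PR.

```python
import math


def rms_norm(x, eps):
    # Scale every element by 1 / sqrt(mean(x^2) + eps).
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def mul(a, b):
    return [p * q for p, q in zip(a, b)]


def add(a, b):
    return [p + q for p, q in zip(a, b)]


def rms_norm_mul_add_unfused(x, w, b, eps):
    # Three passes; each one materializes an intermediate row.
    return add(mul(rms_norm(x, eps), w), b)


def rms_norm_mul_add_fused(x, w, b, eps):
    # One reduction for the scale, then a single pass that applies
    # the scale, the multiply, and the add without intermediates.
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * wi + bi for v, wi, bi in zip(x, w, b)]


# Example row of 64 elements, matching ne[0] of the test shape above.
x = [0.25 * (i - 32) for i in range(64)]
w = [1.0 + 0.01 * i for i in range(64)]
b = [0.5 for _ in range(64)]

# eps values as printed in the test log (0.000000 taken to be 0.0).
for eps in (0.0, 1e-6, 1e-4, 0.1, 1.0):
    ref = rms_norm_mul_add_unfused(x, w, b, eps)
    out = rms_norm_mul_add_fused(x, w, b, eps)
    max_diff = max(abs(p - q) for p, q in zip(ref, out))
    print(f"eps={eps:<8g} max abs diff = {max_diff:.3e}")
```

The fused version skips writing and re-reading the two intermediate results. That is the usual motivation for this kind of fusion on a GPU backend.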

  • Disable with env variable
  • Print fuse stats
  • Fuse with norms, cpys, etc.
  • Cleaner kernel impl?

github-actions bot added the ggml and Apple Metal labels on Jul 9, 2025
ggerganov force-pushed the gg/metal-fuse-add branch 2 times, most recently from 23bc8a3 to b61796c, on July 11, 2025 11:05
ggerganov changed the base branch from master to gg/graph-context-refactor on July 11, 2025 11:05
ggerganov force-pushed the gg/graph-context-refactor branch 2 times, most recently from 5a220cc to bc0a20c, on July 12, 2025 19:51
ggerganov force-pushed the gg/metal-fuse-add branch 3 times, most recently from 6e07c3e to 067d04a, on July 13, 2025 19:11
github-actions bot added the testing label on Jul 13, 2025
ggerganov marked this pull request as ready for review on July 14, 2025 10:28
ggerganov force-pushed the gg/metal-fuse-add branch from fc3a162 to 474041f on July 14, 2025 10:35
ggerganov changed the title from "metal : fuse add" to "metal : fuse add, mul" on Jul 14, 2025
ggerganov force-pushed the gg/graph-context-refactor branch 2 times, most recently from 20010c4 to ae2fb57, on July 18, 2025 05:00
Base automatically changed from gg/graph-context-refactor to master on July 18, 2025 05:29
ggerganov force-pushed the gg/metal-fuse-add branch 2 times, most recently from 012fb71 to 04d0349, on July 18, 2025 11:39
ggerganov force-pushed the gg/metal-fuse-add branch from 04d0349 to effa72e on July 18, 2025 11:46
ggerganov merged commit bf9087f into master on Jul 18, 2025
53 of 55 checks passed
ggerganov deleted the gg/metal-fuse-add branch on July 18, 2025 17:37
pwilkin added a commit to pwilkin/llama.cpp that referenced this pull request Nov 12, 2025
pwilkin added a commit to pwilkin/llama.cpp that referenced this pull request Nov 13, 2025
ggerganov added a commit that referenced this pull request Nov 13, 2025
* Add ops needed for new hybrid models: SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM

* Update ggml/include/ggml.h

Co-authored-by: Georgi Gerganov <[email protected]>

* Update tests/test-backend-ops.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* Code review

* Whitespace

* Update tests/test-backend-ops.cpp

Co-authored-by: Diego Devesa <[email protected]>

* This is actually sigmoid, duh.

* Add CONST, remove TRI_KEEP, other changes from review

* Update tests/test-backend-ops.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* Update ggml/src/ggml.c

Co-authored-by: Georgi Gerganov <[email protected]>

* Update ggml/src/ggml.c

Co-authored-by: Georgi Gerganov <[email protected]>

* Update ggml/src/ggml-cuda/unary.cu

Co-authored-by: Aman Gupta <[email protected]>

* Remove extra script

* Update ggml/src/ggml.c

Co-authored-by: Diego Devesa <[email protected]>

* Update tests/test-backend-ops.cpp

Co-authored-by: Diego Devesa <[email protected]>

* moving changes from laptop [no ci]

* pre-rebase

* Update tests/test-backend-ops.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update tests/test-backend-ops.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Refactor tests

* ggml : cleanup

* cont : fix ggml_fill srcs

* tests : add note

* ggml : add ggml_fill_inplace

* ggml : add asserts

* ggml : fix ggml_fill constant cast

* cont : ggml_tri minor

* Use TENSOR_LOCALS

* Fix regression from #14596, regenerate

* Don't make commits at night...

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: Aman Gupta <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>