metal : fuse add, mul #14596

ggerganov · 2025-07-09T14:06:45Z

Fuse GGML_OP_ADD and GGML_OP_MUL

LLAMA_SET_ROWS=1 ./scripts/compare-commits.sh master gg/metal-fuse-add -m ./models/qwen3-30b-a3b/ggml-model-q8_0.gguf -m models/gemma-3-4b/ggml-model-q8_0.gguf -fa 1 -t 1

| Model | Test | t/s master | t/s gg/metal-fuse-add | Speedup |
| --- | --- | ---: | ---: | ---: |
| gemma3 4B Q8_0 | pp512 | 2444.84 | 2494.63 | 1.02 |
| gemma3 4B Q8_0 | tg128 | 90.39 | 96.76 | 1.07 |
| qwen3moe 30B.A3B Q8_0 | pp512 | 1362.92 | 1420.74 | 1.04 |
| qwen3moe 30B.A3B Q8_0 | tg128 | 70.12 | 76.68 | 1.09 |

Testing

make -j && GGML_METAL_FUSION_DEBUG=2 ./bin/test-backend-ops -o RMS_NORM_MUL_ADD -b Metal

Backend 1/3: Metal
  Device description: Apple M4 Max
  Device memory: 28753 MB (28747 MB free)

ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.000000): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.000001): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.000100): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.100000): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=1.000000): OK
  6543/6543 tests passed
  Backend Metal: OK
ggml_backend_metal_device_rel: fused ADD: 5
ggml_backend_metal_device_rel: fused MUL: 5

- Disable with env variable
- Print fuse stats
- Fuse with norms, cpys, etc.
- Cleaner kernel impl?

ggml/src/ggml-metal/ggml-metal.m

ggml-ci

* Add ops needed for new hybrid models: SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM
* Update ggml/include/ggml.h
  Co-authored-by: Georgi Gerganov <[email protected]>
* Update tests/test-backend-ops.cpp
  Co-authored-by: Georgi Gerganov <[email protected]>
* Code review
* Whitespace
* Update tests/test-backend-ops.cpp
  Co-authored-by: Diego Devesa <[email protected]>
* This is actually sigmoid, duh.
* Add CONST, remove TRI_KEEP, other changes from review
* Update tests/test-backend-ops.cpp
  Co-authored-by: Georgi Gerganov <[email protected]>
* Update ggml/src/ggml.c
  Co-authored-by: Georgi Gerganov <[email protected]>
* Update ggml/src/ggml.c
  Co-authored-by: Georgi Gerganov <[email protected]>
* Update ggml/src/ggml-cuda/unary.cu
  Co-authored-by: Aman Gupta <[email protected]>
* Remove extra script
* Update ggml/src/ggml.c
  Co-authored-by: Diego Devesa <[email protected]>
* Update tests/test-backend-ops.cpp
  Co-authored-by: Diego Devesa <[email protected]>
* moving changes from laptop [no ci]
* pre-rebase
* Update tests/test-backend-ops.cpp
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Update tests/test-backend-ops.cpp
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Refactor tests
* ggml : cleanup
* cont : fix ggml_fill srcs
* tests : add note
* ggml : add ggml_fill_inplace
* ggml : add asserts
* ggml : fix ggml_fill constant cast
* cont : ggml_tri minor
* Use TENSOR_LOCALS
* Fix regression from #14596, regenerate
* Don't make commits at night...

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: Aman Gupta <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>

slaren reviewed Jul 9, 2025

ggml/src/ggml-metal/ggml-metal.m (outdated, resolved)

github-actions bot added the "ggml" (changes relating to the ggml tensor library for machine learning) and "Apple Metal" (https://en.wikipedia.org/wiki/Metal_(API)) labels Jul 9, 2025

ggerganov force-pushed the gg/metal-fuse-add branch 2 times, most recently from 23bc8a3 to b61796c Compare July 11, 2025 11:05

ggerganov changed the base branch from master to gg/graph-context-refactor July 11, 2025 11:05

ggerganov force-pushed the gg/graph-context-refactor branch 2 times, most recently from 5a220cc to bc0a20c Compare July 12, 2025 19:51

ggerganov force-pushed the gg/metal-fuse-add branch 3 times, most recently from 6e07c3e to 067d04a Compare July 13, 2025 19:11

github-actions bot added the "testing" (everything test related) label Jul 13, 2025

ggerganov marked this pull request as ready for review July 14, 2025 10:28

ggerganov force-pushed the gg/metal-fuse-add branch from fc3a162 to 474041f Compare July 14, 2025 10:35

ggerganov changed the title from "metal : fuse add" to "metal : fuse add, mul" Jul 14, 2025

ggerganov force-pushed the gg/graph-context-refactor branch 2 times, most recently from 20010c4 to ae2fb57 Compare July 18, 2025 05:00

Base automatically changed from gg/graph-context-refactor to master July 18, 2025 05:29

ggerganov force-pushed the gg/metal-fuse-add branch 2 times, most recently from 012fb71 to 04d0349 Compare July 18, 2025 11:39

metal : fuse add, mul + add tests

effa72e

ggml-ci

ggerganov force-pushed the gg/metal-fuse-add branch from 04d0349 to effa72e Compare July 18, 2025 11:46

ggerganov merged commit bf9087f into master Jul 18, 2025
53 of 55 checks passed

ggerganov deleted the gg/metal-fuse-add branch July 18, 2025 17:37

ggerganov mentioned this pull request Jul 25, 2025

server : fix vision test regex #14871

Closed

ggerganov mentioned this pull request Aug 10, 2025

CANN: Add fused FFN op #15209

Closed

This was referenced Sep 11, 2025

llama.cpp requirements cosdt/llama.cpp#28

Closed

llama.cpp requirements noemotiovon/llama.cpp#1

Open

slaren mentioned this pull request Nov 12, 2025

Add ops needed for new hybrid models: SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM #17063

Merged

pwilkin added a commit to pwilkin/llama.cpp that referenced this pull request Nov 12, 2025

Fix regression from ggml-org#14596, regenerate

09ef180

pwilkin added a commit to pwilkin/llama.cpp that referenced this pull request Nov 13, 2025

Fix regression from ggml-org#14596, regenerate

60858f1


3 participants
