metal : use less stack memory in FA kernel #14088

ggerganov · 2025-06-09T14:16:05Z

Accumulate into shared memory instead of into registers.
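The description above can be illustrated with a minimal CPU-side sketch in plain C++. It is an analogy only and is not the Metal kernel from this PR: every name and size below is hypothetical, and the buffer passed to `accumulate_shared` merely stands in for threadgroup (shared) memory. The point it shows is that the weighted sum over value rows gives the same result whether the running total lives in a per-thread local array (which on a GPU occupies registers or spills to per-thread stack) or in a buffer owned by the caller.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sizes for illustration only; not taken from the PR.
constexpr std::size_t kHeadDim = 8; // elements in one output row
constexpr std::size_t kNumKV   = 4; // number of value rows to accumulate

// Variant A: the accumulator is a function-local array. On a GPU, a large
// per-thread array like this sits in registers or spills to per-thread stack.
std::array<float, kHeadDim> accumulate_local(const std::vector<float>& weights,
                                             const std::vector<float>& values) {
    std::array<float, kHeadDim> acc{}; // zero-initialized local accumulator
    for (std::size_t j = 0; j < kNumKV; ++j) {
        for (std::size_t d = 0; d < kHeadDim; ++d) {
            acc[d] += weights[j] * values[j * kHeadDim + d];
        }
    }
    return acc;
}

// Variant B: the accumulator is a caller-provided buffer, standing in for
// shared (threadgroup) memory. No large local array is needed here.
void accumulate_shared(const std::vector<float>& weights,
                       const std::vector<float>& values,
                       float* shared) {
    for (std::size_t d = 0; d < kHeadDim; ++d) {
        shared[d] = 0.0f;
    }
    for (std::size_t j = 0; j < kNumKV; ++j) {
        for (std::size_t d = 0; d < kHeadDim; ++d) {
            shared[d] += weights[j] * values[j * kHeadDim + d];
        }
    }
}

// Runs both variants on the same small input and checks that they agree.
void check_variants_agree() {
    const std::vector<float> weights = {0.5f, 0.25f, 0.125f, 0.125f};
    std::vector<float> values(kNumKV * kHeadDim);
    for (std::size_t i = 0; i < values.size(); ++i) {
        values[i] = static_cast<float>(i);
    }

    const std::array<float, kHeadDim> local = accumulate_local(weights, values);
    float shared[kHeadDim];
    accumulate_shared(weights, values, shared);

    for (std::size_t d = 0; d < kHeadDim; ++d) {
        assert(local[d] == shared[d]);
    }
}
```

Calling `check_variants_agree()` from a test or from `main` exercises both variants. In a real kernel the trade-off is between per-thread register or stack pressure and shared-memory traffic; the sketch does not model either cost.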

ggml-ci

* origin/master:
  llama : support GEGLU for jina-bert-v2 (ggml-org#14090)
  vulkan: force device 0 in CI (ggml-org#14106)
  Fixed spec timings to: accepted/tested instead of accepted/drafted (ggml-org#14104)
  sync : ggml
  ggml : fix weak alias win32 (whisper/0)
  Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend (ggml-org#14099)
  rpc : nicer error messages for RPC server crash (ggml-org#14076)
  sync : ggml
  Add in-build ggml::ggml ALIAS library (ggml/1260)
  metal : use less stack memory in FA kernel (ggml-org#14088)
  kv-cache : fix shift and defrag logic (ggml-org#14081)
  llama : allow building all tests on windows when not using shared libs (ggml-org#13980)

metal : use less stack memory in FA kernel

705592c

ggml-ci

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) labels Jun 9, 2025

ggerganov mentioned this pull request Jun 9, 2025

Eval bug: Compute function exceeds available stack space #14055

Closed

cont : fix BF16 variant

aae3f04

ggerganov merged commit 1f63e75 into master Jun 9, 2025
38 of 43 checks passed

ggerganov deleted the gg/metal-fa-acc-f32-2 branch June 9, 2025 20:05
