
Conversation

Collaborator

@pwilkin pwilkin commented Nov 6, 2025

The ops needed for the new hybrid models including Qwen3 Next and Kimi Linear.

Prerequisite to merging #16095

@github-actions github-actions bot added documentation Improvements or additions to documentation testing Everything test related Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels Nov 6, 2025
Collaborator Author

pwilkin commented Nov 6, 2025

@gabe-l-hart guess you'll be interested in this one as well :)

Collaborator

@gabe-l-hart gabe-l-hart left a comment


A few comments on the op signatures and how they relate to the versions I have on the SSD branch

| Operation | BLAS | CANN | CPU | CUDA | Metal | OpenCL | SYCL | Vulkan | zDNN |
|-----------|------|------|------|------|------|------|------|------|------|
| ABS |||| 🟡 | 🟡 || 🟡 |||
| ACC ||||||||||
Collaborator


Whelp, glad I know about this file now

```c
            struct ggml_context * ctx,
            struct ggml_tensor  * a);

    GGML_API struct ggml_tensor * ggml_softplus(
```
Collaborator


I also have this one on my SSD branch #16982

```c
            struct ggml_context * ctx,
            struct ggml_tensor  * a);

    GGML_API struct ggml_tensor * ggml_cumsum(
```
Collaborator


In an effort to remove permutations and conts, I updated this to also allow an additional dim argument, and added a ggml_cumsum_0 convenience wrapper for dim 0: https://github.com/ggml-org/llama.cpp/pull/16982/files#diff-7dea3e94fe52f756a218321acc77042d0a333fd3d7e4c35160920ce6e86cb400R997

Collaborator


I have no strong opinion, but if we want the dim version, we could bring that in here, or we could consider changing that one to ggml_cumsum_dim on my branch to keep the function signature after this is merged.

Collaborator Author


To be honest, I just made it to mirror the behavior of ggml_sum exactly. I do agree the PyTorch approach (of specifying the dimension) is better and I think minimizing permutes / conts is good, my perfectionist self just thinks that should entail the refactoring of OP_SUM as well :>

Member


This should be OK. If we need to sum along dim 1, we can transpose the input and extend ggml_cumsum to work with transposed data. As an initial pass, assert that the data is contiguous.

```c
            int                   shift3);

    // Make matrix into a triangular one (upper, upper + diagonal, lower or lower + diagonal) with constant value
    GGML_API struct ggml_tensor * ggml_tri(
```
Collaborator


Similar to cumsum, I added a version of this with the ability to specify dimensions https://github.com/ggml-org/llama.cpp/pull/16982/files#diff-7dea3e94fe52f756a218321acc77042d0a333fd3d7e4c35160920ce6e86cb400R2219

Collaborator Author


Here I'd be a bit skeptical. The convention is that the first two dimensions are the matrix dimensions. All the code is written under that assumption. PyTorch also defines .tril() without the dimension parameters. I don't think extending that is worth it tbh and it makes it much harder to write optimized kernels, so I'd drop it, but I guess @ggerganov should have a say.

Collaborator


Yep, totally fair. This was very much "let's see if I can engineer any improvements." I'm definitely not clear if it makes any significant difference in the places I've used it currently.

Collaborator Author

pwilkin commented Nov 8, 2025

@slaren @ggerganov Should be ready for final review.

@pwilkin pwilkin mentioned this pull request Nov 8, 2025
Collaborator Author

pwilkin commented Nov 11, 2025

@ggerganov Aight, parallelized CUMSUM, added docs, removed the old TRI, renamed TRI_KEEP to TRI, added CONST with const1234d helpers.

Collaborator Author

pwilkin commented Nov 11, 2025

Aight, @ggerganov @slaren @CISC it's ready to merge I think.

Member

@ggerganov ggerganov left a comment


@pwilkin Thanks for the contribution.

As constructive feedback for the future, try to split the ggml changes into even smaller parts. It would improve the review process, because there are many little details (naming, API design, code formatting) that are not obvious at first, and it takes some time to get accustomed to them.

Comment on lines 6030 to 6038
```cpp
void initialize_tensors(ggml_context * ctx) override {
    for (ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL; t = ggml_get_next_tensor(ctx, t)) {
        if (strcmp(t->name, "a") == 0) {
            init_tensor_tril(t, 0.1, 1.0f);
        } else {
            init_tensor_uniform(t, 0.1, 1.0f);
        }
    }
}
```
Member


Any reason to not include negative numbers here?

Collaborator Author


Not really other than that we really don't want zeroes, and I don't think there's a way to exclude them.

Collaborator Author


Unless you mean for the second tensor - I guess changing it to (-1, 1) should be fine there.

pwilkin and others added 27 commits November 13, 2025 12:25
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
@ggerganov ggerganov merged commit 389ac78 into ggml-org:master Nov 13, 2025
67 of 69 checks passed