
Conversation

Collaborator

@pwilkin pwilkin commented Nov 6, 2025

The ops needed for the new hybrid models including Qwen3 Next and Kimi Linear.

Prerequisite to merging #16095

@github-actions github-actions bot added documentation Improvements or additions to documentation testing Everything test related Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels Nov 6, 2025
Collaborator Author

pwilkin commented Nov 6, 2025

@gabe-l-hart guess you'll be interested in this one as well :)

Collaborator

@gabe-l-hart gabe-l-hart left a comment


A few comments on the op signatures and how they relate to the versions I have on the SSD branch

| Operation | BLAS | CANN | CPU | CUDA | Metal | OpenCL | SYCL | Vulkan | zDNN |
|-----------|------|------|------|------|------|------|------|------|------|
| ABS |||| 🟡 | 🟡 || 🟡 |||
| ACC ||||||||||
Collaborator


Whelp, glad I know about this file now

```c
            struct ggml_context * ctx,
            struct ggml_tensor  * a);

    GGML_API struct ggml_tensor * ggml_softplus(
```
Collaborator


I also have this one on my SSD branch #16982

```c
            struct ggml_context * ctx,
            struct ggml_tensor  * a);

    GGML_API struct ggml_tensor * ggml_cumsum(
```
Collaborator


In an effort to remove permutations and conts, I updated this to also allow an additional dim argument, and added a ggml_cumsum_0 convenience wrapper for dim 0: https://github.com/ggml-org/llama.cpp/pull/16982/files#diff-7dea3e94fe52f756a218321acc77042d0a333fd3d7e4c35160920ce6e86cb400R997

Collaborator


I have no strong opinion, but if we want the dim version, we could bring that in here, or we could consider changing that one to ggml_cumsum_dim on my branch to keep the function signature after this is merged.

Collaborator Author


To be honest, I just made it to mirror the behavior of ggml_sum exactly. I do agree the PyTorch approach (of specifying the dimension) is better and I think minimizing permutes / conts is good, my perfectionist self just thinks that should entail the refactoring of OP_SUM as well :>

Member


This should be OK. If we need to sum along dim 1, we can transpose the input and extend ggml_cumsum to work with transposed data. As an initial pass, assert that the data is contiguous.

```c
            int                   shift3);

    // Make matrix into a triangular one (upper, upper + diagonal, lower or lower + diagonal) with constant value
    GGML_API struct ggml_tensor * ggml_tri(
```
Collaborator


Similar to cumsum, I added a version of this with the ability to specify dimensions https://github.com/ggml-org/llama.cpp/pull/16982/files#diff-7dea3e94fe52f756a218321acc77042d0a333fd3d7e4c35160920ce6e86cb400R2219

Collaborator Author


Here I'd be a bit skeptical. The convention is that the first two dimensions are the matrix dimensions. All the code is written under that assumption. PyTorch also defines .tril() without the dimension parameters. I don't think extending that is worth it tbh and it makes it much harder to write optimized kernels, so I'd drop it, but I guess @ggerganov should have a say.

Collaborator


Yep, totally fair. This was very much "let's see if I can engineer any improvements." I'm definitely not clear if it makes any significant difference in the places I've used it currently.

Collaborator Author

pwilkin commented Nov 8, 2025

@slaren @ggerganov Should be ready for final review.

@pwilkin pwilkin mentioned this pull request Nov 8, 2025
Collaborator Author

pwilkin commented Nov 11, 2025

@ggerganov Aight, parallelized CUMSUM, added docs, removed the old TRI, renamed TRI_KEEP to TRI, added CONST with const1234d helpers.

Collaborator Author

pwilkin commented Nov 11, 2025

Aight, @ggerganov @slaren @CISC it's ready to merge I think.

Member

@ggerganov ggerganov left a comment


@pwilkin Thanks for the contribution.

As constructive feedback for the future, try to split the ggml changes into even smaller parts. It would improve the review process, because there are many little details (naming, API design, code formatting) that are not obvious at first, and it takes some time to get accustomed to them.

Comment on lines 6030 to 6038
```cpp
void initialize_tensors(ggml_context * ctx) override {
    for (ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL; t = ggml_get_next_tensor(ctx, t)) {
        if (strcmp(t->name, "a") == 0) {
            init_tensor_tril(t, 0.1, 1.0f);
        } else {
            init_tensor_uniform(t, 0.1, 1.0f);
        }
    }
}
```
Member


Any reason to not include negative numbers here?

Collaborator Author


Not really other than that we really don't want zeroes, and I don't think there's a way to exclude them.

Collaborator Author


Unless you mean for the second tensor - I guess changing it to (-1, 1) should be fine there.

pwilkin and others added 27 commits November 13, 2025 12:25
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
@ggerganov ggerganov merged commit 389ac78 into ggml-org:master Nov 13, 2025
67 of 69 checks passed