graph : fix geglu #14077

Merged
merged 1 commit into from
Jun 9, 2025
24 changes: 8 additions & 16 deletions src/llama-graph.cpp
@@ -663,22 +663,14 @@ ggml_tensor * llm_graph_context::build_ffn(
  {
  // Split into two equal parts
  int64_t split_point = cur->ne[0] / 2;
- ggml_tensor * output_ffn_up = ggml_cont(ctx0, ggml_view_2d(
-     ctx0, cur, split_point,
-     cur->ne[1], cur->nb[1], 0
- ));
- ggml_tensor * output_ffn_gate = ggml_cont(ctx0, ggml_view_2d(
-     ctx0, cur, split_point,
-     cur->ne[1], cur->nb[1],
-     split_point * ggml_element_size(cur)
- ));
-
- // Apply GELU activation function to the first part
- output_ffn_up = ggml_gelu(ctx0, output_ffn_up);
- cb(output_ffn_up, "ffn_gelu", il);
-
- // Element-wise multiplication between the activated part and the gate part
- cur = ggml_mul(ctx0, output_ffn_up, output_ffn_gate);
+ // TODO: these conts should not be needed
+ ggml_tensor * x0 = ggml_cont(ctx0, ggml_view_2d(ctx0, cur, split_point, cur->ne[1], cur->nb[1], 0));
+ ggml_tensor * x1 = ggml_cont(ctx0, ggml_view_2d(ctx0, cur, split_point, cur->ne[1], cur->nb[1], split_point * ggml_element_size(cur)));
Comment on lines +666 to +668
Collaborator

Can we move this operation earlier and have it for both SWIGLU/GEGLU to keep things DRY?

Member Author

Yes, though either way is OK. If we repeat more than 2 times, then we can DRY it.

I would first look into removing the conts btw.

Collaborator

@CISC, Jun 9, 2025

Removing ggml_cont seems safe to me; at least it works fine here. In what scenario would it not be safe?

Member Author

It is correct. I just think the conts should not be necessary, and they introduce overhead.
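
To make the overhead point concrete, here is a minimal sketch in plain C++ (no ggml) of the difference between a strided view of one half of each row and a dense copy of it. `View2D`, `half_view` and `make_contiguous` are hypothetical names invented for this illustration, and strides and offsets are counted in elements rather than bytes for simplicity. Whether every backend kernel for GELU and multiplication accepts such strided inputs is not established in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2-D tensor view: it owns no data and copies nothing.
struct View2D {
    const float * base;   // first element visible through the view
    std::size_t   ne0;    // elements per row in the view
    std::size_t   ne1;    // number of rows
    std::size_t   stride; // elements between consecutive rows in the parent buffer

    float at(std::size_t i, std::size_t r) const {
        assert(i < ne0 && r < ne1);
        return base[r * stride + i];
    }
};

// View one half of every row of a row-major (n_rows x row_len) buffer.
// `offset` is 0 for the first half and row_len / 2 for the second half.
inline View2D half_view(const std::vector<float> & buf, std::size_t n_rows,
                        std::size_t row_len, std::size_t offset) {
    assert(row_len % 2 == 0 && buf.size() == n_rows * row_len);
    assert(offset + row_len / 2 <= row_len);
    return View2D{buf.data() + offset, row_len / 2, n_rows, row_len};
}

// What a "cont" step does conceptually: materialize the view into a dense
// buffer, at the cost of one extra allocation and one pass over the data.
inline std::vector<float> make_contiguous(const View2D & v) {
    std::vector<float> out;
    out.reserve(v.ne0 * v.ne1);
    for (std::size_t r = 0; r < v.ne1; ++r) {
        for (std::size_t i = 0; i < v.ne0; ++i) {
            out.push_back(v.at(i, r));
        }
    }
    return out;
}
```

An element-wise operation that reads through `at()` gets the same values as one that reads the dense copy, so in this toy model the copy adds work without changing the result.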


+ x0 = ggml_gelu(ctx0, x0);
+ cb(x0, "ffn_gelu", il);
+
+ cur = ggml_mul(ctx0, x0, x1);
  cb(cur, "ffn_geglu", il);
  } break;
  }
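
As a reference for what this branch computes, and for the DRY suggestion in the review thread, the following is a minimal sketch in plain C++ rather than ggml. It splits each row into two halves, applies an activation to the first half and multiplies by the second. The helper `gated_split` and the wrappers `geglu` and `swiglu` are hypothetical names for illustration. The tanh-based GELU approximation is an assumption about the kernel, and the SWIGLU wrapper assumes that branch uses the same split with SiLU as the activation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// tanh-based GELU approximation (assumed here; the real kernel is backend-specific).
inline float gelu_tanh(float x) {
    const float k = 0.7978845608f; // sqrt(2 / pi)
    return 0.5f * x * (1.0f + std::tanh(k * x * (1.0f + 0.044715f * x * x)));
}

// SiLU: x * sigmoid(x).
inline float silu(float x) {
    return x / (1.0f + std::exp(-x));
}

// Hypothetical shared helper: treat `in` as a row-major (n_rows x row_len) matrix,
// split every row into halves x0 | x1, and return act(x0) * x1 element-wise.
template <typename Act>
std::vector<float> gated_split(const std::vector<float> & in, std::size_t n_rows,
                               std::size_t row_len, Act act) {
    assert(row_len % 2 == 0);
    assert(in.size() == n_rows * row_len);
    const std::size_t split_point = row_len / 2;
    std::vector<float> out(n_rows * split_point);
    for (std::size_t r = 0; r < n_rows; ++r) {
        const float * x0 = in.data() + r * row_len; // first half of the row
        const float * x1 = x0 + split_point;        // second half of the row
        for (std::size_t i = 0; i < split_point; ++i) {
            out[r * split_point + i] = act(x0[i]) * x1[i];
        }
    }
    return out;
}

// Thin wrappers: the split logic is written once and only the activation differs.
inline std::vector<float> geglu(const std::vector<float> & in, std::size_t n_rows, std::size_t row_len) {
    return gated_split(in, n_rows, row_len, gelu_tanh);
}

inline std::vector<float> swiglu(const std::vector<float> & in, std::size_t n_rows, std::size_t row_len) {
    return gated_split(in, n_rows, row_len, silu);
}
```

A reference like this can be handy for checking a graph implementation on small inputs, since the output has half the row length of the input and each element depends on exactly one element from each half.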