sycl: Remove not needed copy f16->f32 for dnnl mul mat #14125

Merged: 1 commit, Jun 12, 2025

ShanoToni (Contributor) commented Jun 11, 2025

This PR proposes that, when GGML_SYCL_F16=ON, oneDNN handles the conversion itself and writes the mul_mat output directly in f32, with the fpmath mode set to f16.

The current approach takes an f16 dst buffer from the memory pool for the oneDNN matmul and then copies (cpy) from f16 into the actual f32 output, dst_dd_i.
Example of the performance difference observed:

Lunar Lake

current approach:

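| model | size | params | backend | ngl | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | 8 | pp512 | 1475.42 ± 43.91 |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | 8 | tg128 | 37.46 ± 0.38 |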

proposed changes:

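| model | size | params | backend | ngl | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | 8 | pp512 | 1566.90 ± 26.33 |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | 8 | tg128 | 37.66 ± 0.49 |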

Battlemage

current approach:

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | pp512 | 7424.59 ± 15.58 |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | tg128 | 99.61 ± 2.09 |

proposed changes:

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | pp512 | 8148.99 ± 35.10 |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | tg128 | 100.62 ± 2.05 |
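
To make the numbers above easier to compare, the mean throughputs can be converted into relative changes. The following is a minimal sketch, not part of the PR: the names `speedup_percent`, `BenchRow`, `kRows`, and `print_speedups` are hypothetical, and only the mean t/s values are taken from the results above (the ± spread is ignored).

```cpp
#include <cstdio>

// Relative throughput change in percent, going from a baseline t/s to a new t/s.
// Hypothetical helper for illustration; not part of llama.cpp or the PR.
double speedup_percent(double baseline_tps, double new_tps) {
    return (new_tps - baseline_tps) / baseline_tps * 100.0;
}

// One benchmark comparison: mean t/s before and after the change.
struct BenchRow {
    const char* device;
    const char* test;
    double before_tps;
    double after_tps;
};

// Mean values copied from the benchmark results in the PR description.
static const BenchRow kRows[] = {
    {"Lunar Lake", "pp512", 1475.42, 1566.90},
    {"Lunar Lake", "tg128",   37.46,   37.66},
    {"Battlemage", "pp512", 7424.59, 8148.99},
    {"Battlemage", "tg128",   99.61,  100.62},
};

// Print the relative change for every row.
void print_speedups() {
    for (const BenchRow& row : kRows) {
        std::printf("%-10s %-5s %+.2f%%\n", row.device, row.test,
                    speedup_percent(row.before_tps, row.after_tps));
    }
}
```

Calling `print_speedups()` from a `main` function lists the four comparisons. On these means, the pp512 gain works out to roughly 6% on Lunar Lake and roughly 10% on Battlemage, while the tg128 differences are smaller than the reported ± spread, so they may not be significant.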

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) labels on Jun 11, 2025
Rbiessy (Collaborator) commented Jun 12, 2025

CI failure is unrelated so merging now

@Rbiessy Rbiessy merged commit ed52f36 into ggml-org:master Jun 12, 2025
121 of 129 checks passed