[Don't merge] Upgrade submodule oneDNN to v3.7 (#147498) (Zi) #147917
base: main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/147917
Note: Links to docs will display an error until the docs builds have been completed.
❌ 13 New Failures, 1 Unrelated Failure
As of commit 85ed6ad with merge base ab81ca5:
NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
This PR upgrades the oneDNN submodule to v3.7.

## Improvements

- Improved performance of convolution and matmul primitives on Intel Xeon processors with Intel AMX instruction set support (formerly Sapphire Rapids and Granite Rapids).
- Improved performance of int8 and fp32 forward convolution primitives on processors with Intel AVX2 instruction set support.
- Improved performance of fp8 matmul primitives with bf16 and fp16 bias data types on Intel Xeon processors with Intel AMX instruction set support (formerly Sapphire Rapids and Granite Rapids).
- Introduced initial optimizations for Intel GPUs based on the Xe3 architecture.
- Added bfloat16 support for SDPA and implemented fp16 and bf16 GEMM kernels in SDPA (a minimal usage sketch follows this description).
- Fixed f16 matmul accuracy, SDPA failing to dispatch to the ukernel, bf16/fp16/fp32 convolution performance, an int8 kernel triggering a page fault, a deconvolution precision issue on complex128 and fp64, and a GEMM correctness issue in float16.
- Improved bf16 matmul performance with fp32 destination with Arm Compute Library (ACL).
- Improved bf16 to fp32 reorder performance.
- Improved bf16 reorder performance.
- Improved bf16 convolution with ACL.

Fixes pytorch#136348.

## Validation results on CPU

1. NLP models accuracy/inference/training
2. Torchbench CPU userbenchmark inference & training
3. Inductor quantization
4. Dynamo benchmarks

## Validation results on XPU

Accuracy is the same as the baseline; performance results were attached as charts.

## Validation results on ARM

Pull Request resolved: pytorch#147498
Approved by: https://github.com/fadara01, https://github.com/mingfeima, https://github.com/atalman
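As context for the bf16 SDPA item above, here is a minimal sketch of exercising the bfloat16 scaled_dot_product_attention path on CPU. The tensor shapes and the bare functional call are illustrative assumptions, not taken from this PR's test plan, and which backend kernel is selected depends on the build and SDPA's dispatch logic.

```python
# Minimal sketch: run scaled_dot_product_attention with bfloat16 inputs on CPU.
# Shapes are arbitrary; whether a oneDNN-backed kernel is used depends on the
# build configuration and SDPA backend dispatch.
import torch
import torch.nn.functional as F

# (batch, num_heads, seq_len, head_dim)
q = torch.randn(1, 8, 128, 64, dtype=torch.bfloat16)
k = torch.randn(1, 8, 128, 64, dtype=torch.bfloat16)
v = torch.randn(1, 8, 128, 64, dtype=torch.bfloat16)

out = F.scaled_dot_product_attention(q, k, v)
print(out.dtype, out.shape)  # torch.bfloat16, torch.Size([1, 8, 128, 64])
```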
Successfully rebased.
Force-pushed from df5447b to 85ed6ad.
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as `Stale`.
cc @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @jgong5 @mingfeima @sanchitintel @ashokei @jingxu10 @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen @snadampal @voznesenskym @penguinwu @EikanWang @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov
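For anyone validating the upgrade locally, a small sketch of checking which oneDNN version the installed PyTorch binary was built against. The exact wording of the version line in the config output varies between releases, so the substring filter below is an assumption.

```python
import torch

# torch.__config__.show() returns the build configuration string, which
# includes the oneDNN (historically "MKL-DNN") version PyTorch was built with.
config = torch.__config__.show()
print([line for line in config.splitlines() if "DNN" in line.upper()])

# Confirm the oneDNN backend is usable at runtime.
print("oneDNN available:", torch.backends.mkldnn.is_available())
```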