fix: heap corruption in OneDNN batch matmul with broadcast rank mismatch #120199
Ashutosh0x wants to merge 1 commit into
Code Review
This pull request modifies mkl_batch_matmul_helper.h to support broadcasting by allowing a zero dimension offset and setting the strides of prepended dimensions to zero, preventing heap corruption in OneDNN. The review feedback recommends adding unit tests to cover this broadcasting scenario and suggests using std::max when calculating dimension offsets to prevent potential underflow or out-of-bounds access.
    // When a tensor has been expanded to match the output rank via
    // ExpandInputDimsToOutputShape, the prepended dimensions have size 1 and
    // are broadcast. Their strides must be set to 0 so that OneDNN re-reads
    // the same data for each batch index instead of advancing the pointer
    // beyond the actual tensor buffer (which causes heap corruption).
    // See #117700.
According to the TensorFlow PR Review Guidelines (Repository Style Guide, lines 57-61), unit tests are mandatory for any new logic, feature, or bug fix. Please add a unit test (for example, in tensorflow/python/kernel_tests/linalg/matmul_op_test.py or a relevant C++ test file) that reproduces the rank mismatch broadcasting scenario (e.g., rank-4 × rank-2) with OneDNN enabled to prevent regressions.
References
- Mandatory test coverage for any new logic, feature, or bug fix.
    auto dim_offset_lhs = ndims_out - ndims_lhs;
    for (int i = 0; i < dim_offset_lhs; ++i) {
      lhs_strides[i] = 0;
    }
    auto dim_offset_rhs = ndims_out - ndims_rhs;
    for (int i = 0; i < dim_offset_rhs; ++i) {
      rhs_strides[i] = 0;
    }
To prevent potential unsigned underflow or unexpected cases where ndims_out < ndims_lhs (which could lead to out-of-bounds access if dim_offset_lhs wraps around or becomes negative), use std::max(0, ...) to safely bound the loop limit.
Suggested change:

    - auto dim_offset_lhs = ndims_out - ndims_lhs;
    - for (int i = 0; i < dim_offset_lhs; ++i) {
    -   lhs_strides[i] = 0;
    - }
    - auto dim_offset_rhs = ndims_out - ndims_rhs;
    - for (int i = 0; i < dim_offset_rhs; ++i) {
    -   rhs_strides[i] = 0;
    - }
    + int dim_offset_lhs = std::max(0, ndims_out - ndims_lhs);
    + for (int i = 0; i < dim_offset_lhs; ++i) {
    +   lhs_strides[i] = 0;
    + }
    + int dim_offset_rhs = std::max(0, ndims_out - ndims_rhs);
    + for (int i = 0; i < dim_offset_rhs; ++i) {
    +   rhs_strides[i] = 0;
    + }
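The clamped loop bound in the suggestion can be modelled outside the kernel. The following is a minimal Python sketch, not the C++ from the PR: `zero_broadcast_strides` is a hypothetical helper name, and the stride lists are illustrative values rather than output captured from TensorFlow.

```python
def zero_broadcast_strides(strides, ndims_out, ndims_in):
    """Return a copy of `strides` with the prepended (broadcast) dims set to 0.

    Hypothetical helper mirroring the suggested C++ loop: the number of
    prepended dimensions is clamped at 0, so a rank difference that is
    zero or negative leaves the strides untouched.
    """
    dim_offset = max(0, ndims_out - ndims_in)
    patched = list(strides)
    for i in range(dim_offset):
        patched[i] = 0
    return patched


# Rank-2 operand expanded to rank 4: the two prepended dims are zeroed.
rank2_expanded = zero_broadcast_strides([64, 64, 16, 1], ndims_out=4, ndims_in=2)

# Operand already at the output rank: nothing changes.
same_rank = zero_broadcast_strides([12, 12, 4, 1], ndims_out=4, ndims_in=4)

# Defensive case (ndims_out < ndims_in): the clamp yields 0 iterations.
clamped = zero_broadcast_strides([12, 12, 4, 1], ndims_out=2, ndims_in=4)
```

Only the leading `ndims_out - ndims_in` entries are modified, so the strides of the two matrix dimensions are preserved in every case.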
When tf.linalg.matmul broadcasts a rank-2 tensor against a rank-4 tensor (e.g., [13,1,3,4] x [4,16]), ExpandInputDimsToOutputShape expands the rank-2 shape to [1,1,4,16]. CalculateTFStrides then computes strides [64,64,16,1] for the expanded shape. However, the actual tensor only has 64 elements. OneDNN uses the expanded strides to access memory, reading at offsets like batch_idx * 64, which goes far beyond the tensor buffer, causing:

- corrupted double-linked list
- free(): invalid pointer
- Segmentation fault

The fix: set strides to 0 for prepended broadcast dimensions so OneDNN re-reads the same data for each batch index instead of advancing the pointer past the buffer.

Uses std::max(0, ...) to safely bound the offset to prevent potential underflow.

Also fixes DCHECK(dim_offset > 0), which should be >= 0.

Fixes tensorflow#117700
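The stride arithmetic in the commit message can be checked by hand. The sketch below is plain Python and assumes dense row-major strides; `row_major_strides` and `num_elements` are hypothetical stand-ins written for illustration, not the TensorFlow `CalculateTFStrides` implementation.

```python
def row_major_strides(shape):
    """Element strides of a dense row-major tensor with the given shape."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def num_elements(shape):
    """Total number of elements in a tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n


rhs_shape = [4, 16]              # the tensor that actually exists in memory
expanded_shape = [1, 1, 4, 16]   # after prepending two broadcast dims

strides = row_major_strides(expanded_shape)
buffer_len = num_elements(rhs_shape)

# Start offset of the last of 13 batches if the leading stride is applied
# as described in the commit message (batch_idx * 64 with batch_idx = 12).
last_batch_offset = 12 * strides[0]

print(strides, buffer_len, last_batch_offset)
```

Under these assumptions the expanded strides come out as [64, 64, 16, 1], while the buffer holds only 64 elements, so any batch index above 0 starts at or past the end of the buffer.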
Force-pushed: e4da47b to 29e4932
Thank you for the update. The use of [...]

Hi @penpornk 👋, could you take a look at this when you get a chance?

This fixes a heap corruption crash in the OneDNN batch matmul path when broadcasting tensors with mismatched ranks (e.g., rank-4 × rank-2). The root cause is that [...]

The fix is small and localized: just zeroing the strides for broadcast dimensions, which is the standard OneDNN convention. The issue reporter confirmed the crash is reliably reproducible with a simple [...]

Happy to address any additional feedback. Thanks!
Summary
Fix heap corruption in OneDNN batch matmul when broadcasting tensors with mismatched ranks (e.g., rank-4 × rank-2).
Vulnerability (#117700)
tf.linalg.matmul with a rank-4 tensor [13, 1, 3, 4] and rank-2 tensor [4, 16] causes heap corruption when OneDNN is enabled:

```
corrupted double-linked list
free(): invalid pointer
Segmentation fault (core dumped)
```
Root Cause
When broadcasting rank-2 → rank-4, ExpandInputDimsToOutputShape expands [4, 16] to [1, 1, 4, 16]. Then CalculateTFStrides computes strides [64, 64, 16, 1].

But the actual tensor only has 64 elements. OneDNN uses these strides to access memory at offsets like batch_idx × 64, which reads far beyond the tensor buffer → heap corruption.

Reproduction

```python
import os
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1'
import tensorflow as tf
tf.random.set_seed(280958)
p = tf.random.normal([13, 1, 3, 4])
w = tf.random.normal([4, 16])
out = tf.linalg.matmul(p, w)  # CRASH: corrupted double-linked list
```
Fix
Set strides to 0 for prepended broadcast dimensions so OneDNN re-reads the same data for each batch index instead of advancing the pointer past the buffer. This is the standard OneDNN convention for broadcast dimensions.
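The effect of the fix on addressing can be illustrated with a small offset calculation. This is a sketch under an assumption not stated in the PR: that the kernel walks the right-hand operand over the broadcast extents [13, 1, 4, 16], applying one stride per dimension. `max_offset` is a hypothetical helper.

```python
def max_offset(extents, strides):
    """Largest element offset reached when every index runs over `extents`."""
    return sum((extent - 1) * stride for extent, stride in zip(extents, strides))


buffer_len = 4 * 16                    # elements in the real [4, 16] tensor
walk_extents = [13, 1, 4, 16]          # assumed iteration space for the rhs

before_fix = max_offset(walk_extents, [64, 64, 16, 1])  # expanded strides
after_fix = max_offset(walk_extents, [0, 0, 16, 1])     # broadcast dims zeroed

print(before_fix, after_fix, buffer_len)
```

With the expanded strides the largest offset lies well outside the 64-element buffer. With the leading strides zeroed, every batch index maps back to the same 4 × 16 block and the largest offset is the last valid element.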
File Changed
tensorflow/core/kernels/mkl/mkl_batch_matmul_helper.h

Fixes #117700