fix: heap corruption in OneDNN batch matmul with broadcast rank mismatch #120199

Open
Ashutosh0x wants to merge 1 commit into tensorflow:master from Ashutosh0x:fix/mkl-matmul-rank4-corruption

Conversation

@Ashutosh0x

Summary

Fix heap corruption in OneDNN batch matmul when broadcasting tensors with mismatched ranks (e.g., rank-4 × rank-2).

Vulnerability (#117700)

tf.linalg.matmul with a rank-4 tensor [13, 1, 3, 4] and rank-2 tensor [4, 16] causes heap corruption when OneDNN is enabled:

```
corrupted double-linked list
free(): invalid pointer
Segmentation fault (core dumped)
```

Root Cause

When broadcasting rank-2 → rank-4, ExpandInputDimsToOutputShape expands [4, 16] to [1, 1, 4, 16]. Then CalculateTFStrides computes strides [64, 64, 16, 1].

But the actual tensor only has 64 elements. OneDNN uses these strides to access memory at offsets like batch_idx × 64, which reads far beyond the tensor buffer → heap corruption.
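The stride arithmetic above can be checked with a minimal sketch in plain Python. The helper names `row_major_strides` and `max_offset` are hypothetical and are not TensorFlow or OneDNN APIs. The sketch also assumes that the batch axes are iterated over the output batch extents `[13, 1]`, which is an assumption about how the strides are consumed.

```python
def row_major_strides(shape):
    """Return element strides for a densely packed row-major tensor."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def max_offset(index_extents, strides):
    """Largest element offset reached when each axis runs up to extent - 1."""
    return sum((n - 1) * s for n, s in zip(index_extents, strides))


rhs_expanded = [1, 1, 4, 16]            # [4, 16] after prepending two size-1 dims
rhs_strides = row_major_strides(rhs_expanded)
rhs_elements = 4 * 16                   # elements actually present in the buffer

# Assumption: batch axes are walked over the output batch extents [13, 1].
iterated_extents = [13, 1, 4, 16]
worst = max_offset(iterated_extents, rhs_strides)

print(rhs_strides)    # [64, 64, 16, 1]
print(worst)          # 831
print(rhs_elements)   # 64
```

Under these assumptions the largest offset is 831, while valid offsets in the 64-element buffer end at 63.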

Reproduction

```python
import os
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1'
import tensorflow as tf
tf.random.set_seed(280958)
p = tf.random.normal([13, 1, 3, 4])
w = tf.random.normal([4, 16])
out = tf.linalg.matmul(p, w)  # CRASH: corrupted double-linked list
```

Fix

Set strides to 0 for prepended broadcast dimensions so OneDNN re-reads the same data for each batch index instead of advancing the pointer past the buffer. This is the standard OneDNN convention for broadcast dimensions.
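As a rough illustration of the effect, the following plain-Python sketch zeroes the strides of the prepended axes and recomputes the largest offset. `zero_broadcast_strides` is a hypothetical helper that only mirrors the idea of the change; it is not the code in `mkl_batch_matmul_helper.h`, and the iteration extents `[13, 1, 4, 16]` are an assumption.

```python
def zero_broadcast_strides(strides, ndims_out, ndims_in):
    """Return a copy of strides with the prepended (broadcast) axes set to 0."""
    dim_offset = ndims_out - ndims_in   # number of axes that were prepended
    fixed = list(strides)
    for i in range(dim_offset):
        fixed[i] = 0
    return fixed


fixed = zero_broadcast_strides([64, 64, 16, 1], ndims_out=4, ndims_in=2)

# Assumption: batch axes are walked over the output batch extents [13, 1].
worst = sum((n - 1) * s for n, s in zip([13, 1, 4, 16], fixed))

print(fixed)  # [0, 0, 16, 1]
print(worst)  # 63
```

With zero batch strides, every batch index maps back to the same 64 elements, so the largest offset (63) stays inside the buffer.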

File Changed

  • tensorflow/core/kernels/mkl/mkl_batch_matmul_helper.h

Fixes #117700

@Ashutosh0x Ashutosh0x requested a review from penpornk as a code owner June 3, 2026 10:01
@google-ml-butler google-ml-butler Bot added awaiting review Pull request awaiting review size:S CL Change Size: Small labels Jun 3, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request modifies mkl_batch_matmul_helper.h to support broadcasting by allowing a zero dimension offset and setting the strides of prepended dimensions to zero, preventing heap corruption in OneDNN. The review feedback recommends adding unit tests to cover this broadcasting scenario and suggests using std::max when calculating dimension offsets to prevent potential underflow or out-of-bounds access.

Comment on lines +77 to +82
// When a tensor has been expanded to match the output rank via
// ExpandInputDimsToOutputShape, the prepended dimensions have size 1 and
// are broadcast. Their strides must be set to 0 so that OneDNN re-reads
// the same data for each batch index instead of advancing the pointer
// beyond the actual tensor buffer (which causes heap corruption).
// See #117700.

medium

According to the TensorFlow PR Review Guidelines (Repository Style Guide, lines 57-61), unit tests are mandatory for any new logic, feature, or bug fix. Please add a unit test (for example, in tensorflow/python/kernel_tests/linalg/matmul_op_test.py or a relevant C++ test file) that reproduces the rank mismatch broadcasting scenario (e.g., rank-4 × rank-2) with OneDNN enabled to prevent regressions.

References
  1. Mandatory test coverage for any new logic, feature, or bug fix.

Comment on lines +83 to +90
auto dim_offset_lhs = ndims_out - ndims_lhs;
for (int i = 0; i < dim_offset_lhs; ++i) {
lhs_strides[i] = 0;
}
auto dim_offset_rhs = ndims_out - ndims_rhs;
for (int i = 0; i < dim_offset_rhs; ++i) {
rhs_strides[i] = 0;
}

medium

To prevent potential unsigned underflow or unexpected cases where ndims_out < ndims_lhs (which could lead to out-of-bounds access if dim_offset_lhs wraps around or becomes negative), use std::max(0, ...) to safely bound the loop limit.

Suggested change
- auto dim_offset_lhs = ndims_out - ndims_lhs;
- for (int i = 0; i < dim_offset_lhs; ++i) {
-   lhs_strides[i] = 0;
- }
- auto dim_offset_rhs = ndims_out - ndims_rhs;
- for (int i = 0; i < dim_offset_rhs; ++i) {
-   rhs_strides[i] = 0;
- }
+ int dim_offset_lhs = std::max(0, ndims_out - ndims_lhs);
+ for (int i = 0; i < dim_offset_lhs; ++i) {
+   lhs_strides[i] = 0;
+ }
+ int dim_offset_rhs = std::max(0, ndims_out - ndims_rhs);
+ for (int i = 0; i < dim_offset_rhs; ++i) {
+   rhs_strides[i] = 0;
+ }
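The hazard this suggestion guards against can be sketched in plain Python. The source does not show the types of `ndims_out` and `ndims_lhs`, so the unsigned case below is an assumption, simulated with a 32-bit mask. `wrap_u32` and `clamped_offset` are hypothetical names used only for illustration.

```python
def wrap_u32(value):
    """Simulate storing a signed result in a 32-bit unsigned integer."""
    return value & 0xFFFFFFFF


def clamped_offset(ndims_out, ndims_in):
    """Python analogue of std::max(0, ndims_out - ndims_in)."""
    return max(0, ndims_out - ndims_in)


# Hypothetical case where the input rank exceeds the output rank.
print(wrap_u32(2 - 4))        # 4294967294
print(clamped_offset(2, 4))   # 0

# The normal broadcast case is unaffected by the clamp.
print(clamped_offset(4, 2))   # 2
```

If the operands are signed `int`, a negative difference already makes the loop body a no-op, so the clamp mainly documents intent.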

When tf.linalg.matmul broadcasts a rank-2 tensor against a rank-4
tensor (e.g., [13,1,3,4] x [4,16]), ExpandInputDimsToOutputShape
expands the rank-2 shape to [1,1,4,16]. CalculateTFStrides then
computes strides [64,64,16,1] for the expanded shape.

However, the actual tensor only has 64 elements. OneDNN uses the
expanded strides to access memory, reading at offsets like
batch_idx * 64 which goes far beyond the tensor buffer, causing:
- corrupted double-linked list
- free(): invalid pointer
- Segmentation fault

The fix: set strides to 0 for prepended broadcast dimensions so
OneDNN re-reads the same data for each batch index instead of
advancing the pointer past the buffer. Uses std::max(0, ...) to
safely bound the offset to prevent potential underflow.

Also fixes DCHECK(dim_offset > 0) which should be >= 0.

Fixes tensorflow#117700
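The `>= 0` condition mentioned in the commit message can be illustrated with a hypothetical Python analogue of the shape expansion. `expand_to_rank` is not the TensorFlow function; it only prepends size-1 dimensions and asserts the relaxed condition, so an input whose rank already equals the output rank (offset 0) is accepted unchanged.

```python
def expand_to_rank(shape, ndims_out):
    """Prepend size-1 dimensions until the shape has ndims_out axes."""
    dim_offset = ndims_out - len(shape)
    # An offset of 0 is valid: the input already has the output rank.
    assert dim_offset >= 0, "input rank must not exceed output rank"
    return [1] * dim_offset + list(shape)


print(expand_to_rank([4, 16], 4))         # [1, 1, 4, 16]
print(expand_to_rank([13, 1, 3, 4], 4))   # [13, 1, 3, 4]
```

A strict `> 0` check would reject the second call even though no expansion is needed there.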
@Ashutosh0x Ashutosh0x force-pushed the fix/mkl-matmul-rank4-corruption branch from e4da47b to 29e4932 Compare June 3, 2026 10:04
@gemini-code-assist

Thank you for the update. The use of std::max(0, ...) correctly handles the potential underflow, and the logic for setting strides to 0 for broadcast dimensions is appropriate for the OneDNN requirements. Including the reproduction script as a regression test is a good practice to ensure this issue does not recur.

@Ashutosh0x
Author

Hi @penpornk 👋 — could you take a look at this when you get a chance?

This fixes a heap corruption crash in the OneDNN batch matmul path when broadcasting tensors with mismatched ranks (e.g., rank-4 × rank-2). The root cause is that CalculateTFStrides computes non-zero strides for prepended broadcast dimensions, causing OneDNN to read past the tensor buffer.

The fix is small and localized — just zeroing the strides for broadcast dimensions, which is the standard OneDNN convention. The issue reporter confirmed the crash is reliably reproducible with a simple tf.linalg.matmul call.

Happy to address any additional feedback. Thanks!

@keerthanakadiri keerthanakadiri added comp:core issues related to core part of tensorflow prtype:bugfix PR to fix a bug labels Jun 3, 2026
@github-project-automation github-project-automation Bot moved this to Assigned Reviewer in PR Queue Jun 3, 2026

Development

Successfully merging this pull request may close these issues.

Memory Corruption in matmul with rank-4 input tensor and ONEDNN enabled

3 participants