fix: heap corruption in OneDNN batch matmul with broadcast rank mismatch #120199

Ashutosh0x · 2026-06-03T10:01:20Z

Summary

Fix heap corruption in OneDNN batch matmul when broadcasting tensors with mismatched ranks (e.g., rank-4 × rank-2).

Vulnerability (#117700)

tf.linalg.matmul with a rank-4 tensor [13, 1, 3, 4] and rank-2 tensor [4, 16] causes heap corruption when OneDNN is enabled:

```
corrupted double-linked list
free(): invalid pointer
Segmentation fault (core dumped)
```

Root Cause

When broadcasting rank-2 → rank-4, ExpandInputDimsToOutputShape expands [4, 16] to [1, 1, 4, 16]. Then CalculateTFStrides computes strides [64, 64, 16, 1].

But the actual tensor only has 4 × 16 = 64 elements. OneDNN uses these strides to access memory at offsets like batch_idx × 64, so any batch_idx ≥ 1 (here up to 12) lands beyond the end of the tensor buffer, which leads to the heap corruption shown above.
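The arithmetic above can be checked with a small pure-Python sketch. The helper names `row_major_strides` and `max_offset` are hypothetical: they stand in for what the description attributes to CalculateTFStrides and to OneDNN's indexing, and are not TensorFlow or OneDNN code. The sketch assumes a dense row-major layout and that each batch index is multiplied by its stride, as the description states.

```python
def row_major_strides(shape):
    """Dense row-major strides in elements (hypothetical stand-in, not TF code)."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def max_offset(index_extents, strides):
    """Largest element offset reached when index i runs over range(index_extents[i])."""
    return sum((n - 1) * s for n, s in zip(index_extents, strides))


expanded_rhs_shape = [1, 1, 4, 16]   # [4, 16] after two size-1 dims are prepended
rhs_strides = row_major_strides(expanded_rhs_shape)
buffer_elems = 4 * 16                # elements actually allocated for the rhs

# Assumption from the description: the rhs is indexed with the output's
# batch indices, whose extents are [13, 1].
worst = max_offset([13, 1, 4, 16], rhs_strides)

print(rhs_strides)           # [64, 64, 16, 1]
print(worst, buffer_elems)   # 831 64
```

Under these assumptions the largest offset touched is 831, while the last valid offset in the buffer is 63.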

Reproduction

```python
import os
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1'
import tensorflow as tf
tf.random.set_seed(280958)
p = tf.random.normal([13, 1, 3, 4])
w = tf.random.normal([4, 16])
out = tf.linalg.matmul(p, w) # CRASH: corrupted double-linked list
```

Fix

Set strides to 0 for prepended broadcast dimensions so OneDNN re-reads the same data for each batch index instead of advancing the pointer past the buffer. This is the standard OneDNN convention for broadcast dimensions.
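A minimal sketch of what zeroing the prepended strides does to the reachable offsets, assuming the same stride-times-index addressing that the description attributes to OneDNN. `zero_broadcast_strides` is a hypothetical name used for illustration and is not the code in mkl_batch_matmul_helper.h.

```python
def zero_broadcast_strides(strides, ndims_in):
    """Return a copy of strides with the prepended (broadcast) leading dims set to 0."""
    offset = len(strides) - ndims_in   # number of dims prepended by the expansion
    return [0 if i < offset else s for i, s in enumerate(strides)]


def max_offset(index_extents, strides):
    """Largest element offset reached when index i runs over range(index_extents[i])."""
    return sum((n - 1) * s for n, s in zip(index_extents, strides))


expanded_strides = [64, 64, 16, 1]   # strides quoted in the description for [1, 1, 4, 16]
fixed = zero_broadcast_strides(expanded_strides, ndims_in=2)
worst = max_offset([13, 1, 4, 16], fixed)

print(fixed)   # [0, 0, 16, 1]
print(worst)   # 63
```

With zero strides on the two prepended dimensions, every batch index maps back to the same 64 elements, so the largest offset is 63, the last valid element of the buffer.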

File Changed

tensorflow/core/kernels/mkl/mkl_batch_matmul_helper.h

Fixes #117700

gemini-code-assist

Code Review

This pull request modifies mkl_batch_matmul_helper.h to support broadcasting by allowing a zero dimension offset and setting the strides of prepended dimensions to zero, preventing heap corruption in OneDNN. The review feedback recommends adding unit tests to cover this broadcasting scenario and suggests using std::max when calculating dimension offsets to prevent potential underflow or out-of-bounds access.

gemini-code-assist · 2026-06-03T10:03:09Z

+    // When a tensor has been expanded to match the output rank via
+    // ExpandInputDimsToOutputShape, the prepended dimensions have size 1 and
+    // are broadcast. Their strides must be set to 0 so that OneDNN re-reads
+    // the same data for each batch index instead of advancing the pointer
+    // beyond the actual tensor buffer (which causes heap corruption).
+    // See #117700.


According to the TensorFlow PR Review Guidelines (Repository Style Guide, lines 57-61), unit tests are mandatory for any new logic, feature, or bug fix. Please add a unit test (for example, in tensorflow/python/kernel_tests/linalg/matmul_op_test.py or a relevant C++ test file) that reproduces the rank mismatch broadcasting scenario (e.g., rank-4 × rank-2) with OneDNN enabled to prevent regressions.

References

Mandatory test coverage for any new logic, feature, or bug fix.

gemini-code-assist · 2026-06-03T10:03:09Z

+    auto dim_offset_lhs = ndims_out - ndims_lhs;
+    for (int i = 0; i < dim_offset_lhs; ++i) {
+      lhs_strides[i] = 0;
+    }
+    auto dim_offset_rhs = ndims_out - ndims_rhs;
+    for (int i = 0; i < dim_offset_rhs; ++i) {
+      rhs_strides[i] = 0;
+    }


To prevent potential unsigned underflow or unexpected cases where ndims_out < ndims_lhs (which could lead to out-of-bounds access if dim_offset_lhs wraps around or becomes negative), use std::max(0, ...) to safely bound the loop limit.

Suggested change

-    auto dim_offset_lhs = ndims_out - ndims_lhs;
-    for (int i = 0; i < dim_offset_lhs; ++i) {
-      lhs_strides[i] = 0;
-    }
-    auto dim_offset_rhs = ndims_out - ndims_rhs;
-    for (int i = 0; i < dim_offset_rhs; ++i) {
-      rhs_strides[i] = 0;
-    }
+    int dim_offset_lhs = std::max(0, ndims_out - ndims_lhs);
+    for (int i = 0; i < dim_offset_lhs; ++i) {
+      lhs_strides[i] = 0;
+    }
+    int dim_offset_rhs = std::max(0, ndims_out - ndims_rhs);
+    for (int i = 0; i < dim_offset_rhs; ++i) {
+      rhs_strides[i] = 0;
+    }
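The concern behind the clamp can be illustrated with a short sketch. Python integers do not wrap, so the unsigned case is modelled with explicit 32-bit modular arithmetic. Both function names are hypothetical, and the excerpt does not show whether the real ndims variables are signed or unsigned.

```python
UINT32_MOD = 2 ** 32


def offset_unsigned(ndims_out, ndims_in):
    """Model the subtraction as if done in a 32-bit unsigned type (wraps on underflow)."""
    return (ndims_out - ndims_in) % UINT32_MOD


def offset_clamped(ndims_out, ndims_in):
    """Signed subtraction clamped at zero, mirroring std::max(0, ndims_out - ndims_in)."""
    return max(0, ndims_out - ndims_in)


print(offset_unsigned(2, 4))   # 4294967294
print(offset_clamped(2, 4))    # 0
print(offset_clamped(4, 2))    # 2
```

If the difference is held in a signed int, a negative bound already makes the loop body run zero times, so the clamp mainly guards against a wrapped unsigned value and documents the intent.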

When tf.linalg.matmul broadcasts a rank-2 tensor against a rank-4 tensor (e.g., [13,1,3,4] x [4,16]), ExpandInputDimsToOutputShape expands the rank-2 shape to [1,1,4,16]. CalculateTFStrides then computes strides [64,64,16,1] for the expanded shape.

However, the actual tensor only has 64 elements. OneDNN uses the expanded strides to access memory, reading at offsets like batch_idx * 64, which goes far beyond the tensor buffer, causing:

- corrupted double-linked list
- free(): invalid pointer
- Segmentation fault

The fix: set strides to 0 for prepended broadcast dimensions so OneDNN re-reads the same data for each batch index instead of advancing the pointer past the buffer. Uses std::max(0, ...) to safely bound the offset to prevent potential underflow. Also fixes DCHECK(dim_offset > 0), which should be >= 0.

Fixes tensorflow#117700

gemini-code-assist · 2026-06-03T10:05:25Z

Thank you for the update. The use of std::max(0, ...) correctly handles the potential underflow, and the logic for setting strides to 0 for broadcast dimensions is appropriate for the OneDNN requirements. Including the reproduction script as a regression test is a good practice to ensure this issue does not recur.

Ashutosh0x · 2026-06-03T10:06:00Z

Hi @penpornk 👋 — could you take a look at this when you get a chance?

This fixes a heap corruption crash in the OneDNN batch matmul path when broadcasting tensors with mismatched ranks (e.g., rank-4 × rank-2). The root cause is that CalculateTFStrides computes non-zero strides for prepended broadcast dimensions, causing OneDNN to read past the tensor buffer.

The fix is small and localized — just zeroing the strides for broadcast dimensions, which is the standard OneDNN convention. The issue reporter confirmed the crash is reliably reproducible with a simple tf.linalg.matmul call.

Happy to address any additional feedback. Thanks!

Ashutosh0x requested a review from penpornk as a code owner June 3, 2026 10:01

google-ml-butler Bot added the "awaiting review" and "size:S" labels Jun 3, 2026

google-ml-butler Bot assigned gbaned Jun 3, 2026

gemini-code-assist Bot reviewed Jun 3, 2026

Ashutosh0x force-pushed the fix/mkl-matmul-rank4-corruption branch from e4da47b to 29e4932 June 3, 2026 10:04

keerthanakadiri added the "comp:core" and "prtype:bugfix" labels Jun 3, 2026

keerthanakadiri added this to PR Queue Jun 3, 2026

github-project-automation Bot moved this to Assigned Reviewer in PR Queue Jun 3, 2026

Ashutosh0x wants to merge 1 commit into tensorflow:master from Ashutosh0x:fix/mkl-matmul-rank4-corruption (PR #120199)
