Memory Corruption in matmul with rank-4 input tensor and ONEDNN enabled

### Issue type

Bug

### Have you reproduced the bug with TensorFlow Nightly?

No

### Source

source

### TensorFlow version

tf 2.21.0

### Custom code

Yes

### OS platform and distribution

Linux, 22.04.1-Ubuntu

### Mobile device

_No response_

### Python version

3.10.12

### Bazel version

v1.27.0

### GCC/compiler version

_No response_

### CUDA/cuDNN version

_No response_

### GPU model and memory

_No response_

### Current behavior?

TensorFlow crashes when I run `tf.linalg.matmul` in eager mode with a rank-4 left operand and a rank-2 right operand. The failure is a native crash, not a Python exception. Observed errors include:

- munmap_chunk(): invalid pointer
- corrupted double-linked list
- free(): invalid pointer
- free(): invalid next size (fast)
- Segmentation fault (core dumped)

The crash disappears when oneDNN optimizations are disabled (`TF_ENABLE_ONEDNN_OPTS=0`), which suggests the bug is in the oneDNN CPU code path.
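Because the failure kills the interpreter, comparing the two settings is easier from a parent process. The following is a minimal sketch of such a harness, not part of the original report: `run_isolated`, `describe_exit`, and `compare` are hypothetical helper names, and it assumes a POSIX system, where `subprocess` reports death by signal as a negative return code.

```python
import os
import subprocess
import sys

# Same computation as the reproducer in this report.
REPRO = """
import tensorflow as tf
tf.random.set_seed(280958)
p = tf.random.normal([13, 1, 3, 4])
w = tf.random.normal([4, 16])
print(tf.linalg.matmul(p, w).shape)
"""


def run_isolated(snippet, onednn):
    """Run `snippet` in a fresh interpreter with TF_ENABLE_ONEDNN_OPTS set.

    Hypothetical helper. Returns (returncode, stdout, stderr).
    """
    env = dict(os.environ, TF_ENABLE_ONEDNN_OPTS=onednn)
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        env=env,
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip(), proc.stderr


def describe_exit(returncode):
    """Turn a subprocess return code into a short description (POSIX)."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # Negative value -N means the child was terminated by signal N.
        return f"killed by signal {-returncode}"
    return f"exit status {returncode}"


def compare(snippet):
    """Run the snippet once with oneDNN enabled and once disabled."""
    return {
        flag: describe_exit(run_isolated(snippet, flag)[0])
        for flag in ("1", "0")
    }

# Example: print(compare(REPRO))
```

If the behavior matches this report, `compare(REPRO)` would be expected to show a signal-terminated child for `"1"` and a normal exit for `"0"`. A non-zero positive exit status instead points to an ordinary Python error, such as TensorFlow not being installed.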


### Standalone code to reproduce the issue

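```python
import os

# Set before importing TensorFlow so the flag takes effect; '1' enables oneDNN.
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1'

import tensorflow as tf

tf.random.set_seed(280958)

# Rank-4 left operand, rank-2 right operand.
p = tf.random.normal([13, 1, 3, 4])
w = tf.random.normal([4, 16])

# Expected result shape is (13, 1, 3, 16); with oneDNN enabled the process aborts.
out = tf.linalg.matmul(p, w)

print('output:', out)
```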

### Relevant log output

```shell
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1778010199.773678  758771 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
I0000 00:00:1778010199.774000  758771 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
I0000 00:00:1778010199.812209  758771 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1778010200.986329  758771 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
I0000 00:00:1778010200.987024  758771 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
E0000 00:00:1778010201.097696  758771 cuda_platform.cc:52] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
corrupted double-linked list
Aborted (core dumped)
```

Memory Corruption in matmul with rank-4 input tensor and ONEDNN enabled #117700