Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
No
Source
source
TensorFlow version
tf 2.21.0
Custom code
Yes
OS platform and distribution
Linux, 22.04.1-Ubuntu
Mobile device
No response
Python version
3.10.12
Bazel version
v1.27.0
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
TensorFlow crashes in eager mode when tf.linalg.matmul multiplies a rank-4 tensor of shape [13, 1, 3, 4] by a rank-2 tensor of shape [4, 16]. The failure is a native crash, not a Python exception. Observed errors include:
- munmap_chunk(): invalid pointer
- corrupted double-linked list
- free(): invalid pointer
- free(): invalid next size (fast)
- Segmentation fault (core dumped)
The crash disappears when oneDNN optimizations are disabled (TF_ENABLE_ONEDNN_OPTS=0), which suggests the bug is in the oneDNN CPU code path.
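Because the failure kills the interpreter, the two oneDNN settings are easiest to compare from a parent process. The following is a minimal sketch of such a harness, not part of the original reproducer: the helper names `run_snippet`, `describe`, and `compare_onednn_settings` are hypothetical, and it assumes a POSIX system, where `subprocess` reports a child killed by a signal as a negative return code.

```python
import os
import subprocess
import sys

# Reproducer from this report, minus the environment variable,
# which the harness sets per run instead.
REPRO = """
import tensorflow as tf
tf.random.set_seed(280958)
p = tf.random.normal([13, 1, 3, 4])
w = tf.random.normal([4, 16])
out = tf.linalg.matmul(p, w)
print('output shape:', out.shape)
"""


def run_snippet(snippet, onednn):
    """Run `snippet` in a fresh interpreter with TF_ENABLE_ONEDNN_OPTS set."""
    env = dict(os.environ, TF_ENABLE_ONEDNN_OPTS=onednn)
    return subprocess.run(
        [sys.executable, "-c", snippet],
        env=env,
        capture_output=True,
        text=True,
    )


def describe(returncode):
    """Turn a subprocess return code into a short human-readable status."""
    if returncode < 0:
        # On POSIX, a negative code means the child died from that signal.
        return f"killed by signal {-returncode}"
    return f"exit code {returncode}"


def compare_onednn_settings(snippet=REPRO):
    """Run the snippet with oneDNN enabled and disabled and print each outcome."""
    for flag in ("1", "0"):
        result = run_snippet(snippet, flag)
        print(f"TF_ENABLE_ONEDNN_OPTS={flag}: {describe(result.returncode)}")
```

Calling `compare_onednn_settings()` prints one status line per setting. Since the crash appears to be heap corruption, the exact signal may differ from run to run.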
Standalone code to reproduce the issue
import os
# Set before importing TensorFlow so the setting takes effect.
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1'
import tensorflow as tf
tf.random.set_seed(280958)
p = tf.random.normal([13, 1, 3, 4])
w = tf.random.normal([4, 16])
out = tf.linalg.matmul(p, w)
print('output:', out)
Relevant log output
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1778010199.773678 758771 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
I0000 00:00:1778010199.774000 758771 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
I0000 00:00:1778010199.812209 758771 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1778010200.986329 758771 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
I0000 00:00:1778010200.987024 758771 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
E0000 00:00:1778010201.097696 758771 cuda_platform.cc:52] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
corrupted double-linked list
Aborted (core dumped)