Description
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
source
TensorFlow version
tf 2.20
Custom code
Yes
OS platform and distribution
Ubuntu 22.04
Mobile device
No response
Python version
3.10
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
CUDA 12.5.1, cuDNN 9.2.1
GPU model and memory
No response
Current behavior?
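Compute Sanitizer reports an out-of-bounds global memory read in SegmentReduceVectorKernel when tf.raw_ops.SparseSegmentSum runs on the GPU with an empty data tensor (shape [0, 4]) and non-empty indices.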
Standalone code to reproduce the issue
import tensorflow as tf
# 1) data: an empty [0, 4] double tensor (zero rows)
data = tf.zeros([0, 4], dtype=tf.double)
# 2) indices: a [4] int32 tensor with values [48, 0, 116, 0]
indices = tf.constant([48, 0, 116, 0], dtype=tf.int32)
# 3) segment_ids: a [4] int32 tensor (all zeros)
segment_ids = tf.zeros([4], dtype=tf.int32)
# 4) Run the SparseSegmentSum op on GPU
with tf.device('/GPU:0'):
    result = tf.raw_ops.SparseSegmentSum(
        data=data,
        indices=indices,
        segment_ids=segment_ids
    )
tf.print("SparseSegmentSum result:", result)
Relevant log output
========= Invalid __global__ read of size 16 bytes
========= at void tensorflow::SegmentReduceVectorKernel<tensorflow::AlignedVector<double, (int)2>, tensorflow::AlignedVector<double, (int)2>, int, int, int, tensorflow::functor::Sum, double, double>(T3, T3, T5, T6, T7, T7, bool, bool, const T2 *, const T3 *, const T4 *, const T8 *, T2 *)+0x620
========= by thread (0,0,0) in block (0,0,0)
========= Address 0x600 is out of bounds
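For triage, it may help to relate the reported address to the inputs. The sketch below is plain Python with no TensorFlow dependency. It checks the indices from the reproduction against the number of rows in data, then computes the byte offset of the first gathered row in a dense row-major buffer. Both helper names are hypothetical and exist only for illustration. The row-major layout and a base address of zero for the empty tensor are assumptions; the log does not confirm either.

```python
# Hypothetical helpers for illustration; neither is part of TensorFlow.
def out_of_range_indices(num_rows, indices):
    """Return (position, value) pairs for indices outside [0, num_rows)."""
    return [(pos, idx) for pos, idx in enumerate(indices)
            if idx < 0 or idx >= num_rows]


def row_byte_offset(index, num_cols, itemsize):
    """Byte offset of row `index` in a dense row-major [rows, num_cols] buffer."""
    return index * num_cols * itemsize


# Values taken from the reproduction script above.
num_rows, num_cols = 0, 4      # data has shape [0, 4]
itemsize = 8                   # size of a double in bytes
indices = [48, 0, 116, 0]

# With zero rows, every index is out of range, including 0.
bad = out_of_range_indices(num_rows, indices)
print(bad)

# Offset of the first gathered row relative to the start of the buffer
# (assumption: row-major layout, as sketched in the lead-in).
first_offset = row_byte_offset(indices[0], num_cols, itemsize)
print(hex(first_offset))
```

The computed offset for index 48 is 48 * 4 * 8 = 1536 bytes, which is 0x600, the same value as the address in the log. That would be consistent with the kernel gathering row 48 without validating it against the row count, but this reading is an inference from the arithmetic rather than something verified in the kernel source.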