[CUDA] illegal memory read on SparseSegmentSum #104261

@kokol16

Description

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

tf 2.20

Custom code

Yes

OS platform and distribution

Ubuntu 22.04

Mobile device

No response

Python version

3.10

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

CUDA 12.5.1, cuDNN 9.2.1

GPU model and memory

No response

Current behavior?

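compute-sanitizer reports an out-of-bounds global read in SegmentReduceVectorKernel when SparseSegmentSum runs on the GPU with an empty data tensor (shape [0, 4]) and indices that reference nonexistent rows.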

Standalone code to reproduce the issue

import tensorflow as tf

# 1) data: an empty [0, 4] double tensor (zero rows, so no row index is valid)
data = tf.zeros([0, 4], dtype=tf.double)

# 2) indices: a [4] int32 tensor with values [48, 0, 116, 0]
indices = tf.constant([48, 0, 116, 0], dtype=tf.int32)

# 3) segment_ids: a [4] int32 tensor (all zeros)
segment_ids = tf.zeros([4], dtype=tf.int32)

# 4) Run the SparseSegmentSum op on GPU
with tf.device('/GPU:0'):
    result = tf.raw_ops.SparseSegmentSum(
        data=data,
        indices=indices,
        segment_ids=segment_ids
    )

tf.print("SparseSegmentSum result:", result)
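For comparison, here is a minimal pure-Python sketch of the bounds check one would expect before any row is read. It is not TensorFlow's implementation: the function name `sparse_segment_sum_checked` and its error messages are hypothetical, and it assumes non-negative segment ids.

```python
def sparse_segment_sum_checked(data, indices, segment_ids):
    """Reference sketch: add data[indices[i]] into output row segment_ids[i].

    `data` is a list of equally sized rows. Hypothetical helper, written only
    to illustrate the index validation; not TensorFlow code.
    """
    if len(indices) != len(segment_ids):
        raise ValueError("indices and segment_ids must have the same length")

    num_rows = len(data)
    # Validate every index before touching any row.
    for i, idx in enumerate(indices):
        if not 0 <= idx < num_rows:
            raise IndexError(
                f"indices[{i}] = {idx} is out of range [0, {num_rows})"
            )

    num_segments = max(segment_ids) + 1 if segment_ids else 0
    num_cols = len(data[0]) if data else 0
    out = [[0.0] * num_cols for _ in range(num_segments)]
    for idx, seg in zip(indices, segment_ids):
        for col in range(num_cols):
            out[seg][col] += data[idx][col]
    return out


# Inputs from the reproducer: data has zero rows, yet indices name rows 48 and 116.
try:
    sparse_segment_sum_checked([], [48, 0, 116, 0], [0, 0, 0, 0])
except IndexError as err:
    print("rejected:", err)
```

With the reproducer's inputs the sketch rejects the very first index, since no index can be valid when `data` has zero rows. The report suggests the GPU kernel reaches the read without an equivalent check.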

Relevant log output

========= Invalid __global__ read of size 16 bytes
=========     at void tensorflow::SegmentReduceVectorKernel<tensorflow::AlignedVector<double, (int)2>, tensorflow::AlignedVector<double, (int)2>, int, int, int, tensorflow::functor::Sum, double, double>(T3, T3, T5, T6, T7, T7, bool, bool, const T2 *, const T3 *, const T4 *, const T8 *, T2 *)+0x620
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x600 is out of bounds
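One possible reading of the faulting address, sketched as arithmetic below: if the empty tensor's base pointer is null and the kernel addresses a row as `base + index * row_bytes`, then the first index (48) lands exactly on 0x600. The null base pointer and the addressing formula are assumptions, not something confirmed by the log; the 16-byte read size is consistent with the `AlignedVector<double, 2>` in the kernel signature.

```python
# Values taken from the reproducer.
num_cols = 4          # data has shape [0, 4]
bytes_per_elem = 8    # tf.double is a 64-bit float
first_index = 48      # indices[0]

# Assumption: the empty tensor's device pointer is null (0) and the kernel
# computes the row address as base + index * row_bytes.
base_ptr = 0
row_bytes = num_cols * bytes_per_elem
fault_addr = base_ptr + first_index * row_bytes

print(hex(fault_addr))
```

Under these assumptions the computed address matches the one compute-sanitizer reports, which would point to the unchecked `indices[0]` as the source of the read.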
