failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED #18

@IWamelink

Description

Hi Reuben,

I am trying to get your U-HVED model to train, but I am running into an error that I cannot shake.
My node has an 80 GB GPU and 100 GB of RAM. My environment uses TensorFlow 1.12 and NiftyNet 0.5.0, as prescribed in requirements.txt.

Have you seen this error before?
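
In case it helps to narrow things down, this is the minimal check I plan to run next, outside NiftyNet: a single float32 matmul on the GPU so that cublasSgemm is exercised in isolation (a sketch against stock TensorFlow 1.x APIs, not the U-HVED code):

import tensorflow as tf

# Build one float32 matmul on the GPU; this dispatches to cublasSgemm,
# the same routine that fails in the log below.
with tf.device('/gpu:0'):
    a = tf.random_normal([1024, 1024], dtype=tf.float32)
    b = tf.random_normal([1024, 1024], dtype=tf.float32)
    c = tf.matmul(a, b)

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c).sum())

If this small script fails with the same CUBLAS_STATUS_EXECUTION_FAILED, the problem is presumably in my CUDA/TensorFlow setup rather than in U-HVED.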

[Layer] VAE/ConvDecoderImg/final_conv_seg_4 [Trainable] conv_/w (16)
seeg
INFO:niftynet: Cross entropy loss function calls tf.nn.sparse_softmax_cross_entropy_with_logits which always performs a softmax internally.
output
Tensor("worker_0/concat:0", shape=(1, 112, 112, 112, 4), dtype=float32, device=/device:GPU:0)
output_seg
Tensor("worker_0/VAE/ConvDecoderImg/final_conv_seg_4/conv_/conv:0", shape=(1, 112, 112, 112, 4), dtype=float32, device=/device:GPU:0)
gt
Tensor("worker_0/train/Squeeze_1:0", shape=(1, 112, 112, 112, 1), dtype=float32, device=/device:GPU:0)
WARNING:niftynet: Tried to colocate op 'worker_0/gradients/worker_0/loss_function_1/map/while/Mean_grad/add/Const' (defined at /data/U-HVED/extensions/u_hved/application.py:253) having device '/device:CPU:0' with op 'worker_0/gradients/worker_0/loss_function_1/map/while/Mean_grad/Shape' (defined at /data/U-HVED/extensions/u_hved/application.py:253) which had an incompatible device '/device:GPU:0'.

Node-device colocations active during op 'worker_0/gradients/worker_0/loss_function_1/map/while/Mean_grad/add/Const' creation:
with tf.colocate_with(worker_0/loss_function_1/map/while/range_1): </data/tfEnv/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py:1004>
with tf.colocate_with(worker_0/gradients/worker_0/loss_function_1/map/while/Mean_grad/Shape): </data/tfEnv/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py:80>
No device assignments were active during op 'worker_0/gradients/worker_0/loss_function_1/map/while/Mean_grad/add/Const' creation.

No node-device colocations were active during op 'worker_0/gradients/worker_0/loss_function_1/map/while/Mean_grad/Shape' creation.
Device assignments active during op 'worker_0/gradients/worker_0/loss_function_1/map/while/Mean_grad/Shape' creation:
with tf.device(/gpu:0): </data/tfEnv/lib/python3.6/site-packages/niftynet/engine/application_driver.py:267>
with tf.device(/cpu:0): </data/tfEnv/lib/python3.6/site-packages/niftynet/engine/application_driver.py:249>
WARNING:niftynet: Tried to colocate op 'worker_0/gradients/worker_0/loss_function_1/map/while/Mean_grad/add/f_acc' (defined at /data/U-HVED/extensions/u_hved/application.py:253) having device '/device:CPU:0' with op 'worker_0/gradients/worker_0/loss_function_1/map/while/Mean_grad/Shape' (defined at /data/U-HVED/extensions/u_hved/application.py:253) which had an incompatible device '/device:GPU:0'.

Node-device colocations active during op 'worker_0/gradients/worker_0/loss_function_1/map/while/Mean_grad/add/f_acc' creation:
with tf.colocate_with(worker_0/loss_function_1/map/while/range_1): </data/tfEnv/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py:1004>
with tf.colocate_with(worker_0/gradients/worker_0/loss_function_1/map/while/Mean_grad/Shape): </data/tfEnv/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py:80>
No device assignments were active during op 'worker_0/gradients/worker_0/loss_function_1/map/while/Mean_grad/add/f_acc' creation.

No node-device colocations were active during op 'worker_0/gradients/worker_0/loss_function_1/map/while/Mean_grad/Shape' creation.
Device assignments active during op 'worker_0/gradients/worker_0/loss_function_1/map/while/Mean_grad/Shape' creation:
with tf.device(/gpu:0): </data/tfEnv/lib/python3.6/site-packages/niftynet/engine/application_driver.py:267>
with tf.device(/cpu:0): </data/tfEnv/lib/python3.6/site-packages/niftynet/engine/application_driver.py:249>
2025-05-29 17:17:14.168112: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2025-05-29 17:17:14.168215: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2025-05-29 17:17:14.168230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2025-05-29 17:17:14.168240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2025-05-29 17:17:14.168392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 76618 MB memory) -> physical GPU (device: 0, name: NVIDIA A100-SXM4-80GB, pci bus id: 0000:48:00.0, compute capability: 8.0)
INFO:niftynet: Parameters from random initialisations ...
2025-05-29 17:18:25.202462: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 1 of 30
2025-05-29 17:18:25.431713: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:136] Shuffle buffer filled.
2025-05-29 17:18:25.725596: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
INFO:niftynet: cleaning up...
2025-05-29 17:18:26.074386: I tensorflow/stream_executor/stream.cc:2076] [stream=0x555ac32b1a80,impl=0x555aa752aab0] did not wait for [stream=0x555ac27f3f10,impl=0x555aa752a620]
2025-05-29 17:18:26.074480: I tensorflow/stream_executor/stream.cc:5011] [stream=0x555ac32b1a80,impl=0x555aa752aab0] did not memcpy device-to-host; source: 0x7fac8a293600
2025-05-29 17:18:26.074557: I tensorflow/stream_executor/stream.cc:2076] [stream=0x555ac32b1a80,impl=0x555aa752aab0] did not wait for [stream=0x555ac27f3f10,impl=0x555aa752a620]
2025-05-29 17:18:26.074500: I tensorflow/stream_executor/stream.cc:2076] [stream=0x555ac32b1a80,impl=0x555aa752aab0] did not wait for [stream=0x555ac27f3f10,impl=0x555aa752a620]
2025-05-29 17:18:26.074589: F tensorflow/core/common_runtime/gpu/gpu_util.cc:292] GPU->CPU Memcpy failed
2025-05-29 17:18:26.074607: I tensorflow/stream_executor/stream.cc:5011] [stream=0x555ac32b1a80,impl=0x555aa752aab0] did not memcpy device-to-host; source: 0x7fac8aee2200
Aborted (core dumped)
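
For what it is worth, the log shows the A100 being detected with compute capability 8.0, and as far as I know the prebuilt TensorFlow 1.12 wheels were built against CUDA 9.x, which predates Ampere; so I am wondering whether the cuBLAS failure on the very first matmul is an unsupported-GPU-architecture problem rather than anything specific to U-HVED. This is how I am inspecting what TensorFlow actually detects (a short sketch using the standard TF 1.x device listing):

from tensorflow.python.client import device_lib

# Lists the devices TensorFlow sees; physical_device_desc includes the
# GPU name and its compute capability.
for d in device_lib.list_local_devices():
    print(d.name, d.physical_device_desc)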
