This repository was archived by the owner on Nov 17, 2023. It is now read-only.

mxnet gets stuck on cudaMemGetInfo #6281

@conopt

Description

Environment info

Operating System: CentOS with cuda V8.0.61

Compiler: g++ 5.3.1

MXNet commit hash (git rev-parse HEAD): 3d545d7

Steps to reproduce

  1. cd cpp-package/example
  2. ./get_mnist.sh
  3. make mlp_gpu && ./mlp_gpu

Part of gdb backtrace:
#0 0x00007fff5d718990 in ?? () from /usr/lib64/nvidia/libnvidia-ptxjitcompiler.so.375.26
#1 0x00007fff5d718ac6 in ?? () from /usr/lib64/nvidia/libnvidia-ptxjitcompiler.so.375.26
#2 0x00007fff5d778e8a in ?? () from /usr/lib64/nvidia/libnvidia-ptxjitcompiler.so.375.26
#3 0x00007fff5d71fecb in ?? () from /usr/lib64/nvidia/libnvidia-ptxjitcompiler.so.375.26
#4 0x00007fff5d99becf in ?? () from /usr/lib64/nvidia/libnvidia-ptxjitcompiler.so.375.26
#5 0x00007fff5d99bf39 in ?? () from /usr/lib64/nvidia/libnvidia-ptxjitcompiler.so.375.26
#6 0x00007fff5d5eed6d in ?? () from /usr/lib64/nvidia/libnvidia-ptxjitcompiler.so.375.26
#7 0x00007fff5d5f64f8 in ?? () from /usr/lib64/nvidia/libnvidia-ptxjitcompiler.so.375.26
#8 0x00007fff5dbf140d in ?? () from /usr/lib64/nvidia/libnvidia-ptxjitcompiler.so.375.26
#9 0x00007fff5d5f9b94 in ?? () from /usr/lib64/nvidia/libnvidia-ptxjitcompiler.so.375.26
#10 0x00007fff5d5fb2e9 in ?? () from /usr/lib64/nvidia/libnvidia-ptxjitcompiler.so.375.26
#11 0x00007fff5d5f1abc in _cuda_CallJitEntryPoint ()
   from /usr/lib64/nvidia/libnvidia-ptxjitcompiler.so.375.26
#12 0x00007fffc4bff582 in fatBinaryCtl_Compile ()
   from /usr/lib64/nvidia/libnvidia-fatbinaryloader.so.375.26
#13 0x00007fffd3625e42 in ?? () from /usr/lib64/nvidia/libcuda.so.1
#14 0x00007fffd36269c3 in ?? () from /usr/lib64/nvidia/libcuda.so.1
#15 0x00007fffd357f35e in ?? () from /usr/lib64/nvidia/libcuda.so.1
#16 0x00007fffd357f640 in ?? () from /usr/lib64/nvidia/libcuda.so.1
#17 0x00007fffe30dfa5d in ?? () from /usr/local/cuda-8.0/lib64/libcudart.so.8.0
#18 0x00007fffe30d3e60 in ?? () from /usr/local/cuda-8.0/lib64/libcudart.so.8.0
#19 0x00007fffe30decc6 in ?? () from /usr/local/cuda-8.0/lib64/libcudart.so.8.0
#20 0x00007fffe30e3401 in ?? () from /usr/local/cuda-8.0/lib64/libcudart.so.8.0
#21 0x00007fffe30d672e in ?? () from /usr/local/cuda-8.0/lib64/libcudart.so.8.0
#22 0x00007fffe30c3e8e in ?? () from /usr/local/cuda-8.0/lib64/libcudart.so.8.0
#23 0x00007fffe30f417c in cudaMemGetInfo () from /usr/local/cuda-8.0/lib64/libcudart.so.8.0
#24 0x00007fffe652aea5 in mxnet::storage::GPUPooledStorageManager::Alloc (this=0xa5fe80, raw_size=401408)
    at src/storage/./pooled_storage_manager.h:77
#25 0x00007fffe652b3f9 in mxnet::StorageImpl::Alloc (this=0x7fff6c0052d0, size=401408, ctx=...)
    at src/storage/storage.cc:86
#26 0x00007fffe6010bfa in mxnet::NDArray::Chunk::CheckAndAlloc (this=0xa6c790)
    at include/mxnet/./ndarray.h:391
#27 0x00007fffe6010bb5 in mxnet::NDArray::Chunk::Chunk (this=0xa6c790, size=100352, ctx=..., delay_alloc=false, dtype=0)
    at include/mxnet/./ndarray.h:386

It only gets stuck with CUDA 8.0.61. I tried another machine with CUDA 8.0.44 and it worked fine.
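To make the call chain in the backtrace easier to read, here is a minimal Python sketch that groups consecutive frames by the shared library they come from. The helper name `frames_by_library` and the shortened `sample` excerpt are illustrative and not part of MXNet or gdb. Read this way, the innermost frames sit in the driver's PTX JIT compiler underneath `cudaMemGetInfo`, which may point to a very long JIT compilation during CUDA initialization rather than a deadlock in MXNet itself. That is only one reading of the trace, not a confirmed diagnosis.

```python
import re
from itertools import groupby

# Start of a gdb frame line, e.g. "#12 0x00007fff... in ...".
_FRAME = re.compile(r"^#\d+\s")
# Shared-object path that gdb prints after "from".
_LIB = re.compile(r"\bfrom\s+(/\S+)")


def frames_by_library(backtrace):
    """Return [(library_basename_or_None, frame_count), ...], innermost first.

    Frames that gdb resolved to source files ("at file:line") carry no
    "from <library>" part and are reported with None as the library.
    """
    libs = []
    for raw in backtrace.splitlines():
        line = raw.strip()
        if _FRAME.match(line):
            libs.append(None)  # new frame; library not known yet
        match = _LIB.search(line)
        if match and libs:
            # "from ..." may sit on a wrapped continuation line, so it is
            # attached to the most recent frame.
            libs[-1] = match.group(1).rsplit("/", 1)[-1]
    return [(lib, len(list(group))) for lib, group in groupby(libs)]


# Shortened excerpt of the backtrace above (frames 11-14, 23 and 24).
sample = """\
#11 0x00007fff5d5f1abc in _cuda_CallJitEntryPoint ()
   from /usr/lib64/nvidia/libnvidia-ptxjitcompiler.so.375.26
#12 0x00007fffc4bff582 in fatBinaryCtl_Compile ()
   from /usr/lib64/nvidia/libnvidia-fatbinaryloader.so.375.26
#13 0x00007fffd3625e42 in ?? () from /usr/lib64/nvidia/libcuda.so.1
#14 0x00007fffd36269c3 in ?? () from /usr/lib64/nvidia/libcuda.so.1
#23 0x00007fffe30f417c in cudaMemGetInfo () from /usr/local/cuda-8.0/lib64/libcudart.so.8.0
#24 0x00007fffe652aea5 in mxnet::storage::GPUPooledStorageManager::Alloc (this=0xa5fe80, raw_size=401408)
    at src/storage/./pooled_storage_manager.h:77
"""

for lib, count in frames_by_library(sample):
    print(f"{count:2d} frame(s) in {lib or '(frame with source info)'}")
```

On the excerpt this lists, from innermost to outermost, the PTX JIT compiler, the fat binary loader, `libcuda.so.1`, `libcudart.so.8.0`, and finally the MXNet frame that gdb resolved to a source file.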
