Hello,
This is the command I ran:
singularity run --nv /data/a/zhangwencai/software/herro.sif inference --read-alns /data/b/zhangwencai/ultra_long/japo_fromGuoSong/minimap2_alignment -t 1 -b 1 -m /data/a/zhangwencai/software/herro/model_R9_v0.1.pt /data/b/zhangwencai/ultra_long/japo_fromGuoSong/DY48490_ONT_UL_200kb.fastq DY48490_ONT_UL_200kb_herro.fasta
It fails with the following error:
[00:00:05] Parsed 10543 reads.
[00:00:00] Processing 1/? batch ⡀
thread '' panicked at /herro/src/inference.rs:209:70:
Cannot load model.: Torch("CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.\nException raised from device_count_impl at ../c10/cuda/CUDAFunctions.cpp:69 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x6b (0x7f5cd385a6bb in /libs/libtorch/lib/libc10.so)\nframe #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0xc9 (0x7f5cd3855769 in /libs/libtorch/lib/libc10.so)\nframe #2: c10::cuda::device_count_ensure_non_zero() + 0x117 (0x7f5cd324b027 in /libs/libtorch/lib/libc10_cuda.so)\nframe #3: + 0x103931a (0x7f5ced03931a in /libs/libtorch/lib/libtorch_cuda.so)\nframe #4: + 0x2c30f36 (0x7f5ceec30f36 in /libs/libtorch/lib/libtorch_cuda.so)\nframe #5: + 0x2c30ffb (0x7f5ceec30ffb in /libs/libtorch/lib/libtorch_cuda.so)\nframe #6: at::_ops::empty_strided::redispatch(c10::DispatchKeySet, c10::ArrayRefc10::SymInt, c10::ArrayRefc10::SymInt, c10::optionalc10::ScalarType, c10::optionalc10::Layout, c10::optionalc10::Device, c10::optional) + 0x1fb (0x7f5cd5eb71fb in /libs/libtorch/lib/libtorch_cpu.so)\nframe #7: + 0x25ebc75 (0x7f5cd61ebc75 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #8: at::_ops::empty_strided::call(c10::ArrayRefc10::SymInt, c10::ArrayRefc10::SymInt, c10::optionalc10::ScalarType, c10::optionalc10::Layout, c10::optionalc10::Device, c10::optional) + 0x168 (0x7f5cd5ef2328 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #9: + 0x1701f5f (0x7f5cd5301f5f in /libs/libtorch/lib/libtorch_cpu.so)\nframe #10: at::native::_to_copy(at::Tensor const&, c10::optionalc10::ScalarType, c10::optionalc10::Layout, c10::optionalc10::Device, c10::optional, bool, c10::optionalc10::MemoryFormat) + 0x17e3 (0x7f5cd56a6cf3 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #11: + 0x27d3603 (0x7f5cd63d3603 in 
/libs/libtorch/lib/libtorch_cpu.so)\nframe #12: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optionalc10::ScalarType, c10::optionalc10::Layout, c10::optionalc10::Device, c10::optional, bool, c10::optionalc10::MemoryFormat) + 0x103 (0x7f5cd5b93c83 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #13: + 0x25f01c8 (0x7f5cd61f01c8 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #14: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optionalc10::ScalarType, c10::optionalc10::Layout, c10::optionalc10::Device, c10::optional, bool, c10::optionalc10::MemoryFormat) + 0x103 (0x7f5cd5b93c83 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #15: + 0x3a66271 (0x7f5cd7666271 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #16: + 0x3a6681b (0x7f5cd766681b in /libs/libtorch/lib/libtorch_cpu.so)\nframe #17: at::_ops::_to_copy::call(at::Tensor const&, c10::optionalc10::ScalarType, c10::optionalc10::Layout, c10::optionalc10::Device, c10::optional, bool, c10::optionalc10::MemoryFormat) + 0x201 (0x7f5cd5c16651 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #18: at::native::to(at::Tensor const&, c10::Device, c10::ScalarType, bool, bool, c10::optionalc10::MemoryFormat) + 0xfd (0x7f5cd56a505d in /libs/libtorch/lib/libtorch_cpu.so)\nframe #19: + 0x29a5612 (0x7f5cd65a5612 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #20: at::_ops::to_device::call(at::Tensor const&, c10::Device, c10::ScalarType, bool, bool, c10::optionalc10::MemoryFormat) + 0x1c1 (0x7f5cd5d95cd1 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #21: torch::jit::Unpickler::readInstruction() + 0x1719 (0x7f5cd8766789 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #22: torch::jit::Unpickler::run() + 0xa8 (0x7f5cd8767988 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #23: torch::jit::Unpickler::parse_ivalue() + 0x2e (0x7f5cd876953e in /libs/libtorch/lib/libtorch_cpu.so)\nframe #24: torch::jit::readArchiveAndTensors(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, 
std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_typec10::ivalue::Object > (c10::StrongTypePtr, c10::IValue)> >, c10::optionalc10::Device, caffe2::serialize::PyTorchStreamReader&, c10::Type::SingletonOrSharedTypePtrc10::Type (*)(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&), std::shared_ptrtorch::jit::DeserializationStorageContext) + 0x529 (0x7f5cd87241a9 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #25: + 0x4b08c4b (0x7f5cd8708c4b in /libs/libtorch/lib/libtorch_cpu.so)\nframe #26: + 0x4b0b04b (0x7f5cd870b04b in /libs/libtorch/lib/libtorch_cpu.so)\nframe #27: torch::jit::import_ir_module(std::shared_ptrtorch::jit::CompilationUnit, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, c10::optionalc10::Device, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > >&, bool, bool) + 0x3a2 (0x7f5cd870f6c2 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #28: torch::jit::import_ir_module(std::shared_ptrtorch::jit::CompilationUnit, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, c10::optionalc10::Device, bool) + 0x92 (0x7f5cd870fa42 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #29: torch::jit::load(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, 
c10::optionalc10::Device, bool) + 0xd1 (0x7f5cd870fb71 in /libs/libtorch/lib/libtorch_cpu.so)\nframe #30: + 0x1ee52e (0x55cf434da52e in herro)\nframe #31: + 0xd4bc9 (0x55cf433c0bc9 in herro)\nframe #32: + 0x1062b6 (0x55cf433f22b6 in herro)\nframe #33: + 0xc0aec (0x55cf433acaec in herro)\nframe #34: + 0xf56e5 (0x55cf433e16e5 in herro)\nframe #35: + 0x15ae9b (0x55cf43446e9b in herro)\nframe #36: + 0x94ac3 (0x7f5cd366bac3 in /lib/x86_64-linux-gnu/libc.so.6)\nframe #37: clone + 0x44 (0x7f5cd36fca04 in /lib/x86_64-linux-gnu/libc.so.6)\n")
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Aborted (core dumped)
Could you tell me what is causing this error and how I can fix it?
Below are my CUDA toolkit version and my GPU and driver details:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Aug_14_10:10:22_PDT_2024
Cuda compilation tools, release 12.6, V12.6.68
Build cuda_12.6.r12.6/compiler.34714021_0
nvidia-smi
Mon Dec 2 09:55:33 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A6000               Off |   00000000:31:00.0 Off |                  Off |
| 30%   58C    P0             80W /  300W |       1MiB /  49140MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
Best wishes,
WenCai