Thanks to visit codestin.com
Credit goes to github.com

Skip to content

The error when training #3

@chengshuai

Description

@chengshuai

@unsky , a nice work

When training, the error occur. the details is bellow:
#########################################train#######################################
./data/VOCdevkit2007/VOC2007/JPEGImages/2009_002123.jpg
./data/VOCdevkit2007/VOC2007/JPEGImages/000783.jpg
[08:24:35] /home/chengshuai/mx-maskrcnn-master1/incubator-mxnet/dmlc-core/include/dmlc/logging.h:308: [08:24:35] /home/chengshuai/mx-maskrcnn-master1/incubator-mxnet/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:58: too large launch parameter: Softmax[89847,1], [256,1,1]

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f0b7ad7f70c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZN7mshadow4cuda16CheckLaunchParamE4dim3S1_PKc+0x165) [0x7f0b7d3e83f5]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(ZN7mshadow4cuda7SoftmaxIfEEvRKNS_6TensorINS_3gpuELi2ET_EES7+0xfa) [0x7f0b7e3ec24a]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(ZN5mxnet2op19SoftmaxActivationOpIN7mshadow3gpuEE7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS9_EERKS8_INS_9OpReqTypeESaISE_EESD_SD+0x20b) [0x7f0b7e4fe57b]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(ZN5mxnet2op13OperatorState7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS6_EERKS5_INS_9OpReqTypeESaISB_EESA+0x354) [0x7f0b7d04a524]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(ZZN5mxnet10imperative12PushOperatorERKNS_10OpStatePtrEPKN4nnvm2OpERKNS4_9NodeAttrsERKNS_7ContextERKSt6vectorIPNS_6engine3VarESaISH_EESL_RKSE_INS_8ResourceESaISM_EERKSE_IPNS_7NDArrayESaISS_EESW_RKSE_IjSaIjEERKSE_INS_9OpReqTypeESaIS11_EENS_12DispatchModeEENKUlNS_10RunContextENSF_18CallbackOnCompleteEE0_clES17_S18+0x2a0) [0x7f0b7cec2950]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x9d) [0x7f0b7ce3fc6d]
[bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine23ThreadedEnginePerDevice9GPUWorkerILN4dmlc19ConcurrentQueueTypeE0EEEvNS_7ContextEbPNS1_17ThreadWorkerBlockIXT_EEESt10shared_ptrINS0_10ThreadPool11SimpleEventEE+0xf3) [0x7f0b7ce43cb3]
[bt] (8) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataS5+0x56) [0x7f0b7ce43e96]
[bt] (9) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x3b) [0x7f0b7ce410cb]

[08:24:35] /home/chengshuai/mx-maskrcnn-master1/incubator-mxnet/dmlc-core/include/dmlc/logging.h:308: [08:24:35] src/engine/./threaded_engine.h:370: [08:24:35] /home/chengshuai/mx-maskrcnn-master1/incubator-mxnet/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:58: too large launch parameter: Softmax[89847,1], [256,1,1]

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f0b7ad7f70c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZN7mshadow4cuda16CheckLaunchParamE4dim3S1_PKc+0x165) [0x7f0b7d3e83f5]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(ZN7mshadow4cuda7SoftmaxIfEEvRKNS_6TensorINS_3gpuELi2ET_EES7+0xfa) [0x7f0b7e3ec24a]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(ZN5mxnet2op19SoftmaxActivationOpIN7mshadow3gpuEE7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS9_EERKS8_INS_9OpReqTypeESaISE_EESD_SD+0x20b) [0x7f0b7e4fe57b]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(ZN5mxnet2op13OperatorState7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS6_EERKS5_INS_9OpReqTypeESaISB_EESA+0x354) [0x7f0b7d04a524]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(ZZN5mxnet10imperative12PushOperatorERKNS_10OpStatePtrEPKN4nnvm2OpERKNS4_9NodeAttrsERKNS_7ContextERKSt6vectorIPNS_6engine3VarESaISH_EESL_RKSE_INS_8ResourceESaISM_EERKSE_IPNS_7NDArrayESaISS_EESW_RKSE_IjSaIjEERKSE_INS_9OpReqTypeESaIS11_EENS_12DispatchModeEENKUlNS_10RunContextENSF_18CallbackOnCompleteEE0_clES17_S18+0x2a0) [0x7f0b7cec2950]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x9d) [0x7f0b7ce3fc6d]
[bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine23ThreadedEnginePerDevice9GPUWorkerILN4dmlc19ConcurrentQueueTypeE0EEEvNS_7ContextEbPNS1_17ThreadWorkerBlockIXT_EEESt10shared_ptrINS0_10ThreadPool11SimpleEventEE+0xf3) [0x7f0b7ce43cb3]
[bt] (8) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataS5+0x56) [0x7f0b7ce43e96]
[bt] (9) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x3b) [0x7f0b7ce410cb]

A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 8 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f0b7ad7f70c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x3a0) [0x7f0b7ce3ff70]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine23ThreadedEnginePerDevice9GPUWorkerILN4dmlc19ConcurrentQueueTypeE0EEEvNS_7ContextEbPNS1_17ThreadWorkerBlockIXT_EEESt10shared_ptrINS0_10ThreadPool11SimpleEventEE+0xf3) [0x7f0b7ce43cb3]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataS5+0x56) [0x7f0b7ce43e96]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x3b) [0x7f0b7ce410cb]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb1a60) [0x7f0b9d5c7a60]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182) [0x7f0ba193a182]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f0ba166747d]

terminate called after throwing an instance of 'dmlc::Error'
what(): [08:24:35] src/engine/./threaded_engine.h:370: [08:24:35] /home/chengshuai/mx-maskrcnn-master1/incubator-mxnet/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:58: too large launch parameter: Softmax[89847,1], [256,1,1]

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f0b7ad7f70c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZN7mshadow4cuda16CheckLaunchParamE4dim3S1_PKc+0x165) [0x7f0b7d3e83f5]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(ZN7mshadow4cuda7SoftmaxIfEEvRKNS_6TensorINS_3gpuELi2ET_EES7+0xfa) [0x7f0b7e3ec24a]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(ZN5mxnet2op19SoftmaxActivationOpIN7mshadow3gpuEE7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS9_EERKS8_INS_9OpReqTypeESaISE_EESD_SD+0x20b) [0x7f0b7e4fe57b]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(ZN5mxnet2op13OperatorState7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS6_EERKS5_INS_9OpReqTypeESaISB_EESA+0x354) [0x7f0b7d04a524]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(ZZN5mxnet10imperative12PushOperatorERKNS_10OpStatePtrEPKN4nnvm2OpERKNS4_9NodeAttrsERKNS_7ContextERKSt6vectorIPNS_6engine3VarESaISH_EESL_RKSE_INS_8ResourceESaISM_EERKSE_IPNS_7NDArrayESaISS_EESW_RKSE_IjSaIjEERKSE_INS_9OpReqTypeESaIS11_EENS_12DispatchModeEENKUlNS_10RunContextENSF_18CallbackOnCompleteEE0_clES17_S18+0x2a0) [0x7f0b7cec2950]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x9d) [0x7f0b7ce3fc6d]
[bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine23ThreadedEnginePerDevice9GPUWorkerILN4dmlc19ConcurrentQueueTypeE0EEEvNS_7ContextEbPNS1_17ThreadWorkerBlockIXT_EEESt10shared_ptrINS0_10ThreadPool11SimpleEventEE+0xf3) [0x7f0b7ce43cb3]
[bt] (8) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataS5+0x56) [0x7f0b7ce43e96]
[bt] (9) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x3b) [0x7f0b7ce410cb]

A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 8 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f0b7ad7f70c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x3a0) [0x7f0b7ce3ff70]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine23ThreadedEnginePerDevice9GPUWorkerILN4dmlc19ConcurrentQueueTypeE0EEEvNS_7ContextEbPNS1_17ThreadWorkerBlockIXT_EEESt10shared_ptrINS0_10ThreadPool11SimpleEventEE+0xf3) [0x7f0b7ce43cb3]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataS5+0x56) [0x7f0b7ce43e96]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.12.1-py2.7.egg/mxnet/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x3b) [0x7f0b7ce410cb]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb1a60) [0x7f0b9d5c7a60]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182) [0x7f0ba193a182]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f0ba166747d]

What is the problem? Does the mxnet version result in?

Thanks!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions