FloatingPointError: Predicted boxes or scores contain Inf/NaN. Training has diverged #11

@qianyuling-wl

I used the datasets you provided, but training failed with the error below. How should I fix this? Thank you, I look forward to your reply.
[07/03 15:22:15 d2.engine.train_loop]: Starting training from iteration 0
Failed to converge. l_inf norm is: 3.1583547592163086
Failed to converge. l_inf norm is: 4.175591468811035
Failed to converge. l_inf norm is: 12.365741729736328
Failed to converge. l_inf norm is: 6.024055480957031
Failed to converge. l_inf norm is: nan
Failed to converge. l_inf norm is: nan
Failed to converge. l_inf norm is: nan
Failed to converge. l_inf norm is: nan
Failed to converge. l_inf norm is: nan
Failed to converge. l_inf norm is: nan
Failed to converge. l_inf norm is: 1.2032318115234375
Failed to converge. l_inf norm is: 1.0705947875976562
Failed to converge. l_inf norm is: 6.957664489746094
Failed to converge. l_inf norm is: 4.854583740234375
Failed to converge. l_inf norm is: 27.565383911132812
Failed to converge. l_inf norm is: nan
Failed to converge. l_inf norm is: 18.025360107421875
Failed to converge. l_inf norm is: nan
Failed to converge. l_inf norm is: 19.686927795410156
Failed to converge. l_inf norm is: 12.671699523925781
Failed to converge. l_inf norm is: 10.474720001220703
Failed to converge. l_inf norm is: 21.66767120361328
Failed to converge. l_inf norm is: 9.996528625488281
Failed to converge. l_inf norm is: 18.128585815429688
Failed to converge. l_inf norm is: nan
Failed to converge. l_inf norm is: nan
ERROR [07/03 15:22:45 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
File "/opt/detectron2_repo/detectron2/engine/train_loop.py", line 149, in train
self.run_step()
File "/host/deepformable/engine/trainers.py", line 203, in run_step
loss_dict = self.model(data)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/host/deepformable/engine/trainers.py", line 126, in forward
output = self.model(data)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/detectron2_repo/detectron2/modeling/meta_arch/rcnn.py", line 157, in forward
proposals, proposal_losses = self.proposal_generator(images, features, gt_instances)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/detectron2_repo/detectron2/modeling/proposal_generator/rpn.py", line 478, in forward
anchors, pred_objectness_logits, pred_anchor_deltas, images.image_sizes
File "/opt/detectron2_repo/detectron2/modeling/proposal_generator/rpn.py", line 511, in predict_proposals
self.training,
File "/opt/detectron2_repo/detectron2/modeling/proposal_generator/proposal_utils.py", line 104, in find_top_rpn_proposals
"Predicted boxes or scores contain Inf/NaN. Training has diverged."
FloatingPointError: Predicted boxes or scores contain Inf/NaN. Training has diverged.
[07/03 15:22:45 d2.engine.hooks]: Total training time: 0:00:05 (0:00:00 on hooks)
[07/03 15:22:45 d2.utils.events]: iter: 2 total_loss: -5.302 loss_sample_reg: -0.09662 loss_corner_reg: -0.0106 objectness_loss: -0.816 decoding_loss: -1.006 loss_rpn_cls: -1.459 loss_rpn_loc: -1.914 data_time: 1.0866 lr: 4.036e-07 max_mem: 15650M
Traceback (most recent call last):
File "tools/train.py", line 98, in <module>
args=(args,),
File "/opt/detectron2_repo/detectron2/engine/launch.py", line 79, in launch
daemon=False,
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/opt/detectron2_repo/detectron2/engine/launch.py", line 126, in _distributed_worker
main_func(*args)
File "/host/tools/train.py", line 73, in main
return trainer.train()
File "/host/deepformable/engine/trainers.py", line 189, in train
super().train(self.start_iter, self.max_iter)
File "/opt/detectron2_repo/detectron2/engine/train_loop.py", line 149, in train
self.run_step()
File "/host/deepformable/engine/trainers.py", line 203, in run_step
loss_dict = self.model(data)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/host/deepformable/engine/trainers.py", line 126, in forward
output = self.model(data)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/detectron2_repo/detectron2/modeling/meta_arch/rcnn.py", line 157, in forward
proposals, proposal_losses = self.proposal_generator(images, features, gt_instances)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/detectron2_repo/detectron2/modeling/proposal_generator/rpn.py", line 478, in forward
anchors, pred_objectness_logits, pred_anchor_deltas, images.image_sizes
File "/opt/detectron2_repo/detectron2/modeling/proposal_generator/rpn.py", line 511, in predict_proposals
self.training,
File "/opt/detectron2_repo/detectron2/modeling/proposal_generator/proposal_utils.py", line 104, in find_top_rpn_proposals
"Predicted boxes or scores contain Inf/NaN. Training has diverged."
FloatingPointError: Predicted boxes or scores contain Inf/NaN. Training has diverged.
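
To narrow down where the run first went wrong, it may help to find the first `nan` among the `l_inf norm` lines in the log above. The following is only a minimal sketch using the Python standard library. The helper name `first_non_finite` and the regular expression are assumptions made for illustration and are not part of deepformable or detectron2.

```python
import math
import re

# Matches solver lines such as
# "Failed to converge. l_inf norm is: 3.1583547592163086"
NORM_LINE = re.compile(r"l_inf norm is:\s*(\S+)")


def first_non_finite(log_lines):
    """Return (index, raw_value) for the first non-finite l_inf norm.

    The index is 0-based and counts only lines that report an l_inf norm.
    Returns None if every reported norm is finite.
    (Hypothetical helper, written only for this illustration.)
    """
    seen = 0
    for line in log_lines:
        match = NORM_LINE.search(line)
        if match is None:
            continue  # not a solver line
        raw = match.group(1)
        try:
            value = float(raw)  # float() accepts "nan" and "inf"
        except ValueError:
            continue  # unparseable value, skip it
        if not math.isfinite(value):
            return seen, raw
        seen += 1
    return None


# The first five solver lines from the log above.
sample = [
    "Failed to converge. l_inf norm is: 3.1583547592163086",
    "Failed to converge. l_inf norm is: 4.175591468811035",
    "Failed to converge. l_inf norm is: 12.365741729736328",
    "Failed to converge. l_inf norm is: 6.024055480957031",
    "Failed to converge. l_inf norm is: nan",
]
print(first_non_finite(sample))  # prints (4, 'nan')
```

On these lines, the norm is already `nan` at the fifth solver message, well before the `FloatingPointError` is raised in `find_top_rpn_proposals`.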
