RuntimeError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient during multi-GPU training #7

@Qinuluo

Description

I'm trying to train a model using PyTorch Lightning with the DDP strategy on two GPUs. However, during training setup I hit an error stating that the model has no parameters that require a gradient. Full log:

(ipad) x12spa@x12spa-Super-Server:/media/x12spa/2TB_HD/pad_workspace/navsim/planning/script$ python3 run_training.py
Seed set to 0
[2025-07-21 20:01:29,083][__main__][INFO] - Global Seed set to 0
[2025-07-21 20:01:29,084][__main__][INFO] - Path where all results are stored: /media/x12spa/8TB_disk/navsim/download/exp/ke/navsim/07.21_20.01
[2025-07-21 20:01:29,084][__main__][INFO] - Building Agent
/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
Loading Valid Caches: 100%|█████████████████| 978/978 [00:00<00:00, 7114.16it/s]
Loading Valid Caches: 100%|█████████████████| 214/214 [00:00<00:00, 6974.70it/s]
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Trainer(limit_train_batches=1.0) was configured so 100% of the batches per epoch will be used..
Trainer(limit_val_batches=1.0) was configured so 100% of the batches will be used..
Trainer(val_check_interval=1.0) was configured so validation will run at the end of the training epoch..
You are using a CUDA device ('NVIDIA GeForce RTX 4090') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high') which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
[rank: 0] Seed set to 0
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
[rank: 1] Seed set to 0
[2025-07-21 20:02:10,625][__main__][INFO] - Global Seed set to 0
[2025-07-21 20:02:10,627][__main__][INFO] - Path where all results are stored: /media/x12spa/8TB_disk/navsim/download/exp/ke/navsim/07.21_20.02
[2025-07-21 20:02:10,627][__main__][INFO] - Building Agent
/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
Loading Valid Caches: 100%|█████████████████| 978/978 [00:00<00:00, 5081.46it/s]
Loading Valid Caches: 100%|█████████████████| 214/214 [00:00<00:00, 5279.17it/s]
[rank: 1] Seed set to 0
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
Missing logger folder: /media/x12spa/8TB_disk/navsim/download/exp/ke/navsim/07.21_20.02/lightning_logs

distributed_backend=nccl
All distributed processes registered. Starting with 2 processes

Missing logger folder: /media/x12spa/8TB_disk/navsim/download/exp/ke/navsim/07.21_20.01/lightning_logs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1]
Error executing job with overrides: []
Error executing job with overrides: []
Traceback (most recent call last):
File "run_training.py", line 138, in main
trainer.fit(
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
return function(*args, **kwargs)
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 963, in _run
self.strategy.setup(self)
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 171, in setup
self.configure_ddp()
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 283, in configure_ddp
self.model = self._setup_model(self.model)
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 195, in _setup_model
return DistributedDataParallel(module=model, device_ids=device_ids, **self._ddp_kwargs)
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 678, in init
self._log_and_throw(
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1037, in _log_and_throw
raise err_type(err_msg)
RuntimeError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
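
From the traceback, the failure comes from torch.nn.parallel.DistributedDataParallel itself: it refuses to wrap a module in which no parameter has requires_grad=True. Below is a minimal sketch to confirm the state of the model right before trainer.fit; lightning_module is just a placeholder name for whatever module run_training.py builds from the agent, not an actual navsim object:

import torch.nn as nn

def report_trainable_parameters(module: nn.Module) -> int:
    """Print parameter counts and return how many parameters will receive gradients."""
    total = sum(p.numel() for p in module.parameters())
    trainable = sum(p.numel() for p in module.parameters() if p.requires_grad)
    print(f"total parameters:     {total}")
    print(f"trainable parameters: {trainable}")
    return trainable

# Hypothetical usage right before trainer.fit(lightning_module, ...):
# if report_trainable_parameters(lightning_module) == 0:
#     raise RuntimeError(
#         "No parameter has requires_grad=True, so DDP will refuse to wrap this model. "
#         "Check whether the agent's weights are all frozen (e.g. via requires_grad_(False))."
#     )

As far as I can tell, if the trainable count really is zero, the fix has to be on the model side (unfreezing at least one parameter) rather than in the Trainer or DDP configuration.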
