RuntimeError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient during multi-GPU training #7

@Qinuluo

Description

I'm trying to train a model using PyTorch Lightning with the DDP strategy on two GPUs. However, during training setup I hit an error stating that the model has no parameters that require a gradient. Full log:

(ipad) x12spa@x12spa-Super-Server:/media/x12spa/2TB_HD/pad_workspace/navsim/planning/script$ python3 run_training.py
Seed set to 0
[2025-07-21 20:01:29,083][__main__][INFO] - Global Seed set to 0
[2025-07-21 20:01:29,084][__main__][INFO] - Path where all results are stored: /media/x12spa/8TB_disk/navsim/download/exp/ke/navsim/07.21_20.01
[2025-07-21 20:01:29,084][__main__][INFO] - Building Agent
/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
Loading Valid Caches: 100%|█████████████████| 978/978 [00:00<00:00, 7114.16it/s]
Loading Valid Caches: 100%|█████████████████| 214/214 [00:00<00:00, 6974.70it/s]
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Trainer(limit_train_batches=1.0) was configured so 100% of the batches per epoch will be used..
Trainer(limit_val_batches=1.0) was configured so 100% of the batches will be used..
Trainer(val_check_interval=1.0) was configured so validation will run at the end of the training epoch..
You are using a CUDA device ('NVIDIA GeForce RTX 4090') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high') which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
[rank: 0] Seed set to 0
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
[rank: 1] Seed set to 0
[2025-07-21 20:02:10,625][__main__][INFO] - Global Seed set to 0
[2025-07-21 20:02:10,627][__main__][INFO] - Path where all results are stored: /media/x12spa/8TB_disk/navsim/download/exp/ke/navsim/07.21_20.02
[2025-07-21 20:02:10,627][__main__][INFO] - Building Agent
/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
Loading Valid Caches: 100%|█████████████████| 978/978 [00:00<00:00, 5081.46it/s]
Loading Valid Caches: 100%|█████████████████| 214/214 [00:00<00:00, 5279.17it/s]
[rank: 1] Seed set to 0
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
Missing logger folder: /media/x12spa/8TB_disk/navsim/download/exp/ke/navsim/07.21_20.02/lightning_logs

distributed_backend=nccl
All distributed processes registered. Starting with 2 processes

Missing logger folder: /media/x12spa/8TB_disk/navsim/download/exp/ke/navsim/07.21_20.01/lightning_logs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1]
Error executing job with overrides: []
Error executing job with overrides: []
Traceback (most recent call last):
File "run_training.py", line 138, in main
trainer.fit(
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
return function(*args, **kwargs)
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 963, in _run
self.strategy.setup(self)
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 171, in setup
self.configure_ddp()
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 283, in configure_ddp
self.model = self._setup_model(self.model)
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 195, in _setup_model
return DistributedDataParallel(module=model, device_ids=device_ids, **self._ddp_kwargs)
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 678, in init
self._log_and_throw(
File "/home/x12spa/miniforge3/envs/ipad/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1037, in _log_and_throw
raise err_type(err_msg)
RuntimeError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
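
From the traceback, the failure comes from torch.nn.parallel.DistributedDataParallel itself: it refuses to wrap a module in which no parameter has requires_grad=True. Below is a minimal sketch to confirm the state of the model right before trainer.fit; lightning_module is just a placeholder name for whatever module run_training.py builds from the agent, not an actual navsim object:

import torch.nn as nn

def report_trainable_parameters(module: nn.Module) -> int:
    """Print parameter counts and return how many parameters will receive gradients."""
    total = sum(p.numel() for p in module.parameters())
    trainable = sum(p.numel() for p in module.parameters() if p.requires_grad)
    print(f"total parameters:     {total}")
    print(f"trainable parameters: {trainable}")
    return trainable

# Hypothetical usage right before trainer.fit(lightning_module, ...):
# if report_trainable_parameters(lightning_module) == 0:
#     raise RuntimeError(
#         "No parameter has requires_grad=True, so DDP will refuse to wrap this model. "
#         "Check whether the agent's weights are all frozen (e.g. via requires_grad_(False))."
#     )

As far as I can tell, if the trainable count really is zero, the fix has to be on the model side (unfreezing at least one parameter) rather than in the Trainer or DDP configuration.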
