Hi,
Thanks for the awesome repo!
While trying to reproduce the results with bash scripts/gp_training/vl_train.sh, I inspected the model's gradients and found that param.grad is None for every trainable parameter, and the model weights do not appear to update. This happens for both distributed and single-GPU training (checked with the debug snippet below, inserted into ppo.py). Could you help me troubleshoot this?
import deepspeed
import torch

if self.accelerator.sync_gradients:
    self.accelerator.clip_grad_norm_(
        self.actor_critic.parameters(),
        self.max_grad_norm,
    )

# --- debug: check which parameters received gradients ---
no_grad, has_grad = [], []
for name, param in self.actor_critic.named_parameters():
    # gather the (possibly ZeRO-partitioned) parameter on all ranks
    with deepspeed.zero.GatheredParameters(param, modifier_rank=None):
        if param.requires_grad:
            if param.grad is None:
                no_grad.append(name)
            else:
                has_grad.append(name)
if no_grad:
    print("====No gradient: ", no_grad)
if has_grad:
    print("====Has gradient: ", has_grad)
print("====Average weight: ",
      torch.cat([p.flatten() for p in self.actor_critic.parameters()]).mean())
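For reference, here is a minimal plain-PyTorch sanity check (no DeepSpeed, no Accelerate) of the same gradient inspection pattern: after a backward pass, every trainable parameter of a tiny hypothetical model should land in has_grad. Note that under ZeRO stage 2/3, DeepSpeed partitions gradients, so param.grad being None does not by itself prove that training is broken.

```python
import torch
import torch.nn as nn

# Tiny stand-in model; in plain PyTorch, .grad is populated after backward().
model = nn.Linear(4, 2)
loss = model(torch.randn(3, 4)).sum()
loss.backward()

no_grad = [n for n, p in model.named_parameters()
           if p.requires_grad and p.grad is None]
has_grad = [n for n, p in model.named_parameters()
            if p.requires_grad and p.grad is not None]
print(no_grad)   # []
print(has_grad)  # ['weight', 'bias']
```

If this check passes outside DeepSpeed but fails inside it, the issue may be the gradient inspection method rather than training itself: DeepSpeed's documented debugging utility deepspeed.utils.safe_get_full_grad(param) is the recommended way to read the full gradient of a ZeRO-partitioned parameter.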