Hi,
Thanks for the awesome repo!
While trying to reproduce the results with bash scripts/gp_training/vl_train.sh, I inspected the model's gradients and found that param.grad is None for every trainable parameter, and the model weights do not appear to update. This happens for both distributed and single-GPU training (checked with the debug snippet below, inserted into ppo.py). Could you help me troubleshoot this?
import deepspeed
import torch

if self.accelerator.sync_gradients:
    self.accelerator.clip_grad_norm_(
        self.actor_critic.parameters(),
        self.max_grad_norm,
    )

# --- debug: check which parameters received gradients ---
no_grad, has_grad = [], []
for name, param in self.actor_critic.named_parameters():
    # gather the (possibly ZeRO-partitioned) parameter on all ranks
    with deepspeed.zero.GatheredParameters(param, modifier_rank=None):
        if param.requires_grad:
            if param.grad is None:
                no_grad.append(name)
            else:
                has_grad.append(name)
if no_grad:
    print("====No gradient: ", no_grad)
if has_grad:
    print("====Has gradient: ", has_grad)
print("====Average weight: ",
      torch.cat([p.flatten() for p in self.actor_critic.parameters()]).mean())
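For reference, here is a minimal plain-PyTorch sanity check (no DeepSpeed, no Accelerate) of the same gradient inspection pattern: after a backward pass, every trainable parameter of a tiny hypothetical model should land in has_grad. Note that under ZeRO stage 2/3, DeepSpeed partitions gradients, so param.grad being None does not by itself prove that training is broken.

```python
import torch
import torch.nn as nn

# Tiny stand-in model; in plain PyTorch, .grad is populated after backward().
model = nn.Linear(4, 2)
loss = model(torch.randn(3, 4)).sum()
loss.backward()

no_grad = [n for n, p in model.named_parameters()
           if p.requires_grad and p.grad is None]
has_grad = [n for n, p in model.named_parameters()
            if p.requires_grad and p.grad is not None]
print(no_grad)   # []
print(has_grad)  # ['weight', 'bias']
```

If this check passes outside DeepSpeed but fails inside it, the issue may be the gradient inspection method rather than training itself: DeepSpeed's documented debugging utility deepspeed.utils.safe_get_full_grad(param) is the recommended way to read the full gradient of a ZeRO-partitioned parameter.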