
Gradient is None and model weights remain the same #8

@Charleshhy

Hi,

Thanks for the awesome repo!

When trying to reproduce the results with bash scripts/gp_training/vl_train.sh, I checked the model's gradients (with the debug snippet below, added to ppo.py) and found that param.grad is None for every model parameter, and the model weights do not appear to update. This happens with both distributed and single-GPU training. Could you help me troubleshoot this?

import deepspeed
import torch

if self.accelerator.sync_gradients:

    self.accelerator.clip_grad_norm_(
        self.actor_critic.parameters(),
        self.max_grad_norm,
    )

    # Debug: record which trainable parameters received a gradient.
    no_grad = []
    has_grad = []
    for name, param in self.actor_critic.named_parameters():
        # Gather the (possibly ZeRO-partitioned) parameter on all ranks before inspecting it.
        with deepspeed.zero.GatheredParameters(param, modifier_rank=None):
            if param.requires_grad:
                if param.grad is None:
                    no_grad.append(name)
                else:
                    has_grad.append(name)

    if len(no_grad) > 0:
        print("====No gradient: ", no_grad)
    if len(has_grad) > 0:
        print("====Has gradient: ", has_grad)

    # Debug: mean over all weights, to check whether they change between steps.
    print("====Average weight: ", torch.cat([p.flatten() for p in self.actor_critic.parameters()]).mean())
