Change router weight norm from in-place #70

Merged
mvpatel2000 merged 1 commit into databricks:main from sashaDoubov:sasha/fix-router-weight-normalization
Dec 21, 2023
Conversation

@sashaDoubov

Contributor

I was seeing:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [8192, 2]], which is output 0 of DivBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

This change addresses that issue by making the router weight normalization out-of-place instead of in-place.
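For context, here is a minimal sketch of how this kind of error can arise and how an out-of-place division avoids it. It is not the code from this PR: the function names, the tensor shapes, the top-2 selection, and the choice of an L1 norm are all assumptions made for illustration. The idea is that `torch.norm` saves its input for the backward pass, so dividing that same tensor in place bumps its version counter and autograd rejects the stale saved value.

```python
import torch


def normalize_in_place(weights):
    # Hypothetical in-place variant: `/=` mutates `weights`, which
    # torch.norm has already saved for its backward pass.
    weights /= torch.norm(weights, p=1, dim=-1, keepdim=True)
    return weights


def normalize_out_of_place(weights):
    # Hypothetical out-of-place variant: allocates a new tensor and
    # leaves the tensor saved by torch.norm untouched.
    return weights / torch.norm(weights, p=1, dim=-1, keepdim=True)


def backward_succeeds(normalize):
    # Illustrative router-like setup (assumed): softmax scores over
    # 4 experts for 8 tokens, keeping the top 2 per token.
    torch.manual_seed(0)
    logits = torch.randn(8, 4, requires_grad=True)
    scores = logits.softmax(dim=-1)
    top_weights, _ = torch.topk(scores, k=2, dim=-1)
    normalized = normalize(top_weights)
    try:
        # Weighted sum so the loss is not constant after normalization.
        (normalized * torch.arange(2.0)).sum().backward()
    except RuntimeError:
        return False
    return logits.grad is not None


if __name__ == "__main__":
    print("in-place backward ok:", backward_succeeds(normalize_in_place))
    print("out-of-place backward ok:", backward_succeeds(normalize_out_of_place))
```

Under these assumptions the in-place variant should fail during `backward()` with a `RuntimeError` like the one quoted above, while the out-of-place variant should run cleanly at the cost of one extra tensor allocation.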

@mvpatel2000 mvpatel2000 merged commit 46567ec into databricks:main Dec 21, 2023