shjwudp · 2023-08-23T12:37:01Z

If a suitable bucket size is not set in the distributed optimizer, memory is wasted. The waste can be large yet invisible: for example, a misconfigured gpt-7b incurs a 10GB memory penalty on each GPU. I think emitting a warning when bucket utilization is low is one solution, and I have submitted my code as a reference.
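
As a rough illustration of the idea (a minimal sketch, not the code submitted in this PR), the check can be thought of as comparing the space actually occupied by parameters with the space allocated across all buckets. The function names, the `filled_sizes` input, and the 0.7 threshold below are hypothetical and chosen only for the example; they are not part of the apex API.

```python
import warnings


def bucket_utilization(filled_sizes, bucket_size):
    """Return the fraction of allocated bucket space that holds parameters.

    filled_sizes: number of elements used in each bucket (hypothetical input).
    bucket_size: capacity of every bucket, in elements.
    """
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    if not filled_sizes:
        # Nothing allocated yet, so nothing is wasted.
        return 1.0
    allocated = bucket_size * len(filled_sizes)
    return sum(filled_sizes) / allocated


def warn_if_low_utilization(filled_sizes, bucket_size, threshold=0.7):
    """Warn when utilization falls below an assumed threshold; return it."""
    util = bucket_utilization(filled_sizes, bucket_size)
    if util < threshold:
        warnings.warn(
            f"Only {util:.1%} of allocated bucket space is in use; "
            "consider adjusting the bucket size to reduce memory waste.",
            stacklevel=2,
        )
    return util
```

For instance, `warn_if_low_utilization([10, 10], 100)` would report that only a small share of the two buckets is occupied, while fully packed buckets would pass silently.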

timmoon10

This would be a very useful check. I just have a suggestion to handle the case where the parameters are initialized in multiple stages, e.g. if the user uses init_params_bucket to manually configure buckets.

Co-authored-by: Tim Moon <[email protected]>

shjwudp · 2023-08-24T01:48:22Z

Hi Tim, it's great to see your reply! I appreciate your help in correcting it. Thanks!

timmoon10

LGTM. Thanks!

shjwudp added 2 commits August 23, 2023 15:08

feat: Add the warning of distributed_fused_adam low bucket usage.

b00a8b6

correct unittest

1854b63

timmoon10 suggested changes Aug 23, 2023

Comment thread apex/contrib/optimizers/distributed_fused_adam.py Outdated

Update apex/contrib/optimizers/distributed_fused_adam.py

9194e0f

Co-authored-by: Tim Moon <[email protected]>

timmoon10 approved these changes Aug 24, 2023

crcrpar self-requested a review August 25, 2023 00:37

crcrpar approved these changes Aug 28, 2023

crcrpar merged commit 1d01e5c into NVIDIA:master Aug 28, 2023

crcrpar added the contrib label Aug 28, 2023

Add the warning of distributed_fused_adam low bucket usage #1714

crcrpar merged 3 commits into NVIDIA:master from shjwudp:memory_waste_alarm
