Add the warning of distributed_fused_adam low bucket usage#1714

Merged
crcrpar merged 3 commits into NVIDIA:master from shjwudp:memory_waste_alarm on Aug 28, 2023

Conversation

@shjwudp (Contributor) commented Aug 23, 2023

If the bucket size in the distributed optimizer is not set to fit the model, memory is wasted. The loss is sometimes high but invisible: for example, a misconfigured gpt-7b pays a 10GB memory penalty on each GPU. I think reporting a warning when bucket utilization is low is a solution, and I have submitted my code as a reference.
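To make the idea concrete, here is a minimal sketch of a low-utilization check. It is not the code from this PR: the function names, the `(filled_size, bucket_size)` representation, and the 0.7 threshold are all assumptions chosen for illustration.

```python
import warnings


def bucket_utilization(buckets):
    """Return the fraction of allocated bucket space that holds parameters.

    `buckets` is an iterable of (filled_size, bucket_size) pairs measured in
    elements. This representation is hypothetical, not the optimizer's own.
    """
    buckets = list(buckets)
    allocated = sum(size for _, size in buckets)
    if allocated == 0:
        return 1.0  # nothing allocated, so nothing is wasted
    filled = sum(used for used, _ in buckets)
    return filled / allocated


def warn_on_low_utilization(buckets, threshold=0.7):
    """Emit a warning when utilization falls below `threshold` (assumed value)."""
    util = bucket_utilization(buckets)
    if util < threshold:
        warnings.warn(
            f"Only {util:.1%} of allocated bucket memory is in use; "
            "consider choosing a different bucket size."
        )
    return util


# Hypothetical example: three buckets of 100 elements each, mostly empty.
example_util = warn_on_low_utilization([(30, 100), (20, 100), (10, 100)])
```

A ratio over all buckets, rather than a per-bucket check, keeps one partially filled final bucket from triggering the warning on its own.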

@timmoon10 (Member) left a comment

This would be a very useful check. I just have a suggestion to handle the case where the parameters are initialized in multiple stages, e.g. if the user uses init_params_bucket to manually configure buckets.
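One way to picture the multi-stage case is a tracker that accumulates bucket usage as each stage allocates buckets, so the check reflects everything allocated so far. This is only a sketch under assumptions: `BucketUsageTracker`, its methods, and the threshold are hypothetical and are not part of `distributed_fused_adam.py`.

```python
import warnings


class BucketUsageTracker:
    """Hypothetical helper that accumulates bucket usage across stages."""

    def __init__(self, threshold=0.7):
        self.threshold = threshold  # assumed cutoff, not taken from the PR
        self.filled = 0
        self.allocated = 0

    def add_stage(self, buckets):
        """Record the (filled_size, bucket_size) pairs created in one stage."""
        for used, size in buckets:
            self.filled += used
            self.allocated += size

    def utilization(self):
        """Fraction of all allocated bucket space that holds parameters."""
        return self.filled / self.allocated if self.allocated else 1.0

    def check(self):
        """Warn if utilization over all recorded stages is below the threshold."""
        util = self.utilization()
        if util < self.threshold:
            warnings.warn(
                f"Bucket utilization is {util:.1%} after "
                "the stages recorded so far."
            )
        return util


# Hypothetical usage: two initialization stages, checked once at the end.
tracker = BucketUsageTracker()
tracker.add_stage([(90, 100)])
tracker.add_stage([(10, 100)])
final_util = tracker.check()
```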

Comment thread apex/contrib/optimizers/distributed_fused_adam.py Outdated
@shjwudp (Contributor, Author) commented Aug 24, 2023

Hi Tim, it's great to see your reply! I appreciate your help in correcting it; thanks!

@timmoon10 (Member) left a comment

LGTM. Thanks!

@crcrpar crcrpar self-requested a review August 25, 2023 00:37
@crcrpar crcrpar merged commit 1d01e5c into NVIDIA:master Aug 28, 2023