
Conversation

@farzadab
Contributor

@farzadab farzadab commented Sep 10, 2024

This PR is mostly complete, but I do need to revisit some minor assumptions to make sure I didn't break the normal (non-FSDP) pipeline.

One of the odd changes here is the removal of .to(device, dtype).

  • The dtype part is moved into the model itself, since accidentally loading a full-precision 70B model and only then casting it to half precision would be far too expensive.
  • The device part is now handled by the trainer itself during trainer.train. This causes some minor issues (e.g. running trainer.evaluate beforehand is not allowed), but they're not hugely important.
    • Note that model.device is now cpu (or mps) before trainer.train and cuda:rank after trainer.train. This might take some getting used to.
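To illustrate the dtype point, here is a minimal PyTorch sketch (not this repo's actual model code, just the general pattern): passing the dtype at construction materializes the parameters directly in half precision, whereas the old `.to(device, dtype)` pattern first allocates full-precision weights and only then casts them, which for a 70B model means a transient full-precision copy in memory.

```python
import torch
import torch.nn as nn

# Old pattern: construct in full precision, then cast.
# For a 70B model this transiently holds the entire fp32 copy before casting.
model_old = nn.Linear(4, 4).to("cpu", torch.bfloat16)

# New pattern (sketch): pass the dtype at construction via PyTorch's
# factory kwargs, so weights are materialized directly in half precision
# and no full-precision copy ever exists.
model_new = nn.Linear(4, 4, dtype=torch.bfloat16)

# Device placement is deferred: the model stays on CPU (or MPS) until the
# trainer moves/shards it onto cuda:<rank> inside trainer.train.
print(model_new.weight.dtype)        # torch.bfloat16
print(model_new.weight.device.type)  # cpu
```

The same idea applies to Hugging Face-style loaders that accept a dtype argument at load time instead of requiring a post-hoc cast.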

@farzadab
Contributor Author

@juberti are there any more comments?

@farzadab farzadab enabled auto-merge (squash) September 16, 2024 23:52
@farzadab farzadab merged commit be8ee6b into main Sep 16, 2024
1 check passed
@farzadab farzadab deleted the farzad-fsdp-p3 branch September 17, 2024 00:11
akshat0311 pushed a commit to jiviai/audio-llm that referenced this pull request Jan 30, 2025
* use_fsdp option

* return move to(device) when not using FSDP
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
