chore: remove transformers cap and raise training floor #3264
Conversation
Force-pushed from 8f3e19e to 4b3cd03.
booxter left a comment:
Please include transformers cap removal.
Force-pushed from 4b3cd03 to b3beb76.
@booxter updated and also removed the comment above it.

More issues in the training library with transformers?
@booxter It looks like the issue is that the new transformers version changed its API. You can see in the current implementation of the full train pipeline that it gets the output by calling the model directly.
So the solution here is to either change the logic to just read the model output as a tuple, or rename the parameter. Long-term, we need to move this function into the training library so we aren't duplicating this logic.
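For reference, a minimal sketch of the tuple-vs-attribute handling being described, assuming `model` and `batch` come from the surrounding training loop (this is not the project's actual code):

```python
output = model(**batch)
if isinstance(output, tuple):
    # Tuple-style return: by convention the loss is the first element.
    loss = output[0]
else:
    # Newer transformers versions return a ModelOutput object,
    # where the loss is exposed as an attribute.
    loss = output.loss
```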
Force-pushed repeatedly: a369140 → 842cb12 → fb5691c → bfbe624 → dc95760, then 5abdb89 → 9237ee4 → 0199d37 → 19f7074 → 8b19f09.
Signed-off-by: Jaideep Rao <[email protected]>
Force-pushed from 8b19f09 to af2995e, then from fe4f715 to a792286.
Currently, full training relies on some APIs from instructlab/training which aren't actually helping with CPU legacy training. This commit resolves that by moving the function needed for loss correction when training with gradient accumulation to live inside the full train function, since it was only using a reduced amount of instructlab/training capability. Signed-off-by: Oleg Silkin <[email protected]>
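For context, loss correction under gradient accumulation typically looks something like the sketch below; all names are hypothetical, and this is not the exact function being moved:

```python
def train_epoch(model, dataloader, optimizer, grad_accum_steps=4):
    """Accumulate gradients over micro-batches, correcting the loss scale."""
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(dataloader):
        output = model(**batch)
        # Divide so the summed micro-batch gradients match one
        # full-batch optimizer step.
        loss = output.loss / grad_accum_steps
        loss.backward()
        if (step + 1) % grad_accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```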
Force-pushed from a792286 to 06e5e4a.
```python
student_model_arch = get_model_arch(pathlib.Path(params["model_path"]))
if ctx.obj.config.general.use_legacy_tmpl:
    train_args.use_legacy_tmpl = True
else:
    ...  # excerpt truncated here
```
does this change mean that legacy tokenizer config won't be detected anymore?
@booxter We don't need it anymore
We don't need the change or we don't need support for legacy auto-detection? And is this change related to transformers cap removal / training dependency update?
```python
loss = None
if isinstance(output, tuple):
    loss = output[0]
    if len(output[0].shape) != 0:
        ...  # excerpt truncated here
```
nit: `if len(loss.shape) != 0`? (Or even `if loss.shape`?)
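For context on the nit: a 0-dim PyTorch tensor has an empty shape, so either form distinguishes a scalar loss from an unreduced one. A small illustrative snippet:

```python
import torch

scalar_loss = torch.tensor(1.5)            # 0-dim: shape == torch.Size([])
per_token_loss = torch.tensor([0.9, 1.2])  # 1-dim: shape == torch.Size([2])

len(scalar_loss.shape) != 0     # False -> already a scalar
len(per_token_loss.shape) != 0  # True  -> reduce first, e.g. .mean()
```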
```python
loss = output.loss
if loss is None:
    raise ValueError(
        "Loss is None. Ensure the model's output contains a valid loss."
    )
```
These errors will bubble up to users through the CLI. I am not sure they have the context to interpret the errors meaningfully. As a user, how do I "ensure the model's output contains a valid loss"? Are you asking the user to check their model inputs / config, perhaps? If so, this is what should be communicated.
This applies to the other errors here too, I think?
@booxter This shouldn't be happening, so if this happens then we want users to report this to us. They are also welcome to look at the internals of the CLI and potentially contribute back. We shouldn't assume that our users are incapable of being technical.
I see. So it's always a bug in the code and not the inputs from the user? If so, the suggestion to "ensure the model's output" seems misplaced and we may instead want to direct the user to report the issue. (And maybe dump some more info to include with the report?)
It could be a few things: the model implementation could have changed, or the transformers API itself could have changed (as in the case here). So when that happens, we just have this as a safeguard.
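To illustrate the direction suggested above, the safeguard could steer users toward filing a report. The wording and repo URL below are illustrative assumptions, not what this PR implements:

```python
import transformers

# Inside the existing loss check (sketch only):
if loss is None:
    raise ValueError(
        "Loss is None: the model output did not include a loss. "
        "This is likely a bug rather than a problem with your inputs. "
        "Please report it at https://github.com/instructlab/instructlab/issues "
        f"(transformers version: {transformers.__version__})."
    )
```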
I am removing the request for changes. I will let someone else with better knowledge assess the merit.
Removes the temporary cap placed on the transformers library and bumps the training library requirement to >=v0.8.1.
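In dependency terms, the change amounts to something like the illustrative diff below; every specifier except the new `instructlab-training>=0.8.1` floor is an assumption:

```diff
-transformers>=4.41,<4.46     # hypothetical temporary cap being removed
+transformers>=4.41           # cap removed
-instructlab-training>=0.7.0  # hypothetical previous floor
+instructlab-training>=0.8.1  # new floor from this PR
```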
Checklist:
- Commits follow conventional commits.