
Conversation

@jaideepr97
Member

Removes the temporary cap placed on the transformers library and bumps training to >=v0.8.1.

Checklist:

  • Commit Message Formatting: Commit titles and messages follow the
    Conventional Commits guidelines.
  • Changelog updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Functional tests have been added, if necessary.
  • E2E Workflow tests have been added, if necessary.

@mergify mergify bot added the dependencies Relates to dependencies label Apr 8, 2025
@mergify mergify bot added the one-approval PR has one approval from a maintainer label Apr 8, 2025
@mergify mergify bot added ci-failure PR has at least one CI failure and removed one-approval PR has one approval from a maintainer labels Apr 8, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from 8f3e19e to 4b3cd03 Compare April 9, 2025 14:05
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Apr 9, 2025
@booxter booxter self-requested a review April 9, 2025 20:23
Contributor

@booxter booxter left a comment


Please include transformers cap removal.

booxter
booxter previously requested changes Apr 9, 2025
Contributor

@booxter booxter left a comment


.

@mergify mergify bot added the one-approval PR has one approval from a maintainer label Apr 9, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from 4b3cd03 to b3beb76 Compare April 9, 2025 21:28
@mergify mergify bot removed the ci-failure PR has at least one CI failure label Apr 9, 2025
@jaideepr97
Member Author

@booxter updated, and also removed the comment above it.
The GitHub UI is slow for some reason and not showing that yet.

@mergify mergify bot added the ci-failure PR has at least one CI failure label Apr 9, 2025
@booxter
Contributor

booxter commented Apr 9, 2025

More issues in the training library with transformers?

  0%|          | 0/5 [00:00<?, ?it/s]
  /actions-runner/_work/instructlab/instructlab/venv/lib64/python3.11/site-packages/transformers/generation/configuration_utils.py:820: UserWarning: `return_dict_in_generate` is NOT set to `True`, but `output_logits` is. When `return_dict_in_generate` is not `True`, `output_logits` is ignored.
    warnings.warn(
  'tuple' object has no attribute 'logits'

@RobotSail
Member

@booxter It looks like the issue is that the new transformers version changed its API. You can see that the current implementation of the full train pipeline gets the output by calling the model with return_dict=True, but the warning expects return_dict_in_generate=True (presumably set at initialization).

output = model(**batch, use_cache=False, return_dict=True)

So the solution here is to either change the logic to just read the model output as a tuple or rename the parameter.

Long-term, we need to just move this function to live in the training library instead so we aren't duplicating this logic.
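
The fix RobotSail describes can be sketched as a small helper that accepts both output shapes. This is a hedged illustration, not the PR's actual code; extract_loss, the mocked output object, and the placeholder values are all assumptions for the sketch:

```python
from types import SimpleNamespace

def extract_loss(output):
    """Return the loss whether the model returned a plain tuple
    (loss first, as newer transformers versions may do) or a
    ModelOutput-style object exposing a .loss attribute."""
    if isinstance(output, tuple):
        loss = output[0]
    else:
        loss = getattr(output, "loss", None)
    if loss is None:
        raise ValueError("Model output did not contain a loss.")
    return loss

# Both shapes yield the same loss value:
tuple_style = (0.42, "logits-placeholder")
object_style = SimpleNamespace(loss=0.42, logits="logits-placeholder")
assert extract_loss(tuple_style) == extract_loss(object_style) == 0.42
```

Reading the first tuple element corresponds to the "just read the model output as a tuple" option above; renaming the parameter is the alternative.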

@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from a369140 to 842cb12 Compare April 10, 2025 15:08
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Apr 10, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from 842cb12 to fb5691c Compare April 10, 2025 16:55
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Apr 10, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from fb5691c to bfbe624 Compare April 10, 2025 17:50
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Apr 10, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from bfbe624 to dc95760 Compare April 11, 2025 13:58
@mergify mergify bot removed the ci-failure PR has at least one CI failure label Apr 11, 2025
@mergify mergify bot added the ci-failure PR has at least one CI failure label Apr 15, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from 5abdb89 to 9237ee4 Compare April 15, 2025 14:23
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Apr 15, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from 9237ee4 to 0199d37 Compare April 15, 2025 14:57
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Apr 15, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from 0199d37 to 19f7074 Compare April 15, 2025 16:33
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Apr 15, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from 19f7074 to 8b19f09 Compare April 16, 2025 13:44
@mergify mergify bot removed the ci-failure PR has at least one CI failure label Apr 16, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from 8b19f09 to af2995e Compare April 16, 2025 13:46
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch 2 times, most recently from fe4f715 to a792286 Compare April 16, 2025 15:43
Currently full training relies on some APIs from instructlab/training which aren't actually
helping with CPU legacy training. This commit resolves that by moving the function
needed for loss correction when training with gradient accumulation to live inside
the full train function, since it only uses a reduced amount of instructlab/instructlab
capability.

Signed-off-by: Oleg Silkin <[email protected]>
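
The loss correction the commit message refers to can be sketched roughly as follows. This is a hedged, stdlib-only sketch of the general gradient-accumulation idea; the helper name and the exact scaling are assumptions, not the training library's actual implementation:

```python
def scale_loss_for_accumulation(loss, grad_accum_steps):
    """Divide a micro-batch loss by the accumulation count so that the
    gradients summed over grad_accum_steps micro-batches match what a
    single large batch would produce. (Hypothetical helper.)"""
    if grad_accum_steps <= 0:
        raise ValueError("grad_accum_steps must be positive")
    return loss / grad_accum_steps

# Four micro-batches accumulated into one optimizer step:
losses = [1.0, 2.0, 3.0, 2.0]
scaled_total = sum(scale_loss_for_accumulation(l, 4) for l in losses)
assert scaled_total == sum(losses) / 4  # same as averaging over the step
```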
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from a792286 to 06e5e4a Compare April 16, 2025 15:48
student_model_arch = get_model_arch(pathlib.Path(params["model_path"]))
if ctx.obj.config.general.use_legacy_tmpl:
    train_args.use_legacy_tmpl = True
else:
Contributor

does this change mean that legacy tokenizer config won't be detected anymore?

Member
@booxter We don't need it anymore

Contributor

We don't need the change or we don't need support for legacy auto-detection? And is this change related to transformers cap removal / training dependency update?

loss = None
if isinstance(output, tuple):
    loss = output[0]
    if len(output[0].shape) != 0:
Contributor

nit: if len(loss.shape) != 0? (Or even if loss.shape?)
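
The distinction the nit points at is that a fully reduced loss is a 0-d tensor (empty shape), while a per-sample loss still carries a batch dimension. A stdlib-only sketch; FakeTensor here is a stand-in for the torch tensors the real code handles:

```python
from dataclasses import dataclass

@dataclass
class FakeTensor:
    shape: tuple  # stand-in for torch.Tensor.shape

scalar_loss = FakeTensor(shape=())     # 0-d: already reduced
batched_loss = FakeTensor(shape=(2,))  # per-sample losses: still needs reduction

# Both spellings from the nit agree: an empty shape tuple is falsy,
# a non-empty one is truthy.
assert len(scalar_loss.shape) == 0 and not scalar_loss.shape
assert len(batched_loss.shape) != 0 and batched_loss.shape
```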

loss = output.loss
if loss is None:
    raise ValueError(
        "Loss is None. Ensure the model's output contains a valid loss."
Contributor

These errors will bubble up to users through CLI error. I am not sure they have the context to interpret the errors meaningfully. As a user, how do I "ensure the model's output contains a valid loss"? Are you asking the user to check their model inputs / config perhaps? If so, this is what should be communicated.

Contributor

This applies to the other errors here too, I think?

Member

@booxter This shouldn't be happening, so if it does, we want users to report it to us. They are also welcome to look at the internals of the CLI and potentially contribute back. We shouldn't assume that our users are incapable of being technical.

Contributor

I see. So it's always a bug in the code and not the inputs from the user? If so, the suggestion to "ensure the model's output" seems misplaced and we may instead want to direct the user to report the issue. (And maybe dump some more info to include with the report?)

Member

It could be a few things: the model implementation could have changed, or the transformers API itself could have changed (as is the case here). So when that happens, we just have this as a safeguard.

@mergify mergify bot added the ci-failure PR has at least one CI failure label Apr 16, 2025
@booxter booxter requested review from booxter and removed request for booxter April 16, 2025 16:22
@booxter booxter dismissed their stale review April 16, 2025 16:29

I am removing the request for changes. I will let someone else with better knowledge assess the merit.

@mergify mergify bot removed the ci-failure PR has at least one CI failure label Apr 16, 2025
@mergify mergify bot removed the one-approval PR has one approval from a maintainer label Apr 16, 2025
@booxter booxter removed their request for review April 16, 2025 19:37
@mergify mergify bot merged commit 1c6800d into instructlab:main Apr 16, 2025
28 checks passed

Labels

dependencies: Relates to dependencies
testing: Relates to testing


4 participants