
Conversation

@jaideepr97
Member

Removes the temporary cap placed on the transformers library and bumps training to >=v0.8.1.

Checklist:

  • Commit Message Formatting: Commit titles and messages follow the
    Conventional Commits guidelines.
  • Changelog updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Functional tests have been added, if necessary.
  • E2E Workflow tests have been added, if necessary.

@mergify mergify bot added the dependencies Relates to dependencies label Apr 8, 2025
@mergify mergify bot added the one-approval PR has one approval from a maintainer label Apr 8, 2025
@mergify mergify bot added ci-failure PR has at least one CI failure and removed one-approval PR has one approval from a maintainer labels Apr 8, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from 8f3e19e to 4b3cd03 Compare April 9, 2025 14:05
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Apr 9, 2025
@booxter booxter self-requested a review April 9, 2025 20:23
Contributor

@booxter booxter left a comment


Please include transformers cap removal.

booxter
booxter previously requested changes Apr 9, 2025
Contributor

@booxter booxter left a comment


.

@mergify mergify bot added the one-approval PR has one approval from a maintainer label Apr 9, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from 4b3cd03 to b3beb76 Compare April 9, 2025 21:28
@mergify mergify bot removed the ci-failure PR has at least one CI failure label Apr 9, 2025
@jaideepr97
Member Author

@booxter updated, and also removed the comment above it.
The GitHub UI is slow for some reason and not showing that yet.

@mergify mergify bot added the ci-failure PR has at least one CI failure label Apr 9, 2025
@booxter
Contributor

booxter commented Apr 9, 2025

More issues in the training library with transformers?

  0%|          | 0/5 [00:00<?, ?it/s]
  /actions-runner/_work/instructlab/instructlab/venv/lib64/python3.11/site-packages/transformers/generation/configuration_utils.py:820: UserWarning: `return_dict_in_generate` is NOT set to `True`, but `output_logits` is. When `return_dict_in_generate` is not `True`, `output_logits` is ignored.
    warnings.warn(
  'tuple' object has no attribute 'logits'

@RobotSail
Member

@booxter It looks like the issue is that the new transformers version changed its API. You can see that the current implementation of the full train pipeline gets the output by calling the model with return_dict=True, but the warning expects return_dict_in_generate=True (presumably set at initialization).

output = model(**batch, use_cache=False, return_dict=True)

So the solution here is to either change the logic to just read the model output as a tuple or rename the parameter.

Long-term, we need to just move this function to live in the training library instead so we aren't duplicating this logic.
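
The fix RobotSail describes can be sketched as a small helper that accepts both output shapes. This is a hedged illustration, not the PR's actual code; extract_loss, the mocked output object, and the placeholder values are all assumptions for the sketch:

```python
from types import SimpleNamespace

def extract_loss(output):
    """Return the loss whether the model returned a plain tuple
    (loss first, as newer transformers versions may do) or a
    ModelOutput-style object exposing a .loss attribute."""
    if isinstance(output, tuple):
        loss = output[0]
    else:
        loss = getattr(output, "loss", None)
    if loss is None:
        raise ValueError("Model output did not contain a loss.")
    return loss

# Both shapes yield the same loss value:
tuple_style = (0.42, "logits-placeholder")
object_style = SimpleNamespace(loss=0.42, logits="logits-placeholder")
assert extract_loss(tuple_style) == extract_loss(object_style) == 0.42
```

Reading the first tuple element corresponds to the "just read the model output as a tuple" option above; renaming the parameter is the alternative.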

@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from a369140 to 842cb12 Compare April 10, 2025 15:08
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Apr 10, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from 842cb12 to fb5691c Compare April 10, 2025 16:55
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Apr 10, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from fb5691c to bfbe624 Compare April 10, 2025 17:50
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Apr 10, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from bfbe624 to dc95760 Compare April 11, 2025 13:58
@mergify mergify bot removed the ci-failure PR has at least one CI failure label Apr 11, 2025
@mergify mergify bot added the ci-failure PR has at least one CI failure label Apr 15, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from 5abdb89 to 9237ee4 Compare April 15, 2025 14:23
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Apr 15, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from 9237ee4 to 0199d37 Compare April 15, 2025 14:57
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Apr 15, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from 0199d37 to 19f7074 Compare April 15, 2025 16:33
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Apr 15, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from 19f7074 to 8b19f09 Compare April 16, 2025 13:44
@mergify mergify bot removed the ci-failure PR has at least one CI failure label Apr 16, 2025
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from 8b19f09 to af2995e Compare April 16, 2025 13:46
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch 2 times, most recently from fe4f715 to a792286 Compare April 16, 2025 15:43
Currently full training relies on some APIs from instructlab/training which aren't actually
helping with CPU legacy training. This commit resolves that by moving the function
needed for loss correction when training with gradient accumulation to live inside
the full train function, since it only uses a reduced amount of instructlab/instructlab
capability.

Signed-off-by: Oleg Silkin <[email protected]>
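
The loss correction the commit message refers to can be sketched roughly as follows. This is a hedged, stdlib-only sketch of the general gradient-accumulation idea; the helper name and the exact scaling are assumptions, not the training library's actual implementation:

```python
def scale_loss_for_accumulation(loss, grad_accum_steps):
    """Divide a micro-batch loss by the accumulation count so that the
    gradients summed over grad_accum_steps micro-batches match what a
    single large batch would produce. (Hypothetical helper.)"""
    if grad_accum_steps <= 0:
        raise ValueError("grad_accum_steps must be positive")
    return loss / grad_accum_steps

# Four micro-batches accumulated into one optimizer step:
losses = [1.0, 2.0, 3.0, 2.0]
scaled_total = sum(scale_loss_for_accumulation(l, 4) for l in losses)
assert scaled_total == sum(losses) / 4  # same as averaging over the step
```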
@jaideepr97 jaideepr97 force-pushed the bump-transformers-training branch from a792286 to 06e5e4a Compare April 16, 2025 15:48
student_model_arch = get_model_arch(pathlib.Path(params["model_path"]))
if ctx.obj.config.general.use_legacy_tmpl:
    train_args.use_legacy_tmpl = True
else:
Contributor

does this change mean that legacy tokenizer config won't be detected anymore?

Member
@booxter We don't need it anymore

Contributor

We don't need the change or we don't need support for legacy auto-detection? And is this change related to transformers cap removal / training dependency update?

loss = None
if isinstance(output, tuple):
    loss = output[0]
    if len(output[0].shape) != 0:
Contributor

nit: if len(loss.shape) != 0? (Or even if loss.shape?)
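
The distinction the nit points at is that a fully reduced loss is a 0-d tensor (empty shape), while a per-sample loss still carries a batch dimension. A stdlib-only sketch; FakeTensor here is a stand-in for the torch tensors the real code handles:

```python
from dataclasses import dataclass

@dataclass
class FakeTensor:
    shape: tuple  # stand-in for torch.Tensor.shape

scalar_loss = FakeTensor(shape=())     # 0-d: already reduced
batched_loss = FakeTensor(shape=(2,))  # per-sample losses: still needs reduction

# Both spellings from the nit agree: an empty shape tuple is falsy,
# a non-empty one is truthy.
assert len(scalar_loss.shape) == 0 and not scalar_loss.shape
assert len(batched_loss.shape) != 0 and batched_loss.shape
```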

loss = output.loss
if loss is None:
    raise ValueError(
        "Loss is None. Ensure the model's output contains a valid loss."
Contributor

These errors will bubble up to users through CLI error. I am not sure they have the context to interpret the errors meaningfully. As a user, how do I "ensure the model's output contains a valid loss"? Are you asking the user to check their model inputs / config perhaps? If so, this is what should be communicated.

Contributor

This applies to the other errors here too, I think?

Member

@booxter This shouldn't be happening, so if it does, we want users to report it to us. They are also welcome to look at the internals of the CLI and potentially contribute back. We shouldn't assume that our users are incapable of being technical.

Contributor

I see. So it's always a bug in the code and not the inputs from the user? If so, the suggestion to "ensure the model's output" seems misplaced and we may instead want to direct the user to report the issue. (And maybe dump some more info to include with the report?)

Member

It could be a few things: the model implementation could have changed, or the transformers API itself could have changed (as is the case here). So when that happens, we just have this as a safeguard.

@mergify mergify bot added the ci-failure PR has at least one CI failure label Apr 16, 2025
@booxter booxter requested review from booxter and removed request for booxter April 16, 2025 16:22
@booxter booxter dismissed their stale review April 16, 2025 16:29

I am removing the request for changes. I will let someone else with better knowledge assess the merit.

@mergify mergify bot removed the ci-failure PR has at least one CI failure label Apr 16, 2025
@mergify mergify bot removed the one-approval PR has one approval from a maintainer label Apr 16, 2025
@booxter booxter removed their request for review April 16, 2025 19:37
@mergify mergify bot merged commit 1c6800d into instructlab:main Apr 16, 2025
28 checks passed

Labels

dependencies: Relates to dependencies
testing: Relates to testing


4 participants