-
Notifications
You must be signed in to change notification settings - Fork 450
Closed
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
There is a bug within the newer transformers releases (>4.51.0) that results in a consistent failure within linux_train.py
The training process fails somewhere inside of the transformers library call to generate samples and provides no stack trace, other than saying that a tuple object is generated somewhere that does not contain any logits
Updating parameters to the function call does not seem to fix the problem
This issue only affects linux_train.py which is legacy code at this point
To Reproduce
Steps to reproduce the behavior:
- Try to execute simple train pipeline in a linux environment or run small-test job in CI
- hit error
Expected behavior
Screenshots
LINUX_TRAIN.PY: SANITY CHECKING THE BASE MODEL
after tokenizer call
before generate call
'tuple' object has no attribute 'logits'
0%| | 0/5 [00:00<?, ?it/s]
0%| | 0/5 [00:00<?, ?it/s]
+ rm -rf /home/tmp/tmp.O2La1qkAMT
Error: Process completed with exit code 1.
Device Info (please complete the following information):
- Hardware Specs: [e.g. Apple M2 Pro Chip, 16 GB Memory, etc.]
- OS Version: [e.g. Mac OS 14.4.1, Fedora Linux 40]
- Python Version: [output of
python --version] - InstructLab Version: [output of
ilab system info]
Additional context
ktdreyer and jpodivin
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working