fix: Logger can't format exception string #3135
E2E (NVIDIA L40S x4) workflow launched on this PR: View run
e2e workflow succeeded on this PR: View run, congrats!
ktdreyer
left a comment
I tried editing test_phased_train_failures() to catch this, and I discovered this broken logger.error() call has been copy-pasted in src/instructlab/model/accelerated_train.py, so it exists twice. Incidentally the unit test exercises the other one, not the one you've fixed here!
Here's my unit test patch, feel free to merge this into your PR here
git diff
diff --git a/tests/test_lab_train.py b/tests/test_lab_train.py
index 81a0922b..39aaa91f 100644
--- a/tests/test_lab_train.py
+++ b/tests/test_lab_train.py
@@ -652,7 +652,7 @@ class TestLabTrain:
run_training_patch.start()
result = run_default_phased_train(cli_runner)
run_training_patch.stop()
- assert TRAINING_FAILURE_MESSAGE in result.output
+ assert f"Failed during training loop: {TRAINING_FAILURE_MESSAGE}" in result.output
assert "Training Phase 1/2..." in result.output
assert result.exit_code == 1
fcf9a21 to 4197800
4197800 to a460f7f
Thanks! Added!!!
The logger can't format the args correctly due to incorrect formatting. To fix this, we can use "%s" format so that messages aren't calculated unless that logging level is active.
Signed-off-by: Courtney Pacheco <[email protected]>
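The deferred evaluation mentioned in the commit message can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: the CountingError class and the lazy_format_demo logger name are hypothetical, and only the message text mirrors the unit test above.

```python
import io
import logging


class CountingError(Exception):
    """Hypothetical exception that counts how often it is rendered to text."""

    str_calls = 0

    def __str__(self) -> str:
        CountingError.str_calls += 1
        return "disk full"


def build_logger(stream: io.StringIO, level: int) -> logging.Logger:
    """Return an isolated logger that writes to an in-memory stream."""
    logger = logging.getLogger("lazy_format_demo")  # hypothetical logger name
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger


stream = io.StringIO()
logger = build_logger(stream, logging.ERROR)
err = CountingError()

# DEBUG is disabled, so the "%s" placeholder is never filled in
# and str(err) is never called.
logger.debug("Failed during training loop: %s", err)
calls_after_debug = CountingError.str_calls

# ERROR is enabled, so the message is formatted when the handler emits it.
logger.error("Failed during training loop: %s", err)
logged_output = stream.getvalue()
```

Because the exception is passed as an argument rather than interpolated up front, the cost of rendering it is only paid when a handler actually emits the record.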
a460f7f to fea21be
Issue resolved by this Pull Request:
Resolves #3136
Checklist:
conventional commits.
Overview
When an error occurs during SDG, the logger.error() function can't format the caught exception message correctly because of the way we pass in the exception, so the true exception message is never printed. Example: (screenshot)
Job logs for the above screenshot.
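The original call is not shown on this page, so the following is only a sketch of one way this failure can arise. It assumes the exception was passed as an extra argument to a message with no "%s" placeholder. The render() helper, the sdg_demo name, and the message text are hypothetical; the helper builds a LogRecord directly so the formatting error is visible instead of being swallowed by the handler.

```python
import logging


def render(msg: str, *args: object) -> str:
    """Render a message the way the logging module does at emit time."""
    record = logging.LogRecord(
        name="sdg_demo",       # hypothetical logger name
        level=logging.ERROR,
        pathname="demo.py",    # placeholder path, not a real file
        lineno=0,
        msg=msg,
        args=args,
        exc_info=None,
    )
    return record.getMessage()  # evaluates msg % args


err = ValueError("bad dataset row")

# Assumed shape of the broken call: an argument but no placeholder.
try:
    render("Failed during SDG: ", err)
    broken = None
except TypeError as exc:
    broken = str(exc)

# With a "%s" placeholder the exception text is interpolated as expected.
fixed = render("Failed during SDG: %s", err)
```

In a real logger.error() call the TypeError does not propagate: the handler catches it and writes a "--- Logging error ---" traceback to stderr, and the intended message is lost.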
Proposed Solution
To resolve this formatting issue, we can use f-strings instead of the approach we're using today. It's possible that one of the exceptions caught is a custom exception with unique attributes, etc. that cannot be parsed correctly.
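As a rough illustration of the f-string approach described here, the sketch below logs a custom exception that carries extra attributes. PhaseError, ListHandler, and the fstring_demo logger name are hypothetical and do not come from the project. Note that an f-string is evaluated before logger.error() is called, regardless of the active log level.

```python
import logging


class PhaseError(Exception):
    """Hypothetical custom exception with extra attributes."""

    def __init__(self, phase: int, reason: str) -> None:
        super().__init__(reason)
        self.phase = phase
        self.reason = reason

    def __str__(self) -> str:
        return f"phase {self.phase}: {self.reason}"


records: list[str] = []


class ListHandler(logging.Handler):
    """Collect rendered messages in memory instead of printing them."""

    def emit(self, record: logging.LogRecord) -> None:
        records.append(record.getMessage())


logger = logging.getLogger("fstring_demo")  # hypothetical logger name
logger.handlers.clear()
logger.propagate = False
logger.addHandler(ListHandler())
logger.setLevel(logging.ERROR)

try:
    raise PhaseError(1, "out of memory")
except PhaseError as e:
    # The f-string calls str(e) immediately, so the logger receives
    # a finished string and has no arguments left to format.
    logger.error(f"Failed during training loop: {e}")
```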