Conversation

@courtneypacheco
Contributor

@courtneypacheco courtneypacheco commented Feb 10, 2025

Issue resolved by this Pull Request:
Resolves #3136

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the
    conventional commits.
  • Changelog updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Functional tests have been added, if necessary.
  • E2E Workflow tests have been added, if necessary.

Overview

When an error occurs during SDG, the logger.error() call can't format the caught exception message correctly because of the way we pass in the exception. As a result, the real exception message is never printed. Example:

Screenshot 2025-02-10 at 10 31 06 AM

Job logs for the above screenshot.

Proposed Solution

To resolve this formatting issue, we can use f-strings instead of the approach we're using today. One possible cause is that one of the caught exceptions is a custom exception with unique attributes that the current call cannot format correctly.
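A minimal sketch of the failure mode and the f-string fix. This PR text does not show the original line, so the broken call below is an assumption: it passes the exception as an extra argument to a message that has no placeholder for it. The helper names (`make_logger`, `log_broken`, `log_fstring`) are hypothetical; only the "Failed during training loop:" prefix comes from this PR.

```python
import io
import logging


def make_logger(name):
    """Return a logger that writes to an in-memory stream, plus that stream."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.ERROR)
    logger.propagate = False
    return logger, stream


def log_broken(logger, exc):
    # Assumed shape of the bug: an argument is passed, but the message has no
    # placeholder for it, so "msg % args" fails inside the handler and the
    # record is never written (logging reports the failure on stderr instead).
    logger.error("Failed during training loop: ", exc)


def log_fstring(logger, exc):
    # The message is fully built before logging sees it; there are no args
    # left for the logging module to format.
    logger.error(f"Failed during training loop: {exc}")


if __name__ == "__main__":
    err = RuntimeError("boom")

    logger, stream = make_logger("sketch_broken")
    log_broken(logger, err)
    print("broken call wrote:", repr(stream.getvalue()))

    logger, stream = make_logger("sketch_fstring")
    log_fstring(logger, err)
    print("f-string call wrote:", repr(stream.getvalue()))
```

With the assumed broken call, the handler's stream stays empty, which matches the symptom described in the overview: the real exception text never reaches the log.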

@courtneypacheco courtneypacheco marked this pull request as ready for review February 10, 2025 15:32
@github-actions

E2E (NVIDIA L40S x4) workflow launched on this PR: View run

@github-actions

e2e workflow succeeded on this PR: View run, congrats!

Contributor

@ktdreyer ktdreyer left a comment

I tried editing test_phased_train_failures() to catch this, and I discovered this broken logger.error() call has been copy-pasted in src/instructlab/model/accelerated_train.py, so it exists twice. Incidentally the unit test exercises the other one, not the one you've fixed here!

Here's my unit test patch, feel free to merge this into your PR here

git diff
diff --git a/tests/test_lab_train.py b/tests/test_lab_train.py
index 81a0922b..39aaa91f 100644
--- a/tests/test_lab_train.py
+++ b/tests/test_lab_train.py
@@ -652,7 +652,7 @@ class TestLabTrain:
         run_training_patch.start()
         result = run_default_phased_train(cli_runner)
         run_training_patch.stop()
-        assert TRAINING_FAILURE_MESSAGE in result.output
+        assert f"Failed during training loop: {TRAINING_FAILURE_MESSAGE}" in result.output
         assert "Training Phase 1/2..." in result.output
         assert result.exit_code == 1
 

@mergify mergify bot added testing Relates to testing ci-failure PR has at least one CI failure labels Feb 10, 2025
@courtneypacheco
Contributor Author

Thanks! Added!!!

@mergify mergify bot removed the ci-failure PR has at least one CI failure label Feb 10, 2025
The logger can't format its arguments correctly because the logging call is written incorrectly. To fix this, we can use the "%s" format so that messages aren't computed unless that logging level is active.
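The deferred-formatting point in this commit message can be illustrated with a short sketch. It is not code from the repository: `CountingError` and `count_str_calls` are hypothetical names, used only to count how often the exception is converted to a string under each logging style.

```python
import io
import logging


class CountingError(Exception):
    """Hypothetical exception that counts how often it is rendered as text."""

    str_calls = 0

    def __str__(self):
        CountingError.str_calls += 1
        return "boom"


def count_str_calls(level, use_percent_style):
    """Log one error at the given logger level and report the __str__ calls."""
    logger = logging.getLogger("lazy_sketch")
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(io.StringIO()))
    logger.propagate = False
    logger.setLevel(level)

    CountingError.str_calls = 0
    exc = CountingError()
    if use_percent_style:
        # Formatting is deferred: logging only applies "%s" when a handler
        # actually emits the record.
        logger.error("Failed during training loop: %s", exc)
    else:
        # The f-string is evaluated before logger.error() is even called.
        logger.error(f"Failed during training loop: {exc}")
    return CountingError.str_calls


if __name__ == "__main__":
    # ERROR records are dropped when the logger level is CRITICAL.
    print("%s style, disabled:", count_str_calls(logging.CRITICAL, True))
    print("f-string, disabled:", count_str_calls(logging.CRITICAL, False))
    print("%s style, enabled:", count_str_calls(logging.ERROR, True))
```

When the ERROR level is filtered out, the "%s" call never converts the exception to a string, while the f-string call always does. For an error path like this one the cost difference is small, so the choice is mostly about following the logging module's convention.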

Signed-off-by: Courtney Pacheco <[email protected]>
@mergify mergify bot added the one-approval PR has one approval from a maintainer label Feb 10, 2025
@danmcp danmcp removed the request for review from bbrowning February 10, 2025 22:51
@mergify mergify bot merged commit bc1c8bf into main Feb 11, 2025
28 checks passed
@mergify mergify bot removed the one-approval PR has one approval from a maintainer label Feb 11, 2025
@mergify mergify bot deleted the fix-logging-statement branch February 11, 2025 14:40
Labels

testing Relates to testing

Development

Successfully merging this pull request may close these issues.

bug: Logger can't format exception string when SDG training fails

5 participants