Add --continue_on_fail flag to run_multi_gpu.py #114

psanal35 · 2025-09-12T17:51:58Z

Add a `--continue_on_fail` flag, similar to the one in run_single_gpu.py. It will be used in nightly runs.
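For illustration, a flag like this is commonly declared with `argparse` as a boolean switch. The sketch below assumes run_multi_gpu.py parses its options with `argparse`; the `parse_args` helper and the help text are hypothetical and not taken from the PR.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse options for a multi-GPU test runner (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Run multi-GPU unit tests.")
    # store_true makes the flag default to False, so runs that omit it
    # keep stopping at the first failure; nightly runs can pass it explicitly.
    parser.add_argument(
        "--continue_on_fail",
        action="store_true",
        help="keep running the remaining tests after a failure",
    )
    return parser.parse_args(argv)
```

With this sketch, `parse_args(["--continue_on_fail"]).continue_on_fail` is `True`, and `parse_args([])` leaves it `False`.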

psanal35 · 2025-09-12T18:43:52Z

I tested both cases locally.

gulsumgudukbay

Additionally, the pytest command should be modified to reflect the continue_on_fail logic.

If `continue_on_fail` is not set, `-x` should be passed to the pytest command.
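A minimal sketch of this rule: append `-x` (pytest's short form of `--exitfirst`) only when the flag is off. The helper name `build_pytest_cmd` and the `python -m pytest` invocation are assumptions; the actual script may build its pytest command differently.

```python
import sys
from typing import List


def build_pytest_cmd(test_file: str, continue_on_fail: bool) -> List[str]:
    """Build the pytest command line for one test file (hypothetical helper)."""
    cmd = [sys.executable, "-m", "pytest", test_file]
    if not continue_on_fail:
        # -x (long form --exitfirst) makes pytest stop at the first failing test.
        cmd.append("-x")
    return cmd
```

For example, `build_pytest_cmd("test_foo.py", continue_on_fail=False)` ends with `"-x"`, while passing `True` omits it.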

psanal35 · 2025-09-12T18:57:05Z

> Additionally, the pytest command should be modified to reflect the continue_on_fail logic.

Thank you for catching this; I did not realize pytest has its own flag. In that case, can we remove the `break` in the loop?

gulsumgudukbay · 2025-09-12T19:02:58Z

>> Additionally, the pytest command should be modified to reflect the continue_on_fail logic.
>
> Thank you for catching this; I did not realize pytest has its own flag. In that case, can we remove the `break` in the loop?

It depends. The loop iterates over the test files. pytest's `-x` flag stops execution of a test file as soon as any unit test in that file fails. The `break` statement, by contrast, stops the loop from **traversing the remaining files** after any test in a file fails.
So `-x` acts within a single file, whereas your `break` applies to the whole loop over the files.

Depending on the use case: if you want the loop to halt, keep the `break` statement. If you want execution within a file to halt but the loop to continue with the next file, remove the `break` statement.
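The distinction can be sketched as follows. This is not the PR's code: `run_test_files` and `run_file` are hypothetical names, and `run_file` stands in for whatever runs pytest on one file (with `-x` when `continue_on_fail` is off) and returns its exit code.

```python
from typing import Callable, List


def run_test_files(
    test_files: List[str],
    run_file: Callable[[str], int],
    continue_on_fail: bool = False,
) -> List[str]:
    """Run test files in order and return the ones that failed.

    run_file is assumed to run pytest on a single file (adding -x when
    continue_on_fail is off) and return the process exit code.
    """
    failed = []
    for test_file in test_files:
        if run_file(test_file) != 0:
            failed.append(test_file)
            if not continue_on_fail:
                # pytest's -x only stops the tests inside this one file;
                # this break is what stops the loop over the remaining files.
                break
    return failed
```

Passing the per-file runner in as a callable keeps the loop logic testable without launching pytest: with a fake runner where `b.py` and `c.py` fail, the default mode stops after `b.py`, while `continue_on_fail=True` reports both.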


psanal35 · 2025-09-12T19:24:27Z

>>> Additionally, the pytest command should be modified to reflect the continue_on_fail logic.
>>
>> Thank you for catching this; I did not realize pytest has its own flag. In that case, can we remove the `break` in the loop?
>
> It depends. The loop iterates over the test files. pytest's `-x` flag stops execution of a test file as soon as any unit test in that file fails. The `break` statement, by contrast, stops the loop from **traversing the remaining files** after any test in a file fails. So `-x` acts within a single file, whereas your `break` applies to the whole loop over the files.
>
> Depending on the use case: if you want the loop to halt, keep the `break` statement. If you want execution within a file to halt but the loop to continue with the next file, remove the `break` statement.

I tested this, and the `break` seems redundant as it stands, but I am leaving it in; keeping it is correct in any case.

gulsumgudukbay

I approve, but we will need another person to review as well since I pushed changes to this PR.

zahiqbal

Looks good.

Add --continue_on_fail flag to run_multi_gpu.py

e2df213

psanal35 force-pushed the mgpu-continue branch from 1859b56 to e2df213 September 12, 2025 17:57

psanal35 requested review from charleshofer and gulsumgudukbay September 12, 2025 18:41

gulsumgudukbay requested changes Sep 12, 2025


Adding -x logic to pytest command

0dd3c65

if continue_on_fail is not set, -x should be used in the pytest command

Added continue_on_fail arg to run_multi_gpu_test function

79d18e0

gulsumgudukbay added 3 commits September 12, 2025 14:06

Fixed "parameter without a default follows parameter with a default" error

50ecc3d
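The error named in this commit is a Python `SyntaxError`: in a function definition, a parameter without a default value cannot follow one that has a default, and adding a new parameter such as `continue_on_fail` to an existing signature can trigger it. The sketch below checks two signatures with `compile()`; the parameter `gpu_ids` and both orderings are hypothetical, since the actual signature of `run_multi_gpu_test` is not shown here. The exact message wording varies between Python versions.

```python
def signature_is_valid(source: str) -> bool:
    """Return True if the given source compiles, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        print("SyntaxError:", err.msg)
        return False
    return True


# Rejected: gpu_ids has no default but follows continue_on_fail=False.
BAD = "def run_multi_gpu_test(test_file, continue_on_fail=False, gpu_ids):\n    pass\n"
# Accepted: parameters with defaults come after all required ones.
GOOD = "def run_multi_gpu_test(test_file, gpu_ids, continue_on_fail=False):\n    pass\n"
```

Either giving the trailing parameter a default or moving the defaulted parameter to the end of the list resolves the error.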

Fix linting

84ee83c

Fix linting

7989f10

psanal35 requested a review from gulsumgudukbay September 12, 2025 20:25

gulsumgudukbay approved these changes Sep 12, 2025


zahiqbal approved these changes Sep 12, 2025


psanal35 merged commit 4efb2a0 into master Sep 12, 2025
7 checks passed

psanal35 deleted the mgpu-continue branch September 12, 2025 20:51

4 participants
