Make SDG batch size configurable via system profile (backport #3157) #3208

mergify · 2025-02-28T17:58:13Z

Currently, the batch size for synthetic data generation (SDG) is configurable only via the CLI, but a single batch size across all hardware profiles is not optimal. Hardware configurations differ in capability, so a fixed batch size can under- or over-utilize resources during SDG.

To ensure efficient performance across different hardware, we should set the batch sizes independently in each system profile.
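The PR text does not show the configuration schema. As a minimal sketch of the idea, assuming a hypothetical per-profile batch-size setting and assuming an explicit CLI value still takes precedence over the profile value, the lookup might look like this. The profile names, values, and function name below are all hypothetical and do not come from InstructLab's shipped profiles.

```python
from typing import Optional

# Hypothetical per-profile SDG batch sizes; names and values are
# illustrative only, not InstructLab's real system profiles.
PROFILE_SDG_BATCH_SIZE = {
    "hypothetical-small-gpu": 8,
    "hypothetical-large-gpu": 64,
}

# Hypothetical fallback used when a profile does not set a batch size.
DEFAULT_SDG_BATCH_SIZE = 8


def resolve_sdg_batch_size(profile: str, cli_batch_size: Optional[int] = None) -> int:
    """Return the SDG batch size for a run.

    Assumed precedence: an explicit CLI value wins, otherwise the
    system profile's value is used, otherwise the default.
    """
    if cli_batch_size is not None:
        if cli_batch_size <= 0:
            raise ValueError("batch size must be a positive integer")
        return cli_batch_size
    return PROFILE_SDG_BATCH_SIZE.get(profile, DEFAULT_SDG_BATCH_SIZE)


if __name__ == "__main__":
    print(resolve_sdg_batch_size("hypothetical-large-gpu"))      # 64
    print(resolve_sdg_batch_size("hypothetical-large-gpu", 16))  # 16
    print(resolve_sdg_batch_size("unknown-profile"))             # 8
```

Keeping a CLI override in this sketch means a single run can still be tuned without editing the profile, while each profile carries a sensible default for its hardware.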

This is an automatic backport of pull request #3157 done by Mergify.

Signed-off-by: Nikhil Palaskar <[email protected]>
(cherry picked from commit 4a309ce)

courtneypacheco · 2025-03-03T06:14:23Z

There is a failure in the large E2E job on main: https://github.com/instructlab/instructlab/actions/runs/13623238238/job/38076244547

I'm going to run the large E2E job on this branch to ensure no conflicts on this particular release branch.

github-actions · 2025-03-03T06:15:41Z

E2E (NVIDIA L40S x4) workflow launched on this PR: View run

github-actions · 2025-03-03T06:44:25Z

e2e workflow failed on this PR: View run, please investigate.

courtneypacheco · 2025-03-03T06:49:54Z

The above job failed because of a HuggingFace download error. Rerunning.

github-actions · 2025-03-03T06:53:16Z

E2E (NVIDIA L40S x4) workflow launched on this PR: View run

github-actions · 2025-03-03T07:23:19Z

e2e workflow failed on this PR: View run, please investigate.

courtneypacheco · 2025-03-03T11:51:23Z

The same HuggingFace download error occurred. It seems like a server-side error, so I will trigger the job once more.

github-actions · 2025-03-03T11:54:58Z

E2E (NVIDIA L40S x4) workflow launched on this PR: View run

github-actions · 2025-03-03T14:30:51Z

e2e workflow succeeded on this PR: View run, congrats!

ktdreyer · 2025-03-03T15:47:32Z

Unit tests fail on this PR. This is a known failure, which we fixed for release-v0.23 in #3210.

We're tracking the HuggingFace download error in #3215.

ktdreyer · 2025-03-03T16:52:56Z

Courtney & I discussed this today in a call.

Even though we fixed the unit tests in #3210, this PR does not include that fix until we rebase onto it. In the interest of time, we will not manually rebase this PR. We need to ship the upcoming v0.23.3 release today, and this is a low-risk change.

Summary: We will merge this PR over the failing unit tests. After it is merged, we'll verify that unit tests continue to pass on the release-v0.23 branch.

make SDG batch size configurable via system profile

bb2a7b6

Signed-off-by: Nikhil Palaskar <[email protected]>
(cherry picked from commit 4a309ce)

mergify bot mentioned this pull request Feb 28, 2025

Make SDG batch size configurable via system profile #3157

Merged

mergify bot added the testing, release-branch, and ci-failure labels and removed the ci-failure label Feb 28, 2025

courtneypacheco added the hold label Mar 3, 2025

mergify bot removed the ci-failure label Mar 3, 2025

mergify bot added the ci-failure label Mar 3, 2025

ktdreyer approved these changes Mar 3, 2025

mergify bot added the one-approval label Mar 3, 2025

courtneypacheco approved these changes Mar 3, 2025

courtneypacheco removed the hold label Mar 3, 2025

mergify bot removed the one-approval label Mar 3, 2025

ktdreyer merged commit 728a226 into release-v0.23 Mar 3, 2025
23 of 27 checks passed

ktdreyer deleted the mergify/bp/release-v0.23/pr-3157 branch March 3, 2025 16:53

ktdreyer mentioned this pull request Mar 3, 2025

Make SDG batch size configurable via system profile (backport #3157) #3207

Merged
