

@mergify
Contributor

@mergify mergify bot commented Feb 28, 2025

Currently, the batch size for SDG is only configurable via the CLI, but a single batch size across all hardware profiles is not optimal. Different hardware configurations have varying capabilities, and using a fixed batch size can lead to under-utilization or over-utilization of resources during the SDG process.

To ensure efficient performance across different hardware, we should set the batch sizes independently in each system profile.
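As a rough illustration of the idea (not the code in this PR), a per-profile batch size could be resolved along these lines. The profile names, the numeric values, the `resolve_batch_size` helper, and the rule that an explicit CLI value takes precedence over the profile are all assumptions made for this sketch.

```python
from typing import Optional

# Hypothetical profile names and batch sizes, for illustration only.
# These are NOT the values shipped in InstructLab's system profiles.
PROFILE_BATCH_SIZES = {
    "small-single-gpu": 4,
    "medium-multi-gpu": 8,
    "large-multi-gpu": 256,
}

# Assumed fallback when a profile does not define its own batch size.
DEFAULT_BATCH_SIZE = 8


def resolve_batch_size(profile: str, cli_batch_size: Optional[int] = None) -> int:
    """Return the SDG batch size to use for a given hardware profile.

    Assumed precedence: an explicit CLI value wins, then the value set in
    the profile, then a global default.
    """
    if cli_batch_size is not None:
        return cli_batch_size
    return PROFILE_BATCH_SIZES.get(profile, DEFAULT_BATCH_SIZE)
```

With this shape, each profile carries a value suited to its hardware, while a user who passes a batch size on the CLI still gets exactly what they asked for.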


This is an automatic backport of pull request #3157 done by Mergify.

Signed-off-by: Nikhil Palaskar <[email protected]>
(cherry picked from commit 4a309ce)
@mergify mergify bot added the testing (Relates to testing), release-branch (Pull Request directly to a release branch), and ci-failure (PR has at least one CI failure) labels and removed the ci-failure (PR has at least one CI failure) label Feb 28, 2025
@mergify mergify bot added the one-approval (PR has one approval from a maintainer) label Mar 2, 2025
@courtneypacheco courtneypacheco added the hold (In-progress PR. Tag should be removed before merge.) label Mar 3, 2025
@courtneypacheco
Contributor

courtneypacheco commented Mar 3, 2025

There is a failure in the large E2E job on main: https://github.com/instructlab/instructlab/actions/runs/13623238238/job/38076244547

I'm going to run the large E2E job on this branch to ensure no conflicts on this particular release branch.

@github-actions

github-actions bot commented Mar 3, 2025

E2E (NVIDIA L40S x4) workflow launched on this PR: View run

@github-actions

github-actions bot commented Mar 3, 2025

e2e workflow failed on this PR: View run, please investigate.

@courtneypacheco
Contributor

That failure was due to a HuggingFace download error. Retrying.

@github-actions

github-actions bot commented Mar 3, 2025

E2E (NVIDIA L40S x4) workflow launched on this PR: View run

@github-actions

github-actions bot commented Mar 3, 2025

e2e workflow failed on this PR: View run, please investigate.

@courtneypacheco
Contributor

The same HuggingFace error occurred. It seems like a server-side error, so I'm going to retrigger the job once more.

@github-actions

github-actions bot commented Mar 3, 2025

E2E (NVIDIA L40S x4) workflow launched on this PR: View run

@github-actions

github-actions bot commented Mar 3, 2025

e2e workflow failed on this PR: View run, please investigate.

@ktdreyer
Contributor

ktdreyer commented Mar 3, 2025

We're tracking the HuggingFace download error here: #3215

@mergify mergify bot removed the one-approval (PR has one approval from a maintainer) label Mar 3, 2025
@ktdreyer
Contributor

ktdreyer commented Mar 3, 2025

@courtneypacheco, I've merged this change to release-v0.23 in #3208. As you mentioned in our call, we want to take this backport to release-v0.24 as well to avoid regressions.

I remember you mentioned you added the hold label to this PR, but I'm sorry I don't remember why you did that.

@ktdreyer
Contributor

ktdreyer commented Mar 3, 2025

I discussed this with @courtneypacheco. She's OK with removing the hold label so that we can merge this.

@ktdreyer ktdreyer removed the hold (In-progress PR. Tag should be removed before merge.) label Mar 3, 2025
@mergify mergify bot merged commit 8c31544 into release-v0.24 Mar 3, 2025
28 checks passed
@mergify mergify bot deleted the mergify/bp/release-v0.24/pr-3157 branch March 3, 2025 18:22
@ktdreyer
Contributor

ktdreyer commented Mar 3, 2025

We've shipped this fix in https://github.com/instructlab/instructlab/releases/tag/v0.24.2


Labels

release-branch (Pull Request directly to a release branch), testing (Relates to testing)

3 participants