@courtneypacheco
Contributor

@courtneypacheco courtneypacheco commented Apr 10, 2025

Checklist:

  • Commit Message Formatting: Commit titles and messages follow the Conventional Commits
    guidelines.
  • Changelog updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Functional tests have been added, if necessary.
  • E2E Workflow tests have been added, if necessary.

Changes

  • In requirements.txt: replace vllm==0.7.3 with vllm>=0.8.0, and use a new constraints-dev.txt file to cap vllm's upper bound in CI only
  • In requirements.txt: change torch>=2.3.0,<2.6.0 to torch>=2.6.0,<2.7.0, and pin torch==2.6.0 in the new constraints-dev.txt file for CI only

Note: vllm>=0.8.0,<0.9.0 is currently only compatible with torch>=2.6.0, so we have bumped the torch lower bound.
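You can pass pip a constraints file with `-c`, for example `pip install -r requirements.txt -c constraints-dev.txt`. A package version is then eligible only if it satisfies both files. The sketch below models that intersection for the torch and vllm pins in this PR. It assumes plain numeric versions and ignores environment markers, so it is not full PEP 440 parsing and not pip's resolver. The helpers `satisfies` and `allowed` are hypothetical illustrations; the pinned values come from this PR.

```python
# Minimal sketch: a version is eligible only if it satisfies both the loose
# requirement (requirements.txt) and, in CI, the cap (constraints-dev.txt).
# Handles plain numeric versions such as "2.6.0" only and ignores markers;
# pip and the packaging library implement the full PEP 440 rules.
import operator
from typing import Optional

# Two-character operators are listed first so ">=" is not read as ">".
OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    ("!=", operator.ne),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text: str, width: int = 4) -> tuple:
    """Turn "2.6" into (2, 6, 0, 0) so versions compare numerically."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version: str, specifiers: str) -> bool:
    """True if `version` meets every comma-separated specifier."""
    current = parse_version(version)
    for spec in specifiers.split(","):
        spec = spec.strip()
        for symbol, compare in OPERATORS:
            if spec.startswith(symbol):
                if not compare(current, parse_version(spec[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported specifier: {spec!r}")
    return True


# Pins from this PR: loose requirements plus the CI-only caps.
REQUIREMENTS = {"torch": ">=2.6.0,<2.7.0", "vllm": ">=0.8.0"}
CI_CONSTRAINTS = {"torch": "==2.6.0", "vllm": "<0.9.0"}


def allowed(package: str, version: str, constraints: Optional[dict] = None) -> bool:
    """Check requirements.txt, then any constraints that mention the package."""
    if not satisfies(version, REQUIREMENTS[package]):
        return False
    if constraints and package in constraints:
        return satisfies(version, constraints[package])
    return True
```

Outside CI, `allowed("vllm", "0.9.0")` returns True because requirements.txt deliberately leaves vllm uncapped. With `CI_CONSTRAINTS` it returns False, while `allowed("vllm", "0.8.3", CI_CONSTRAINTS)` stays True. Keeping the cap only in the CI constraints file means installs outside CI are not bound by it.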

@mergify mergify bot added the CI/CD, testing, and dependencies labels Apr 10, 2025
@mergify mergify bot added the ci-failure label Apr 10, 2025
@mergify mergify bot removed the ci-failure label Apr 10, 2025
@courtneypacheco courtneypacheco changed the title from "chore: Bump torch + vllm versions" to "feat: Bump torch + vllm versions" Apr 10, 2025
@courtneypacheco courtneypacheco changed the title from "feat: Bump torch + vllm versions" to "feat: Bump vllm to 0.8.3" Apr 10, 2025
@mergify mergify bot added the ci-failure label Apr 10, 2025
@mergify mergify bot removed the ci-failure label Apr 11, 2025
@courtneypacheco courtneypacheco changed the title from "feat: Bump vllm to 0.8.3" to "feat(deps): Bump torch upper bound (<2.7.0) + set allowable range for vllm versions (>=0.8.0,<0.9.0)" Apr 11, 2025
@mergify mergify bot added the ci-failure label Apr 11, 2025
@mergify mergify bot removed the ci-failure label Apr 11, 2025
@mergify mergify bot added the ci-failure label Apr 11, 2025
@mergify mergify bot added and removed the ci-failure label (three times) Apr 11, 2025
@mergify mergify bot removed the ci-failure label Apr 11, 2025
@mergify mergify bot added the ci-failure label Apr 14, 2025
@mergify mergify bot removed the one-approval label Apr 14, 2025
@github-actions

E2E (NVIDIA L40S x4) workflow launched on this PR: View run

@courtneypacheco
Contributor Author

@RobotSail when the large E2E job passes, will we be okay to merge this change and release it? Or, would you like any additional testing performed to ensure that training works as intended?

@github-actions

e2e workflow succeeded on this PR: View run, congrats!

@mergify mergify bot removed the ci-failure label Apr 14, 2025
@courtneypacheco courtneypacheco added the hold label Apr 14, 2025
@courtneypacheco
Contributor Author

We need to validate this change in an E2E test with use_dolomite=True set. We also need to validate it against the Llama 70B model, which requires a new config.

@courtneypacheco
Contributor Author

Please see: #3288

@ktdreyer
Contributor

Should we merge #3288 instead of this?

@github-actions

E2E (NVIDIA L40S x4) LLAMA workflow launched on this PR: View run

@github-actions

e2e workflow failed on this PR: View run, please investigate.

@courtneypacheco courtneypacheco force-pushed the add-constraints branch 2 times, most recently from b84ed08 to a5db9e3 April 16, 2025 17:54
@mergify mergify bot added the ci-failure label Apr 16, 2025
@mergify mergify bot removed the ci-failure label Apr 16, 2025
@github-actions

E2E (NVIDIA L40S x4) LLAMA workflow launched on this PR: View run

@github-actions

e2e workflow failed on this PR: View run, please investigate.

@mergify mergify bot added the ci-failure label Apr 16, 2025
-# vLLM only supports Linux platform (including WSL)
-vllm==0.7.3 ; sys_platform == 'linux' and platform_machine == 'x86_64'
+# vLLM only supports Linux platform (including WSL). Do not cap this dependency here. Cap in constraints-dev.txt
+vllm>=0.8.0 ; sys_platform == 'linux' and platform_machine == 'x86_64'

We're building for aarch64 now. Why is this limited to x86_64?

Contributor Author

Good catch. Let me update this to remove the x86_64 limitation.
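The two requirements.txt lines quoted in the review share the marker `sys_platform == 'linux' and platform_machine == 'x86_64'`. On Linux ARM64 hosts, Python typically reports the machine as `aarch64`, so that clause skips vllm there. The functions below are a hypothetical mirror of the existing marker and of an assumed Linux-only replacement, `sys_platform == 'linux'`. That replacement is one way to drop the x86_64 limitation; the exact marker the PR ended up with is an assumption. This is not pip's PEP 508 marker evaluator.

```python
# Hypothetical mirror of the environment marker quoted in the review.
# pip parses real markers with a full PEP 508 evaluator; these functions only
# reproduce two specific boolean expressions for illustration.
import platform
import sys


def x86_64_only_marker(sys_platform: str, machine: str) -> bool:
    """sys_platform == 'linux' and platform_machine == 'x86_64'"""
    return sys_platform == "linux" and machine == "x86_64"


def linux_only_marker(sys_platform: str, machine: str) -> bool:
    """Assumed replacement: sys_platform == 'linux' (machine is ignored)."""
    return sys_platform == "linux"


if __name__ == "__main__":
    # Evaluate both markers against the interpreter running this script.
    env = (sys.platform, platform.machine())
    print(env, x86_64_only_marker(*env), linux_only_marker(*env))
```

For example, `x86_64_only_marker("linux", "aarch64")` is False, so vllm would not be installed on an aarch64 build. `linux_only_marker("linux", "aarch64")` is True.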

Add `constraints.txt` to restrict the CI to using torch==2.6.0 and vllm<0.9.0. Also bump the minimum `torch` version because vLLM 0.8.z is only compatible with `torch==2.6.0` (so far).

Also update the large Llama job to use fallback logic to try other availability zones.

Signed-off-by: Courtney Pacheco <[email protected]>
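The commit message mentions fallback logic that tries other availability zones for the large Llama job. The workflow itself is not shown here, so what follows is only a hedged Python sketch of the general pattern: try zones in order and stop at the first one that succeeds. `launch_in_zone`, `CapacityError`, `simulated_launch`, and the zone names are hypothetical placeholders, not taken from the CI configuration.

```python
# Hypothetical sketch of availability-zone fallback: try each zone in order
# and stop at the first one that can launch the job.
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


class CapacityError(Exception):
    """Placeholder error: the zone could not provide the requested instance."""


def launch_with_fallback(
    zones: Sequence[str], launch_in_zone: Callable[[str], T]
) -> Tuple[str, T]:
    """Return (zone, result) for the first zone whose launch succeeds."""
    errors = {}
    for zone in zones:
        try:
            return zone, launch_in_zone(zone)
        except CapacityError as exc:
            # Remember why this zone failed, then move on to the next one.
            errors[zone] = str(exc)
    raise RuntimeError(f"no zone could launch the job: {errors}")


def simulated_launch(zone: str, zones_with_capacity: Sequence[str] = ("zone-c",)) -> str:
    """Stand-in launcher that only succeeds in zones listed as having capacity."""
    if zone not in zones_with_capacity:
        raise CapacityError(f"insufficient capacity in {zone}")
    return f"runner-{zone}"


if __name__ == "__main__":
    print(launch_with_fallback(["zone-a", "zone-b", "zone-c"], simulated_launch))
    # -> ('zone-c', 'runner-zone-c')
```

The sketch catches only the capacity error. Other failures, such as a bad configuration, therefore fail fast instead of being retried in every zone.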
@mergify mergify bot removed the ci-failure label Apr 17, 2025
@courtneypacheco
Contributor Author

This PR did not auto-merge because the large Llama E2E job failed (the job itself has a bug), and Mergify will not auto-merge a PR when it detects a CI failure. Since all required CI checks pass, I'm going to merge this PR. The large Llama E2E job will be fixed in a future PR.
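The merge decision described in that comment involves two different gates. One is an automation rule that declines to merge when any CI job fails; the other is the set of checks the repository marks as required. Below is a simplified, hypothetical model of that distinction. The check names and the any-failure rule are illustrative assumptions, not Mergify's actual configuration.

```python
# Simplified, hypothetical model of the merge decision described above.
# Check names are made up; "success" stands for a passing check run.


def failed_checks(results: dict) -> list:
    """Return the sorted names of checks that did not succeed."""
    return sorted(name for name, status in results.items() if status != "success")


def auto_merge_allowed(results: dict) -> bool:
    """Any-failure rule: automation merges only if no check failed."""
    return not failed_checks(results)


def required_checks_pass(results: dict, required: set) -> bool:
    """Required-checks rule: only the listed checks must succeed."""
    return all(results.get(name) == "success" for name in required)


if __name__ == "__main__":
    results = {"unit-tests": "success", "lint": "success", "e2e-large-llama": "failure"}
    required = {"unit-tests", "lint"}
    print(auto_merge_allowed(results))  # False: one non-required job failed
    print(required_checks_pass(results, required))  # True: required checks passed
    print(failed_checks(results))  # ['e2e-large-llama']
```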

@courtneypacheco courtneypacheco merged commit 480c8f0 into main Apr 17, 2025
29 checks passed
@ktdreyer ktdreyer deleted the add-constraints branch April 17, 2025 18:45

Labels

CI/CD (Affects CI/CD configuration)
dependencies (Relates to dependencies)
hold (In-progress PR. Tag should be removed before merge.)
testing (Relates to testing)

8 participants