Conversation

@reidliu41
Contributor

@reidliu41 reidliu41 commented Jan 16, 2025

Issue resolved by this Pull Request:
Resolves #2286

Checklist:

  • Commit Message Formatting: Commit titles and messages follow the
    Conventional Commits guidelines.
  • Changelog updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Functional tests have been added, if necessary.
  • E2E Workflow tests have been added, if necessary.

@mergify mergify bot added the ci-failure PR has at least one CI failure label Jan 16, 2025
@reidliu41 reidliu41 force-pushed the add-check-for-tensor-parallel-size branch from 154563c to 5be764c Compare January 16, 2025 12:32
@mergify mergify bot added testing Relates to testing and removed ci-failure PR has at least one CI failure labels Jan 16, 2025
@reidliu41 reidliu41 force-pushed the add-check-for-tensor-parallel-size branch from 5be764c to 1f38055 Compare January 16, 2025 12:34
@reidliu41 reidliu41 force-pushed the add-check-for-tensor-parallel-size branch from 1f38055 to 44944b8 Compare January 17, 2025 02:11
@reidliu41 reidliu41 force-pushed the add-check-for-tensor-parallel-size branch from 44944b8 to fde2073 Compare January 17, 2025 10:15
@mergify mergify bot added the ci-failure PR has at least one CI failure label Jan 17, 2025
@reidliu41 reidliu41 force-pushed the add-check-for-tensor-parallel-size branch from fde2073 to c3a683d Compare January 17, 2025 13:02
@mergify mergify bot removed the ci-failure PR has at least one CI failure label Jan 17, 2025
@reidliu41 reidliu41 force-pushed the add-check-for-tensor-parallel-size branch from c3a683d to d163f47 Compare January 17, 2025 15:26
@mergify mergify bot added the ci-failure PR has at least one CI failure label Jan 17, 2025
@reidliu41 reidliu41 force-pushed the add-check-for-tensor-parallel-size branch from d163f47 to efd02e2 Compare January 17, 2025 16:17
@mergify mergify bot removed the ci-failure PR has at least one CI failure label Jan 17, 2025
@reidliu41 reidliu41 requested a review from bbrowning January 17, 2025 19:34
Contributor

@bbrowning bbrowning left a comment

I'm not familiar enough with this section of the code to comment much on the overall flow or on how valid values are checked here. I did spot what looks like one logic error in check_gpu_count, so I'm requesting changes: I believe it will flag some valid GPU values as invalid if the user specifies all of the GPUs in their machine.
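
For illustration only, here is a minimal sketch of the kind of boundary condition this concern is about. The PR's actual code is not reproduced in this conversation, so the signature, parameter names, and error messages below are assumptions. The point is that a request equal to the number of available GPUs should pass the check.

```python
def check_gpu_count(requested: int, available: int) -> None:
    """Hypothetical sketch of a GPU-count sanity check (not the PR's code).

    Raises ValueError when the requested GPU count cannot be satisfied.
    """
    if requested < 1:
        raise ValueError(f"GPU count must be at least 1, got {requested}")
    # Use a strict '>' here. Writing '>=' instead would reject the valid case
    # where the user asks for every GPU in the machine (requested == available).
    if requested > available:
        raise ValueError(
            f"Requested {requested} GPUs but only {available} are available"
        )


# Asking for all 4 GPUs on a 4-GPU machine is valid and must not raise.
check_gpu_count(4, 4)
```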

@reidliu41 reidliu41 force-pushed the add-check-for-tensor-parallel-size branch from efd02e2 to 6794510 Compare January 17, 2025 21:27
@reidliu41
Contributor Author

hi @danmcp, any suggestions for the update?

@danmcp
Member

danmcp commented Jan 22, 2025

> hi @danmcp, any suggestions for the update?

No suggestions for changes, but I think it's best to land this change after the upcoming release so that it gets some usage and there is time for any needed adjustments.

@nathan-weinberg nathan-weinberg added this to the 0.24.0 milestone Jan 22, 2025
@nathan-weinberg
Member

Added to the 0.24.0 milestone

@reidliu41
Contributor Author

Hi team, can I get approval for this? Thanks.

@reidliu41 reidliu41 force-pushed the add-check-for-tensor-parallel-size branch from ab43517 to 9154e7e Compare February 3, 2025 14:43
Contributor

@booxter booxter left a comment

Logic is correct and the patch is great, but I think we need to keep click logic elsewhere. @alinaryan please confirm that's the plan.

@reidliu41 reidliu41 force-pushed the add-check-for-tensor-parallel-size branch from 9154e7e to a901104 Compare February 5, 2025 05:36
Member

@RobotSail RobotSail left a comment

I think this PR is in a pretty good spot; we should just adjust how we present the logic to reduce cognitive load when reading the code.

@reidliu41 reidliu41 force-pushed the add-check-for-tensor-parallel-size branch 2 times, most recently from f21144d to 3a36793 Compare February 7, 2025 09:50
Contributor

@booxter booxter left a comment

I think this is progress (we removed a tuple return value), but there's a way to make it clearer.

@reidliu41 reidliu41 force-pushed the add-check-for-tensor-parallel-size branch from 2cba376 to c4aacd1 Compare February 7, 2025 15:01
@mergify mergify bot added the ci-failure PR has at least one CI failure label Feb 7, 2025
@reidliu41 reidliu41 force-pushed the add-check-for-tensor-parallel-size branch from c4aacd1 to 4dd5d85 Compare February 8, 2025 05:31
@mergify mergify bot removed the ci-failure PR has at least one CI failure label Feb 8, 2025
@reidliu41 reidliu41 force-pushed the add-check-for-tensor-parallel-size branch from 4dd5d85 to 1408cdd Compare February 8, 2025 05:33
@mergify mergify bot added the one-approval PR has one approval from a maintainer label Feb 9, 2025
@reidliu41 reidliu41 requested review from a team and RobotSail and removed request for RobotSail and danmcp February 9, 2025 01:45
@mergify mergify bot merged commit c8c85db into instructlab:main Feb 16, 2025
28 checks passed
@mergify mergify bot removed the one-approval PR has one approval from a maintainer label Feb 16, 2025

Labels

testing Relates to testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add a sanity check of vllm tensor-parallel-size config before attempting to start vllm

6 participants