@ktdreyer
Contributor

@ktdreyer ktdreyer commented Apr 28, 2025

For instructlab, pip install . does not install vllm, but it does install an uncapped torch (2.7.0 currently).

When we install vllm later, we compile a binary flash_attn wheel against torch 2.7.0. vllm 0.8.4 requires torch==2.6.0, so we then downgrade torch, and we end up loading the flash_attn wheel built against 2.7.0 with torch 2.6.0, which it is not binary-compatible with.

The resulting ImportError looks like:

/actions-runner/_work/instructlab/instructlab/venv/lib64/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

To resolve this, use constraints-dev.txt in the first pip install operation. This restricts torch to 2.6.0 from the moment we first install instructlab, so that flash_attn is compiled against the same torch version that vllm requires.
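
As a rough illustration of the mismatch described above, here is a minimal Python sketch that reads a pin out of constraints-style text and compares it with a version string. It is not part of this PR. The helper names `pinned_version` and `matches_pin` are hypothetical, and the sketch assumes the constraints file pins torch with a plain `==` line such as `torch==2.6.0`, which may not match the real layout of constraints-dev.txt.

```python
import re


def pinned_version(constraints_text, package):
    """Return the version pinned with '==' for `package`, or None.

    Assumption: pins are simple 'name==version' lines. Other
    specifiers (>=, ~=, URLs, options) are ignored by this sketch.
    """
    for raw in constraints_text.splitlines():
        # Drop trailing comments; good enough for simple pin lines.
        line = raw.split("#", 1)[0].strip()
        match = re.fullmatch(r"([A-Za-z0-9._-]+)\s*==\s*([^\s;]+).*", line)
        if match and match.group(1).lower() == package.lower():
            return match.group(2)
    return None


def matches_pin(installed, pinned):
    """Compare versions, ignoring a local suffix such as '+cu124'."""
    return installed.split("+", 1)[0] == pinned


# Hypothetical constraints content, for illustration only.
sample = "# example constraints\ntorch==2.6.0\n"
pin = pinned_version(sample, "torch")

# torch 2.7.0 is the uncapped version mentioned in the description.
if not matches_pin("2.7.0", pin):
    print(f"torch 2.7.0 does not match the pin {pin}; "
          "a flash_attn wheel built now would target the wrong torch")
```

A check like this could run just before building flash_attn, with the installed version taken from `importlib.metadata.version("torch")`, so a stray torch upgrade fails fast instead of surfacing later as an undefined-symbol ImportError.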

Signed-off-by: Ken Dreyer <[email protected]>
@mergify mergify bot added the CI/CD Affects CI/CD configuration label Apr 28, 2025
@github-actions

E2E (NVIDIA L40S x4) workflow launched on this PR: View run

@mergify mergify bot added the ci-failure PR has at least one CI failure label Apr 28, 2025
@mergify mergify bot added the one-approval PR has one approval from a maintainer label Apr 28, 2025
@github-actions

e2e workflow failed on this PR: View run, please investigate.

@ktdreyer
Contributor Author

This fixes the flash-attn problem. The e2e tests get further, but they still fail with NCCL timeouts. I've filed #3321 to track that separately.

@ktdreyer ktdreyer added this to the 0.26.0 milestone Apr 28, 2025
@mergify mergify bot removed the one-approval PR has one approval from a maintainer label Apr 28, 2025
@booxter booxter removed the request for review from courtneypacheco April 28, 2025 21:03
@booxter
Contributor

booxter commented Apr 28, 2025

Force merging since Ken confirmed this improves the situation even if it doesn't make CI green yet.

@booxter booxter merged commit f243048 into main Apr 28, 2025
24 of 27 checks passed
@courtneypacheco
Contributor

@mergify backport release-v0.26

@mergify
Contributor

mergify bot commented Apr 30, 2025

backport release-v0.26

✅ Backports have been created

booxter added a commit that referenced this pull request Apr 30, 2025
…-3320

use `constraints-dev.txt` in e2e tests (backport #3320)
bbrowning added a commit to bbrowning/instructlab-sdg that referenced this pull request May 5, 2025
This is a port of instructlab/instructlab#3320
over to the SDG repository. While doing so, I noticed we also were not
using "-DGGML_CUDA=ON" so updated that as well, since it's the same
pip install line in the file.

Signed-off-by: Ben Browning <[email protected]>
mergify bot pushed a commit to instructlab/sdg that referenced this pull request May 15, 2025
This is a port of instructlab/instructlab#3320
over to the SDG repository. While doing so, I noticed we also were not
using "-DGGML_CUDA=ON" so updated that as well, since it's the same
pip install line in the file.

Signed-off-by: Ben Browning <[email protected]>
(cherry picked from commit 225612c)