Conversation

@mergify mergify bot commented Apr 30, 2025

For instructlab, pip install . does not install vllm, but it does install an uncapped torch (2.7.0 currently).

When we install vllm later, we compile a binary flash_attn wheel against torch 2.7.0. vllm 0.8.4 requires torch==2.6.0, so pip downgrades torch, and we end up running torch 2.6.0 with a flash_attn wheel that was built against torch 2.7.0 and is binary-incompatible with it.

The resulting ImportError looks like:

/actions-runner/_work/instructlab/instructlab/venv/lib64/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

To resolve this, use constraints-dev.txt in the first pip install operation. This restricts torch to 2.6.0 immediately when we first install instructlab, so that we will compile flash_attn against that torch version.
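
As a rough illustration of the fix, the sketch below builds the first install command with a constraints file and reads a pin out of constraints-style text. It is not the code from the workflows: the helper names `pip_install_cmd` and `pinned_version` and the contents of `example_constraints` are assumptions made for this example. The only pip feature it relies on is the `-c` (`--constraint`) option.

```python
import sys
from typing import List, Optional


def pip_install_cmd(target: str, constraints: Optional[str] = None) -> List[str]:
    """Build a pip install command, optionally applying a constraints file."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if constraints is not None:
        # -c / --constraint limits which versions pip may pick, without
        # adding any packages to the install set.
        cmd += ["-c", constraints]
    cmd.append(target)
    return cmd


def pinned_version(constraints_text: str, package: str) -> Optional[str]:
    """Return the version from a 'package==X.Y.Z' line, or None if absent."""
    for raw in constraints_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        name, sep, version = line.partition("==")
        if sep and name.strip().lower() == package.lower():
            return version.strip()
    return None


# Hypothetical file contents; the real constraints-dev.txt is not shown here.
example_constraints = "torch==2.6.0  # required by vllm 0.8.4\n"

# The first install of instructlab, now constrained from the start.
first_install = pip_install_cmd(".", constraints="constraints-dev.txt")
print(" ".join(first_install[1:]))
print(pinned_version(example_constraints, "torch"))
```

Because the constraint applies from the first install, torch is already at the pinned version when flash_attn is compiled later, so there is no later downgrade to invalidate the wheel.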


This is an automatic backport of pull request #3320 done by Mergify.

For instructlab, "pip install ." does not install vllm, but it does
install an uncapped torch (2.7.0 currently).

When we install vllm later, we compile a binary flash_attn wheel against
torch 2.7.0. vllm 0.8.4 requires torch==2.6.0, so we downgrade torch,
and then we use that with the incompatible flash_attn binary wheel.

To resolve this, use constraints-dev.txt in the first pip install
operation. This restricts torch to 2.6.0 immediately when we first
install instructlab, so that we will compile flash_attn against that
torch version.

Signed-off-by: Ken Dreyer <[email protected]>
(cherry picked from commit 8a11c90)

# Conflicts:
#	.github/workflows/e2e-nvidia-l40s-x4-py312.yml
@mergify mergify bot added the conflicts (This Pull Request has merge conflicts) label Apr 30, 2025

mergify bot commented Apr 30, 2025

Cherry-pick of 8a11c90 has failed:

On branch mergify/bp/release-v0.26/pr-3320
Your branch is up to date with 'origin/release-v0.26'.

You are currently cherry-picking commit 8a11c90.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   .github/workflows/e2e-aws-custom.yml
	modified:   .github/workflows/e2e-nvidia-l4-x1.yml
	modified:   .github/workflows/e2e-nvidia-l40s-x4-llama.yml
	modified:   .github/workflows/e2e-nvidia-l40s-x4.yml
	modified:   .github/workflows/e2e-nvidia-l40s-x8.yml
	modified:   .github/workflows/e2e-nvidia-t4-x1.yml

Unmerged paths:
  (use "git add/rm <file>..." as appropriate to mark resolution)
	deleted by us:   .github/workflows/e2e-nvidia-l40s-x4-py312.yml

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@mergify mergify bot added the CI/CD (Affects CI/CD configuration) and release-branch (Pull Request directly to a release branch) labels Apr 30, 2025
@mergify mergify bot added the one-approval (PR has one approval from a maintainer) and ci-failure (PR has at least one CI failure) labels Apr 30, 2025
@booxter booxter left a comment

The CI failure is the very problem this patch fixes. Because the run does not use the workflows from this PR, we'll have to merge it and ignore the failure.

@booxter booxter merged commit 0f79cfe into release-v0.26 Apr 30, 2025
22 of 24 checks passed
@mergify mergify bot removed the one-approval (PR has one approval from a maintainer) label Apr 30, 2025
Labels

CI/CD (Affects CI/CD configuration), ci-failure (PR has at least one CI failure), conflicts (This Pull Request has merge conflicts), release-branch (Pull Request directly to a release branch)

3 participants