Fix out kwarg shape check with ngroups swapped #4

Merged
WoosukKwon merged 1 commit into vllm-project:main from Yard1:fix_out_with_ngroups_swapped on May 31, 2024

Conversation

Yard1 commented May 31, 2024

Missed this case

WoosukKwon merged commit e5da6e4 into vllm-project:main on May 31, 2024
Yard1 deleted the fix_out_with_ngroups_swapped branch on May 31, 2024 17:20
JongYeop-IDSLab added a commit to IDSLab-SKKU/IDS-flash-attention that referenced this pull request Jun 1, 2026
… race UNRESOLVED)

Attempted fixes vllm-project#2/vllm-project#3 for the long-context non-deterministic race in the d=128
QK CoFDA emulation: IntraWGOverlap=!UseQKEmu and UsePersistentScheduler=!UseQKEmu
(route the synchronous emu through the simpler non-overlapped / single-tile path).

BOTH INEFFECTIVE — the race persists with an identical pattern, so the cause is
NOT the overlap pipeline or the persistent scheduler. Working theory: a per-gemm
smem read race inside gemm_qk_cofda_emu vs the async TMA producer. Stopped per
systematic-debugging Iron Law (no fix vllm-project#4 without re-think).

Race scales with total emu QK-gemm calls (M_blocks x KV_blocks); bit-exact for
short context. Full handoff + repro:
vllm-mma/cair/experiments/docs/qk-emu-longcontext-race-debug.md

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>