-
Notifications
You must be signed in to change notification settings - Fork 759
Description
What happened?
Have a reproducer w/ Llama3.1_8b_fp16
, in which we receive the following error for a pretty long prompt (18,432 tokens) on mi300
:
EXEC @prefill_bs4
:0:rocdevice.cpp :2991: 218582848177 us: Callback: Queue 0x7f0810400000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29
Aborted (core dumped)
I tried to run a tracy_capture
, but received the following error when attempting to:
Connecting to 127.0.0.1:8090...*** buffer overflow detected ***: terminated
Aborted (core dumped)
I tried w/ IREE at HEAD, along with using the 3.7.0rc20250908 version currently pinned in shark-ai
. I also tried pulling in the following fix: Don't inline immutable globals with non-util dialect attrs to see if it was a duplicate of #21946, but all threw the same error.
Had watch -n0 rocm-smi
running while invoking iree-run-module
, but did not see anything suspicious in terms of VRAM usage.
Steps to reproduce your issue
- Download mlir
az storage download \
--account-name sharkblobs \
--container-name stephen \
--name iree_0x29_error/mlir/llama_8b_fp16.mlir \
--file llama_8b_fp16.mlir \
--account-key <account_key>
- Download weights (if needed)
az storage download \
--account-name sharkblobs \
--container-name stephen \
--name iree_0x29_error/weights/llama.irpa \
--file llama.irpa \
--account-key <account_key>
- Download inputs
az storage download \
--account-name sharkblobs \
--container-name stephen \
--name iree_0x29_error/inputs/tokens.npy \
--file tokens.npy \
--account-key <account_key>
az storage download \
--account-name sharkblobs \
--container-name stephen \
--name iree_0x29_error/inputs/seq_lens.npy \
--file seq_lens.npy \
--account-key <account_key>
az storage download \
--account-name sharkblobs \
--container-name stephen \
--name iree_0x29_error/inputs/seq_block_ids.npy \
--file seq_block_ids.npy \
--account-key <account_key>
az storage download \
--account-name sharkblobs \
--container-name stephen \
--name iree_0x29_error/inputs/page_table.npy \
--file page_table.npy \
--account-key <account_key>
- Compile model:
iree-compile llama_8b_fp16.mlir \
-o llama_8b_fp16.vmfb \
--iree-hal-target-device=hip \
--iree-hip-target=gfx942 \
--iree-opt-level=O3 \
--iree-hal-indirect-command-buffers=true \
--iree-stream-resource-memory-model=discrete \
--iree-hal-memoization=true
- Invoke
iree-run-module
:
iree-run-module \
--module=llama_8b_fp16.vmfb \
--parameters=model=llama.irpa \
--device=hip://0 \
--function=prefill_bs4 \
--input=@./tokens.npy \
--input=@./seq_lens.npy \
--input=@./seq_block_ids.npy \
--input=@./page_table.npy
- See error:
EXEC @prefill_bs4
:0:rocdevice.cpp :2991: 218977493659 us: Callback: Queue 0x7feb10d00000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29
Aborted (core dumped)
What component(s) does this issue relate to?
No response
Version information
Additional context
This seems to be the root cause of the following identified issue in shark-ai:
[Regression] Regression in Llama Prefill causing 0x29 error
Again, tried to collect a trace, but received the following error:
Connecting to 127.0.0.1:8090...*** buffer overflow detected ***: terminated
Aborted (core dumped)