
0x29 Error on Llama_8b_fp16 #22050

@stbaione

Description

What happened?

I have a reproducer with Llama3.1_8b_fp16 that hits the following error for a fairly long prompt (18,432 tokens) on MI300:

EXEC @prefill_bs4
:0:rocdevice.cpp            :2991: 218582848177 us:  Callback: Queue 0x7f0810400000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29
Aborted (core dumped)

I tried to run a Tracy capture, but received the following error when the capture tool attempted to connect:

Connecting to 127.0.0.1:8090...*** buffer overflow detected ***: terminated
Aborted (core dumped)
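For reference, the capture attempt looked roughly like this (a sketch, not the exact command line; the output path and flags are assumptions, the port matches the log above, and a Tracy-instrumented runtime build of iree-run-module is assumed):

# assumed invocation of the Tracy capture tool; -p matches the port in the log above
iree-tracy-capture -o llama_8b_fp16.tracy -p 8090 &
# same iree-run-module invocation as in the repro steps below, run against a
# tracing-enabled runtime; TRACY_NO_EXIT keeps the client alive until the capture connects
TRACY_NO_EXIT=1 iree-run-module \
    --module=llama_8b_fp16.vmfb \
    --parameters=model=llama.irpa \
    --device=hip://0 \
    --function=prefill_bs4 \
    --input=@./tokens.npy \
    --input=@./seq_lens.npy \
    --input=@./seq_block_ids.npy \
    --input=@./page_table.npy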

I tried IREE at HEAD, as well as the 3.7.0rc20250908 release currently pinned in shark-ai. I also tried pulling in the fix "Don't inline immutable globals with non-util dialect attrs" to see if this was a duplicate of #21946, but all of them threw the same error.
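If it helps anyone reproduce against the pinned release, the toolchain can be installed along these lines (a sketch; the package names and index URL are assumptions based on IREE's usual release channels):

# assumed install of the pinned release candidate; package names and index are assumptions
python -m pip install \
    iree-base-compiler==3.7.0rc20250908 \
    iree-base-runtime==3.7.0rc20250908 \
    --find-links https://iree.dev/pip-release-links.html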

I had watch -n0 rocm-smi running while invoking iree-run-module, but did not see anything suspicious in terms of VRAM usage.
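For anyone repeating that check, a slightly more targeted poll of VRAM usage is (a sketch; assumes the installed rocm-smi supports --showmeminfo):

# poll VRAM usage every half second while iree-run-module is executing
watch -n 0.5 rocm-smi --showmeminfo vram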

Steps to reproduce your issue

  1. Download the MLIR:
az storage blob download \
    --account-name sharkblobs \
    --container-name stephen \
    --name iree_0x29_error/mlir/llama_8b_fp16.mlir \
    --file llama_8b_fp16.mlir \
    --account-key <account_key>
  2. Download the weights (if needed):
az storage blob download \
    --account-name sharkblobs \
    --container-name stephen \
    --name iree_0x29_error/weights/llama.irpa \
    --file llama.irpa \
    --account-key <account_key>
  3. Download the inputs (an optional shape check is sketched after these steps):
az storage blob download \
    --account-name sharkblobs \
    --container-name stephen \
    --name iree_0x29_error/inputs/tokens.npy \
    --file tokens.npy \
    --account-key <account_key>
az storage blob download \
    --account-name sharkblobs \
    --container-name stephen \
    --name iree_0x29_error/inputs/seq_lens.npy \
    --file seq_lens.npy \
    --account-key <account_key>
az storage blob download \
    --account-name sharkblobs \
    --container-name stephen \
    --name iree_0x29_error/inputs/seq_block_ids.npy \
    --file seq_block_ids.npy \
    --account-key <account_key>
az storage blob download \
    --account-name sharkblobs \
    --container-name stephen \
    --name iree_0x29_error/inputs/page_table.npy \
    --file page_table.npy \
    --account-key <account_key>
  4. Compile the model:
iree-compile llama_8b_fp16.mlir \
    -o llama_8b_fp16.vmfb \
    --iree-hal-target-device=hip \
    --iree-hip-target=gfx942 \
    --iree-opt-level=O3  \
    --iree-hal-indirect-command-buffers=true  \
    --iree-stream-resource-memory-model=discrete  \
    --iree-hal-memoization=true
  5. Invoke iree-run-module:
iree-run-module \
    --module=llama_8b_fp16.vmfb \
    --parameters=model=llama.irpa \
    --device=hip://0 \
    --function=prefill_bs4 \
    --input=@./tokens.npy \
    --input=@./seq_lens.npy \
    --input=@./seq_block_ids.npy \
    --input=@./page_table.npy
  6. See the error:
EXEC @prefill_bs4
:0:rocdevice.cpp            :2991: 218977493659 us:  Callback: Queue 0x7feb10d00000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29
Aborted (core dumped)
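As the optional shape check mentioned in step 3 (not part of the original repro; assumes numpy is installed in the Python environment), the downloaded inputs can be inspected before invoking the module:

# print the shape and dtype of each downloaded input
python -c "
import numpy as np
for name in ['tokens', 'seq_lens', 'seq_block_ids', 'page_table']:
    arr = np.load(name + '.npy')
    print(name, arr.shape, arr.dtype)
"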

What component(s) does this issue relate to?

No response

Version information

d588da3

Additional context

This seems to be the root cause of the following identified issue in shark-ai:

[Regression] Regression in Llama Prefill causing 0x29 error

Again, I tried to collect a trace, but received the following error:

Connecting to 127.0.0.1:8090...*** buffer overflow detected ***: terminated
Aborted (core dumped)

Labels: bug 🐞 (Something isn't working)