Description
When running inference with llama.cpp using the native ROCm (HIP) 7.14 backend (built from the nightly tarball at https://therock-nightly-tarball.s3.amazonaws.com) on a gfx1152 APU, generation runs at full speed, but the model outputs completely corrupted tokens.
Environment
- OS: CachyOS (Arch Linux-based)
- Hardware: AMD Ryzen AI 7 350 with Radeon 860M (gfx1152)
- ROCm Version: 7.14.0a20260602 Nightly (built via rocm-gfx1152-bin)
Reproduction Steps
- Build and install ROCm 7.14 locally using the latest nightly tarball compiled for the gfx1152 architecture.
- Build llama.cpp from source with the HIP backend and the gfx1152 target flag.
- Run inference with a Qwen or Gemma model.
Actual Behavior
- Qwen models: The output is a uniform, infinite stream of ? or ??? characters.
- Gemma models: The generator gets stuck in an endless loop outputting (or similar unused/special tokens).
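When comparing HIP output against the Vulkan or CPU backends across several runs, a small check can flag the Qwen symptom automatically. The helper below is hypothetical (`looks_corrupted` is not part of llama.cpp) and is only a sketch: it detects the all-`?` pattern and does not catch the Gemma special-token loop.

```python
def looks_corrupted(text: str) -> bool:
    """Hypothetical heuristic: True if the generated text is non-empty and
    consists only of '?' characters and whitespace (the Qwen symptom)."""
    # Remove every whitespace character so "??? ??? ?" becomes "???????".
    stripped = "".join(text.split())
    # Empty output is not flagged; otherwise require that '?' is the only character.
    return bool(stripped) and set(stripped) == {"?"}


if __name__ == "__main__":
    print(looks_corrupted("??? ??? ?"))     # True
    print(looks_corrupted("Hello there?"))  # False
```

Only exact all-`?` output is flagged, so legitimate text that merely contains question marks is not reported as corrupted.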
Expected Behavior
The model should generate coherent text output, matching the results obtained via the Vulkan or CPU backends.