[Bug]: Corrupted token outputs (??? / <unused>) on ROCm backend with gfx1152 target (Ryzen AI 350) #5579

@soulafein83

Description

When running inference with llama.cpp on a gfx1152 APU using the native ROCm (HIP) 7.14 backend (installed from the nightly tarball at https://therock-nightly-tarball.s3.amazonaws.com), generation runs at full speed, but the model outputs completely corrupted tokens.

Environment

  • OS: CachyOS (Arch Linux-based)
  • Hardware: AMD Ryzen 7 AI 350 with Radeon 860M (gfx1152)
  • ROCm Version: 7.14.0a20260602 Nightly (built via rocm-gfx1152-bin)

Reproduction Steps

  1. Build and install ROCm 7.14 locally using the latest nightly tarball compiled for the gfx1152 architecture.
  2. Build llama.cpp from source with the HIP backend enabled and gfx1152 set as the GPU target.

Actual Behavior

  1. Qwen models: The output is a uniform, infinite stream of ? or ??? characters.

  2. Gemma models: The generator gets stuck in an endless loop outputting <unused> (or similar unused/special tokens).
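
For triage, the two symptoms above can be checked for automatically when comparing backends. The following is a minimal Python sketch, not part of llama.cpp: the helper name `looks_corrupted`, the `<unused...>` token pattern, and the 0.9 threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical helper, not part of llama.cpp. It flags text that looks like the
# corruption described in this report. The pattern and threshold are assumptions.
UNUSED_TOKEN = re.compile(r"<unused\d*>")


def looks_corrupted(text: str, threshold: float = 0.9) -> bool:
    """Return True if `text` is dominated by '?' characters or <unused...> tokens."""
    # Count how many characters are covered by <unused...> markers.
    stripped, _ = UNUSED_TOKEN.subn("", text)
    unused_chars = len(text) - len(stripped)

    # Ignore whitespace so that "??? ???" is treated like "??????".
    visible = [c for c in text if not c.isspace()]
    if not visible:
        return False

    question_marks = visible.count("?")
    return (question_marks + unused_chars) / len(visible) >= threshold
```

A check like this could be run over the text captured from the ROCm build and from a Vulkan or CPU build for the same prompt, so that only the ROCm output is flagged.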

Expected Behavior

The model should generate coherent text output, matching the results obtained via the Vulkan or CPU backends.

Metadata

Labels

status: assessed (indicates an issue has been root caused)
