Vulkan on AMD Ryzen AI APU/iGPU generates worse images than CPU, or just colorful noise #563

Open
lostdisc opened this issue Jan 10, 2025 · 5 comments

@lostdisc

When I run stable-diffusion.cpp with Vulkan on a Ryzen AI 9 HX 370 (Radeon 890M iGPU), the resulting images are very different from what I get when running on CPU with the AVX2 build. Some comparison pics follow.

SDXL

For reference, the below pic is what I get from SDXL on my CPU if I prompt as follows:
sd -m sd_xl_base_1.0.safetensors --vae sdxl.vae.safetensors -H 1024 -W 1024 -p "a lovely cat"
(Note that I needed to use madebyollin's fp16 VAE to get an output that isn't all black.)
[Image: sd-cpp-avx2_vae-fp16_cat_1024x1024_output]

And below is what I get from SDXL on my GPU using Vulkan:
sd -m sd_xl_base_1.0.safetensors --vae sdxl.vae.safetensors --vae-on-cpu -H 1024 -W 1024 -p "a lovely cat"
(Note that running the VAE on the CPU or tiled on the GPU produces essentially the same image below. Running the VAE on the GPU without tiling fails because it requests an excessive amount of memory, as described in stduhpf's comment here.)
[Image: sd-cpp-vulkan_vae-fp16-on-cpu_cat_1024x1024_output]

SD 1.5

With SD 1.5, Vulkan at least produces actual cat pictures, but they are blurry or deformed compared to the CPU output.

For reference, below is what I get from the CPU for the following prompt:
sd -m v1-5-pruned-emaonly.safetensors -p "a lovely cat"
[Image: sd-cpp-avx2_cat_output]

And below is what I get from the GPU with Vulkan:
sd -m v1-5-pruned-emaonly.safetensors -p "a lovely cat"
(I also tried running this with the VAE on the CPU, but it gives the same cat below with no apparent visual difference.)
[Image: sd-cpp-vulkan_cat_output]

Finally, running both CLIP and the VAE on the CPU gives a different, more deformed cat:
sd -m v1-5-pruned-emaonly.safetensors --vae-on-cpu --clip-on-cpu -p "a lovely cat"
[Image: sd-cpp-vulkan_vae+clip-on-cpu_cat_output]

@zhycheng614

zhycheng614 commented Feb 26, 2025

Same issue here with Flux; I also get a noise image:

[Image]

My CPU and GPU:
Ryzen AI 9 HX PRO 375 (Radeon 890M iGPU)

cfg_scale: 1
steps: 4

@stduhpf
Contributor

stduhpf commented Feb 26, 2025

@lostdisc @zhycheng614 Does llama.cpp vulkan work on your systems? If yes, you can try with PR #509 to see if this fixes it. Otherwise, I think it might be a driver issue, and you should report this to AMD.

@zhycheng614

> @lostdisc @zhycheng614 Does llama.cpp vulkan work on your systems? If yes, you can try with PR #509 to see if this fixes it. Otherwise, I think it might be a driver issue, and you should report this to AMD.

  1. Yes, llama.cpp Vulkan works on my system and performs correct inference.
  2. On Apple's M1 chip with Metal, I get the same problem: a noisy image.
  3. On Apple's M3 Pro chip with Metal, it works very well.
  4. On the AMD CPU, it works very well and produces a high-quality image.

@stduhpf
Contributor

stduhpf commented Feb 26, 2025

If PR #509 doesn't fix it, could you try running test-backend-ops (from llama.cpp)? Maybe some specific ops are not working properly.

@lostdisc
Author

lostdisc commented Mar 9, 2025

Just noticed that you guys synced ggml last week, which is what I had been waiting for 😄. Now SDXL on Vulkan produces a proper cat that's very similar to the CPU version (albeit not identical):

[Image]

In the meantime, I had been experimenting with converting models to ONNX. Compared with ONNX, sd-cpp on Vulkan runs slower and hotter, but it is much less RAM-constrained, letting me go beyond 1024x1024. And it sure beats running on the CPU!


3 participants