[SYCL] Fix Large Image Generation With SYCL Backend #380
Hi, this PR improves SYCL backend compatibility on Intel GPUs. We fixed and updated some SYCL kernels so that stable-diffusion models can generate larger images (for example, 1024 x 1024).

Changes
Update the ggml commit to the latest (21d3a) to sync the SYCL kernels and, at the same time, avoid conflicts with the Vulkan backend PR.
Disable fast-math on the CPU when using the SYCL backend. Otherwise, it affects some host calculations (for example, start_merge_step).

Results
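A note on the fast-math change listed under Changes, before the measurements. The sketch below only illustrates why a host-side step computation can be sensitive to fast-math; it is not the project's code. It assumes that a value such as start_merge_step is obtained by truncating a float product to an int, and the helper names `step_index_truncated` and `step_index_integer` are hypothetical. Under relaxed floating-point rules the compiler may, for example, replace the division with a multiplication by a reciprocal, so the product can land just below a whole number and truncate to one step fewer.

```cpp
#include <cassert>

// Hypothetical helper (assumption, not taken from the PR): converts a
// percentage and a step count to a step index by truncating a float product.
// If the product ends up a hair below an integer (e.g. 3.9999998f instead of
// 4.0f), the cast drops a whole step.
int step_index_truncated(float percent, int total_steps) {
    return static_cast<int>(percent / 100.0f * static_cast<float>(total_steps));
}

// Hypothetical alternative for integer percentages: pure integer arithmetic
// keeps the same round-toward-zero behaviour without depending on the
// floating-point mode the host code was compiled with.
int step_index_integer(int percent, int total_steps) {
    assert(percent >= 0 && total_steps >= 0);  // sketch assumes non-negative inputs
    return percent * total_steps / 100;
}
```

With integer inputs, `step_index_integer(20, 20)` evaluates to 4 regardless of the floating-point mode, while the result of the truncated float variant depends on how the intermediate product is rounded.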
Tested on an Intel Data Center GPU Max 1100 running Linux.

SD series
SD2

./build/bin/sd -m ../sd_models/v2-1_768-nonema-pruned.safetensors -p "a lovely cat" -o "output_sd2_hw1024.png" -H 1024 -W 1024 -v
total time: 30.67s
SDXL

./build/bin/sd -m ../sd_models/sdxl/sd_xl_base_1.0.safetensors --vae ../sd_models/sdxl/sdxl_vae.safetensors -H 1024 -W 1024 -p "a lovely cat" -v -o "output_sdxl_hw1024.png" --seed 16
total time: 35.24s
SD3
./build/bin/sd -m ../sd_models/sd3_medium_incl_clips_t5xxlfp16.safetensors -H 1024 -W 1024 -p "a lovely cat holding a sign says \"Stable Diffusion CPP\"" --cfg-scale 4.5 --sampling-method euler -v -o "output_sd3_hw1024.png"
total time: 46.91s
FLUX
./build/bin/sd --diffusion-model ../sd_models/flux/flux1-dev-q8_0.gguf --vae ../sd_models/flux/ae.safetensors --clip_l ../sd_models/flux/clip_l.safetensors --t5xxl ../sd_models/flux/t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v -H 1024 -W 1024
./build/bin/sd --diffusion-model ../sd_models/flux/flux1-schnell-q8_0.gguf --vae ../sd_models/flux/ae.safetensors --clip_l ../sd_models/flux/clip_l.safetensors --t5xxl ../sd_models/flux/t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v --steps 4 -o "output_flux_schnell.png" -H 1024 -W 1024
total time: 29.61s
Remaining issues
PhotoMaker and other LoRA applications.
photomaker:
./build/bin/sd -m ../sd_models/sdxl/sdxlUnstableDiffusers_v11.safetensors --vae ../sd_models/sdxl/sdxl_vae.safetensors --stacked-id-embd-dir ../sd_models/photo_maker/photomaker-v1.safetensors --input-id-images-dir ./assets/photomaker_examples/scarletthead_woman/ -p "a girl img, retro futurism, retro game art style but extremely beautiful, intricate details, masterpiece, best quality, space-themed, cosmic, celestial, stars, galaxies, nebulas, planets, science fiction, highly detailed" -n "realistic, photo-realistic, worst quality, greyscale, bad anatomy, bad hands, error, text" --cfg-scale 5.0 --sampling-method euler -H 1024 -W 1024 --style-ratio 15
result:
This will take some time to debug and fix, so it will go into a separate PR.
cc @airMeng, @luoyu-intel, @hshen14