[SYCL] Fix Large Image Generation With SYCL Backend #380

zhentaoyu · 2024-08-29T06:43:46Z

Hi, this PR improves SYCL backend compatibility on Intel GPUs. We fixed and updated several SYCL kernels so that Stable Diffusion models can generate larger images (for example, 1024 x 1024).

Changes

Update the ggml submodule to the latest commit (21d3a) to sync the SYCL kernels and, at the same time, avoid conflicts with the Vulkan backend PR.
Turn off fast-math on the host (CPU) when the SYCL backend is used. Otherwise, it affects some host-side calculations (for example, start_merge_step).
Update the SYCL-related README and refine some messages in CMakeLists for readability.
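Why fast-math can change a host-side value such as start_merge_step: with -ffast-math, GCC and Clang may reassociate arithmetic and replace a division with multiplication by a reciprocal. When the result is then truncated to an integer, two algebraically equivalent orderings can fall on different sides of an integer boundary. The sketch below only illustrates that effect. It uses Python doubles rather than the C++ float code in stable-diffusion.cpp, and the function names and the ratio/steps values are hypothetical.

```python
def merge_step_divide_first(style_ratio: float, steps: int) -> int:
    """Hypothetical: compute (ratio / 100) * steps, then truncate to int."""
    return int(style_ratio / 100 * steps)


def merge_step_multiply_first(style_ratio: float, steps: int) -> int:
    """Hypothetical: the same formula reassociated as (ratio * steps) / 100."""
    return int(style_ratio * steps / 100)


if __name__ == "__main__":
    # 57 / 100 is not exactly representable in binary floating point, so
    # (57 / 100) * 100 rounds to 56.99999999999999 and truncates to 56,
    # while 57 * 100 / 100 is computed exactly as 57.0.
    print(merge_step_divide_first(57, 100))    # 56
    print(merge_step_multiply_first(57, 100))  # 57
```

Turning fast-math off for host code stops the compiler from making such value-changing rewrites, so the expression is evaluated in the order it is written.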

Results

Tested on an Intel Data Center GPU Max 1100 under Linux.

SD series

SD2
./build/bin/sd -m ../sd_models/v2-1_768-nonema-pruned.safetensors -p "a lovely cat" -o "output_sd2_hw1024.png" -H 1024 -W 1024 -v
total time: 30.67s
SDXL
./build/bin/sd -m ../sd_models/sdxl/sd_xl_base_1.0.safetensors --vae ../sd_models/sdxl/sdxl_vae.safetensors -H 1024 -W 1024 -p "a lovely cat" -v -o "output_sdxl_hw1024.png" --seed 16
total time: 35.24s
SD3
./build/bin/sd -m ../sd_models/sd3_medium_incl_clips_t5xxlfp16.safetensors -H 1024 -W 1024 -p "a lovely cat holding a sign says \"Stable Diffusion CPP\"" --cfg-scale 4.5 --sampling-method euler -v -o "output_sd3_hw1024.png"
total time: 46.91s
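To repeat these runs in one go, a small harness can assemble the command lines and time each one. This is a minimal sketch that assumes the binary lives at ./build/bin/sd and reuses only the flags and model paths from the commands above. The helper names are hypothetical. The wall-clock time it measures includes process start-up and model loading, so it will not match the "total time" that sd prints.

```python
import shlex
import subprocess
import time


def build_sd_command(model, prompt, output, height=1024, width=1024,
                     extra=(), binary="./build/bin/sd"):
    """Assemble an sd command line using only flags shown in this PR."""
    cmd = [binary, "-m", model, "-p", prompt, "-o", output,
           "-H", str(height), "-W", str(width), "-v"]
    cmd.extend(extra)
    return cmd


def time_run(cmd):
    """Run one generation and return wall-clock seconds (includes loading)."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    runs = {
        "SD2": build_sd_command(
            "../sd_models/v2-1_768-nonema-pruned.safetensors",
            "a lovely cat", "output_sd2_hw1024.png"),
        "SDXL": build_sd_command(
            "../sd_models/sdxl/sd_xl_base_1.0.safetensors",
            "a lovely cat", "output_sdxl_hw1024.png",
            extra=["--vae", "../sd_models/sdxl/sdxl_vae.safetensors",
                   "--seed", "16"]),
    }
    for name, cmd in runs.items():
        print(name, shlex.join(cmd))
        print(f"{name}: {time_run(cmd):.2f}s wall clock")
```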

FLUX

FLUX-dev
./build/bin/sd --diffusion-model ../sd_models/flux/flux1-dev-q8_0.gguf --vae ../sd_models/flux/ae.safetensors --clip_l ../sd_models/flux/clip_l.safetensors --t5xxl ../sd_models/flux/t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v -H 1024 -W 1024

weight type	q8_0	q4_0	q4_k	q3_k	q2_k
total time (s)	114.84	116.15	114.77	115.16	114.08
total mem (MB)	21481.50	15807.93	15808.58	14301.56	13149.14
result (1024x1024): [generated images not included]
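The memory row can be sanity-checked against the size of each quantization format. This sketch assumes the standard ggml block layouts (q8_0: 34 bytes per 32 weights; q4_0: 18 per 32; q4_K: 144 per 256; q3_K: 110 per 256; q2_K: 84 per 256), roughly 12 billion parameters in the FLUX diffusion model, and that the table's MB means 2**20 bytes. All three are assumptions. Because the CLIP-L, T5-XXL (fp16) and VAE weights are the same in every run, differences between columns should come mostly from the diffusion model, although not every tensor in it is necessarily quantized.

```python
# Assumed ggml block layouts: (bytes per block, weights per block).
BLOCK_LAYOUT = {
    "q8_0": (34, 32),
    "q4_0": (18, 32),
    "q4_k": (144, 256),
    "q3_k": (110, 256),
    "q2_k": (84, 256),
}

# "total mem (MB)" values measured in the FLUX-dev table.
MEASURED_MB = {
    "q8_0": 21481.50,
    "q4_0": 15807.93,
    "q4_k": 15808.58,
    "q3_k": 14301.56,
    "q2_k": 13149.14,
}

ASSUMED_PARAMS = 12e9  # assumption: approximate FLUX diffusion model size


def bits_per_weight(qtype):
    """Average storage cost of one weight, including per-block scales."""
    nbytes, nweights = BLOCK_LAYOUT[qtype]
    return nbytes * 8 / nweights


def estimated_delta_mib(from_type, to_type, params=ASSUMED_PARAMS):
    """Estimated memory saved by storing the weights as to_type instead."""
    bits = bits_per_weight(from_type) - bits_per_weight(to_type)
    return params * bits / 8 / 2**20


if __name__ == "__main__":
    types = list(BLOCK_LAYOUT)
    for a, b in zip(types, types[1:]):
        est = estimated_delta_mib(a, b)
        obs = MEASURED_MB[a] - MEASURED_MB[b]
        print(f"{a} -> {b}: estimated {est:8.1f} MiB, measured {obs:8.1f} MB")
```

Under these assumptions, the estimated savings are close to the measured ones. This also explains why q4_0 and q4_k use almost the same memory: both average 4.5 bits per weight.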

FLUX-schnell
./build/bin/sd --diffusion-model ../sd_models/flux/flux1-schnell-q8_0.gguf --vae ../sd_models/flux/ae.safetensors --clip_l ../sd_models/flux/clip_l.safetensors --t5xxl ../sd_models/flux/t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v --steps 4 -o "output_flux_schnell.png" -H 1024 -W 1024
total time: 29.61s

Remaining issues

The diffusion model weights appear unchanged after a LoRA is applied, which causes unexpected generation results in PhotoMaker and other LoRA applications.
photomaker: ./build/bin/sd -m ../sd_models/sdxl/sdxlUnstableDiffusers_v11.safetensors --vae ../sd_models/sdxl/sdxl_vae.safetensors --stacked-id-embd-dir ../sd_models/photo_maker/photomaker-v1.safetensors --input-id-images-dir ./assets/photomaker_examples/scarletthead_woman/ -p "a girl img, retro futurism, retro game art style but extremely beautiful, intricate details, masterpiece, best quality, space-themed, cosmic, celestial, stars, galaxies, nebulas, planets, science fiction, highly detailed" -n "realistic, photo-realistic, worst quality, greyscale, bad anatomy, bad hands, error, text" --cfg-scale 5.0 --sampling-method euler -H 1024 -W 1024 --style-ratio 15
result: [generated image not included]

This will take some time to debug and fix, so I will address it in a separate PR.
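For context on "weights unchanged after LoRA is applied": merging a LoRA adds a low-rank update to each targeted weight, W' = W + scale * (up @ down), where scale is commonly alpha / rank. A simple debugging check is to compare a targeted weight before and after merging. If the maximum absolute difference is zero, the update never reached that tensor. The sketch below uses plain Python lists and hypothetical function names. It illustrates the standard LoRA merge, not stable-diffusion.cpp's implementation.

```python
def matmul(a, b):
    """Multiply an (n x r) matrix by an (r x m) matrix given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(weight, lora_up, lora_down, alpha, rank):
    """Return W + (alpha / rank) * (up @ down); the input weight is not modified."""
    scale = alpha / rank
    delta = matmul(lora_up, lora_down)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


def max_abs_diff(a, b):
    """Largest element-wise difference; 0.0 means the LoRA had no effect."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    up = [[1.0], [2.0]]      # shape (2 x 1): a rank-1 update
    down = [[0.5, -0.5]]     # shape (1 x 2)
    merged = merge_lora(W, up, down, alpha=1.0, rank=1)
    print(merged)                   # [[1.5, -0.5], [1.0, 0.0]]
    print(max_abs_diff(W, merged))  # 1.0
```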

cc @airMeng, @luoyu-intel, @hshen14

Signed-off-by: zhentaoyu <[email protected]>

zhentaoyu · 2024-08-29T06:46:53Z

Hi, @leejet, could you please take a look at this PR? Thanks a lot.

leejet · 2024-09-02T14:30:06Z

Thank you for your contribution.

…ejet#380)

* turn off fast-math on host in SYCL backend

Signed-off-by: zhentaoyu <[email protected]>

* update ggml for sync some sycl ops

Signed-off-by: zhentaoyu <[email protected]>

* update sycl readme and ggml

Signed-off-by: zhentaoyu <[email protected]>

---------

Signed-off-by: zhentaoyu <[email protected]>

zhentaoyu added 3 commits August 28, 2024 02:15

turn off fast-math on host in SYCL backend

b8059f4

Signed-off-by: zhentaoyu <[email protected]>

update ggml for sync some sycl ops

3a9c4da

Signed-off-by: zhentaoyu <[email protected]>

update sycl readme and ggml

ab6324b

Signed-off-by: zhentaoyu <[email protected]>

leejet merged commit e410aeb into leejet:master Sep 2, 2024
10 checks passed

leejet mentioned this pull request Sep 2, 2024

sync: update ggml #378

Closed

This was referenced Sep 2, 2024

Flux do not output anything #385

Open

Parameters changed for "ggml_flash_attn_ext" #384

Open

zhentaoyu deleted the fix_sycl_large_hw branch September 3, 2024 01:09
