[SYCL] Fix Large Image Generation With SYCL Backend #380

Merged
leejet merged 3 commits into leejet:master from the fix_sycl_large_hw branch on Sep 2, 2024

zhentaoyu
Contributor

Hi, this PR improves SYCL backend compatibility on Intel GPUs. We fixed and updated some SYCL kernels so that stable-diffusion models can generate larger images (for example, 1024 x 1024).

Changes

  • Update the ggml submodule to the latest commit (21d3a) to sync the SYCL kernels and, at the same time, avoid conflicts with the Vulkan backend PR.
  • Turn off fast-math on the CPU when the SYCL backend is used. Otherwise it affects some host-side calculations (for example, start_merge_step).
  • Update the SYCL-related README and tidy up some messages in CMakeLists.
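
To illustrate why fast-math can disturb a host-side value such as start_merge_step, here is a minimal sketch. It assumes, without confirmation from this PR, that the value is computed as a truncated float expression of the form `ratio / 100 * steps`. The function names are hypothetical and are not taken from the project. Under fast-math, a compiler may replace the division by a multiplication with a rounded reciprocal; the second function spells that rewrite out by hand so the two results can be compared in a normal build.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: a step index computed with a true float division,
// then truncated toward zero (the assumed shape of the calculation).
int merge_step_div(float ratio_percent, int sample_steps) {
    assert(sample_steps > 0);
    return static_cast<int>(ratio_percent / 100.0f * sample_steps);
}

// The same formula after the kind of rewrite fast-math permits:
// x / 100 becomes x * (1 / 100), which can round differently.
int merge_step_recip(float ratio_percent, int sample_steps) {
    assert(sample_steps > 0);
    const float inv = 0.01f;  // rounded reciprocal of 100
    return static_cast<int>(ratio_percent * inv * sample_steps);
}

// Largest difference between the two variants over a grid of inputs.
// A result of 1 means truncation turned a tiny rounding difference
// into a different step index for at least one input.
int max_step_gap(int max_ratio, int max_steps) {
    int gap = 0;
    for (int r = 0; r <= max_ratio; ++r) {
        for (int s = 1; s <= max_steps; ++s) {
            const int d = std::abs(merge_step_div(static_cast<float>(r), s) -
                                   merge_step_recip(static_cast<float>(r), s));
            if (d > gap) gap = d;
        }
    }
    return gap;
}
```

Calling `max_step_gap(100, 50)` reports whether any input in that range is sensitive to the rewrite. Because truncation amplifies a difference in the last bit into a whole step, keeping strict floating-point semantics on the host is the conservative choice.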

Results

Tested on an Intel Data Center GPU Max 1100 running Linux.

SD series

  • SD2
    ./build/bin/sd -m ../sd_models/v2-1_768-nonema-pruned.safetensors -p "a lovely cat" -o "output_sd2_hw1024.png" -H 1024 -W 1024 -v
    total time: 30.67s
    image

  • SDXL
    ./build/bin/sd -m ../sd_models/sdxl/sd_xl_base_1.0.safetensors --vae ../sd_models/sdxl/sdxl_vae.safetensors -H 1024 -W 1024 -p "a lovely cat" -v -o "output_sdxl_hw1024.png" --seed 16
    total time: 35.24s
    image

  • SD3
    ./build/bin/sd -m ../sd_models/sd3_medium_incl_clips_t5xxlfp16.safetensors -H 1024 -W 1024 -p "a lovely cat holding a sign says \"Stable Diffusion CPP\"" --cfg-scale 4.5 --sampling-method euler -v -o "output_sd3_hw1024.png"
    total time: 46.91s
    image

FLUX

  • FLUX-dev
    ./build/bin/sd --diffusion-model ../sd_models/flux/flux1-dev-q8_0.gguf --vae ../sd_models/flux/ae.safetensors --clip_l ../sd_models/flux/clip_l.safetensors --t5xxl ../sd_models/flux/t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v -H 1024 -W 1024
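    | type               | q8_0     | q4_0     | q4_k     | q3_k     | q2_k     |
    |--------------------|----------|----------|----------|----------|----------|
    | total time (s)     | 114.84   | 116.15   | 114.77   | 115.16   | 114.08   |
    | total mem (MB)     | 21481.50 | 15807.93 | 15808.58 | 14301.56 | 13149.14 |
    | result (1024x1024) | image    | image    | image    | image    | image    |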
  • FLUX-schnell
    ./build/bin/sd --diffusion-model ../sd_models/flux/flux1-schnell-q8_0.gguf --vae ../sd_models/flux/ae.safetensors --clip_l ../sd_models/flux/clip_l.safetensors --t5xxl ../sd_models/flux/t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v --steps 4 -o "output_flux_schnell.png" -H 1024 -W 1024
    total time: 29.61s
    image

Remaining issues

  • The diffusion model weights appear to be unchanged after a LoRA is applied, which causes unexpected generation results in PhotoMaker and other LoRA applications.
    photomaker: ./build/bin/sd -m ../sd_models/sdxl/sdxlUnstableDiffusers_v11.safetensors --vae ../sd_models/sdxl/sdxl_vae.safetensors --stacked-id-embd-dir ../sd_models/photo_maker/photomaker-v1.safetensors --input-id-images-dir ./assets/photomaker_examples/scarletthead_woman/ -p "a girl img, retro futurism, retro game art style but extremely beautiful, intricate details, masterpiece, best quality, space-themed, cosmic, celestial, stars, galaxies, nebulas, planets, science fiction, highly detailed" -n "realistic, photo-realistic, worst quality, greyscale, bad anatomy, bad hands, error, text" --cfg-scale 5.0 --sampling-method euler -H 1024 -W 1024 --style-ratio 15
    result:
    image
    This will take some time to debug and fix, so it will go into a separate PR.
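
One way to confirm the symptom above is to snapshot a weight tensor before the LoRA is applied and compare it afterwards. The sketch below is a generic illustration of a low-rank merge, W += scale * (up x down), on plain host buffers. It is not the project's or ggml's implementation, and the function names are hypothetical. A maximum difference of zero after the merge would match the "weights unchanged" behavior described here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Generic low-rank update applied in place: W += scale * (up * down).
// W is rows x cols, up is rows x rank, down is rank x cols, all row-major.
void apply_lora(std::vector<float>& w,
                const std::vector<float>& up,
                const std::vector<float>& down,
                std::size_t rows, std::size_t cols, std::size_t rank,
                float scale) {
    assert(w.size() == rows * cols);
    assert(up.size() == rows * rank);
    assert(down.size() == rank * cols);
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            float delta = 0.0f;
            for (std::size_t r = 0; r < rank; ++r) {
                delta += up[i * rank + r] * down[r * cols + j];
            }
            w[i * cols + j] += scale * delta;
        }
    }
}

// Largest absolute element-wise difference between two equal-sized buffers.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float m = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = std::fabs(a[i] - b[i]);
        if (d > m) m = d;
    }
    return m;
}
```

In a real debugging session the "before" and "after" buffers would be copies of the same tensor read back from the device, which is an assumption about the workflow rather than something this PR describes.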

cc @airMeng, @luoyu-intel, @hshen14

@zhentaoyu
Contributor Author

Hi, @leejet, could you please take a look at this PR? Thanks a lot.

@leejet leejet merged commit e410aeb into leejet:master Sep 2, 2024
10 checks passed
@leejet
Owner

leejet commented Sep 2, 2024

Thank you for your contribution.

@leejet leejet mentioned this pull request Sep 2, 2024
@zhentaoyu zhentaoyu deleted the fix_sycl_large_hw branch September 3, 2024 01:09
stduhpf pushed a commit to stduhpf/stable-diffusion.cpp that referenced this pull request Nov 1, 2024
…ejet#380)

* turn off fast-math on host in SYCL backend

Signed-off-by: zhentaoyu <[email protected]>

* update ggml for sync some sycl ops

Signed-off-by: zhentaoyu <[email protected]>

* update sycl readme and ggml

Signed-off-by: zhentaoyu <[email protected]>

---------

Signed-off-by: zhentaoyu <[email protected]>