Description
Hi there, I found that after PR 55c2e05 my SDXL outputs are pure black; it works fine at 52a97b3.
I'm running an RTX 4090, built with -DSD_CUDA=ON.
The same build (55c2e05) works fine when loading FLUX.
WORKING LOG 52a97b3
sd.exe -m models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors --vae models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors -H 1024 -W 1024 -p "a lovely cat" -v
Option:
n_threads: 12
mode: img_gen
model_path: models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors
wtype: unspecified
clip_l_path:
clip_g_path:
clip_vision_path:
t5xxl_path:
diffusion_model_path:
high_noise_diffusion_model_path:
vae_path: models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors
taesd_path:
esrgan_path:
control_net_path:
embedding_dir:
stacked_id_embed_dir:
input_id_images_path:
style ratio: 20.00
normalize input image: false
output_path: output.png
init_image_path:
end_image_path:
mask_image_path:
control_image_path:
ref_images_paths:
control_video_path:
increase_ref_index: false
offload_params_to_cpu: false
clip_on_cpu: false
control_net_cpu: false
vae_on_cpu: false
diffusion flash attention: false
diffusion Conv2d direct: false
vae_conv_direct: false
control_strength: 0.90
prompt: a lovely cat
negative_prompt:
clip_skip: -1
width: 1024
height: 1024
sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: 20, eta: 0.00)
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: -1, eta: 0.00)
moe_boundary: 0.875
flow_shift: inf
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
upscale_repeats: 1
chroma_use_dit_mask: true
chroma_use_t5_mask: false
chroma_t5_mask_pad: 1
video_frames: 1
vace_strength: 1.00
fps: 16
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:144 - Using CUDA backend
[INFO ] ggml_extend.hpp:65 - ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
[INFO ] ggml_extend.hpp:65 - ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
[INFO ] ggml_extend.hpp:65 - ggml_cuda_init: found 1 CUDA devices:
[INFO ] ggml_extend.hpp:65 - Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
[INFO ] stable-diffusion.cpp:201 - loading model from 'models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors'
[INFO ] model.cpp:1038 - load models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors using safetensors format
[DEBUG] model.cpp:1145 - init from 'models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors', prefix = ''
[INFO ] stable-diffusion.cpp:255 - loading vae from 'models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors'
[INFO ] model.cpp:1038 - load models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors using safetensors format
[DEBUG] model.cpp:1145 - init from 'models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:267 - Version: SDXL
[INFO ] stable-diffusion.cpp:298 - Weight type: f32
[INFO ] stable-diffusion.cpp:299 - Conditioner weight type: f32
[INFO ] stable-diffusion.cpp:300 - Diffusion model weight type: f32
[INFO ] stable-diffusion.cpp:301 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:303 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1759 - clip params backend buffer size = 469.44 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1759 - clip params backend buffer size = 2649.92 MB(VRAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1759 - unet params backend buffer size = 9113.19 MB(VRAM) (1680 tensors)
[DEBUG] ggml_extend.hpp:1759 - vae params backend buffer size = 94.47 MB(VRAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:561 - loading weights
[DEBUG] model.cpp:1998 - loading tensors from models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors
|==================================================| 2641/2641 - 201.14it/s
[DEBUG] model.cpp:1998 - loading tensors from models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors
|==================================================| 2641/2641 - 7377.10it/s
[INFO ] model.cpp:2222 - loading tensors completed, taking 13.52s (process: 0.02s, read: 9.75s, memcpy: 0.00s, convert: 1.37s, copy_to_backend: 1.71s)
[INFO ] stable-diffusion.cpp:657 - total params memory size = 12327.02MB (VRAM 12327.02MB, RAM 0.00MB): text_encoders 3119.36MB(VRAM), diffusion_model 9113.19MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:710 - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:721 - finished loaded file
[DEBUG] stable-diffusion.cpp:2245 - generate_image 1024x1024
[INFO ] stable-diffusion.cpp:2366 - TXT2IMG
[INFO ] stable-diffusion.cpp:860 - attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:880 - apply_loras completed, taking 0.00s
[DEBUG] stable-diffusion.cpp:881 - prompt after extract and remove lora: "a lovely cat"
[DEBUG] conditioner.hpp:345 - parse 'a lovely cat' to [['a lovely cat', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] conditioner.hpp:479 - computing condition graph completed, taking 102 ms
[DEBUG] conditioner.hpp:345 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] conditioner.hpp:479 - computing condition graph completed, taking 25 ms
[INFO ] stable-diffusion.cpp:2033 - get_learned_condition completed, taking 129 ms
[INFO ] stable-diffusion.cpp:2056 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:2105 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1583 - unet compute buffer size: 830.86 MB(VRAM)
|==================================================| 20/20 - 2.00it/s
[INFO ] stable-diffusion.cpp:2141 - sampling completed, taking 10.31s
[INFO ] stable-diffusion.cpp:2149 - generating 1 latent images completed, taking 10.36s
[INFO ] stable-diffusion.cpp:2152 - decoding 1 latents
[DEBUG] stable-diffusion.cpp:1475 - VAE Tile size: 32x32
[DEBUG] ggml_extend.hpp:1583 - vae compute buffer size: 6656.25 MB(VRAM)
[DEBUG] stable-diffusion.cpp:1504 - computing vae decode graph completed, taking 0.60s
[INFO ] stable-diffusion.cpp:2162 - latent 1 decoded, taking 0.60s
[INFO ] stable-diffusion.cpp:2166 - decode_first_stage completed, taking 0.60s
[INFO ] stable-diffusion.cpp:2443 - generate_image completed in 11.10s
save result PNG image to 'output.png'
NOT WORKING LOG 55c2e05
sd_n.exe -m models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors --vae models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors -H 1024 -W 1024 -p "a lovely cat" -v
Option:
n_threads: 12
mode: img_gen
model_path: models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors
wtype: unspecified
clip_l_path:
clip_g_path:
clip_vision_path:
t5xxl_path:
diffusion_model_path:
high_noise_diffusion_model_path:
vae_path: models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors
taesd_path:
esrgan_path:
control_net_path:
embedding_dir:
stacked_id_embed_dir:
input_id_images_path:
style ratio: 20.00
normalize input image: false
output_path: output.png
init_image_path:
end_image_path:
mask_image_path:
control_image_path:
ref_images_paths:
control_video_path:
increase_ref_index: false
offload_params_to_cpu: false
clip_on_cpu: false
control_net_cpu: false
vae_on_cpu: false
diffusion flash attention: false
diffusion Conv2d direct: false
vae_conv_direct: false
control_strength: 0.90
prompt: a lovely cat
negative_prompt:
clip_skip: -1
width: 1024
height: 1024
sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: 20, eta: 0.00)
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: -1, eta: 0.00)
moe_boundary: 0.875
flow_shift: inf
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
upscale_repeats: 1
chroma_use_dit_mask: true
chroma_use_t5_mask: false
chroma_t5_mask_pad: 1
video_frames: 1
vace_strength: 1.00
fps: 16
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:144 - Using CUDA backend
[INFO ] ggml_extend.hpp:65 - ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
[INFO ] ggml_extend.hpp:65 - ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
[INFO ] ggml_extend.hpp:65 - ggml_cuda_init: found 1 CUDA devices:
[INFO ] ggml_extend.hpp:65 - Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
[INFO ] stable-diffusion.cpp:201 - loading model from 'models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors'
[INFO ] model.cpp:1043 - load models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors using safetensors format
[DEBUG] model.cpp:1150 - init from 'models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors', prefix = ''
[INFO ] stable-diffusion.cpp:255 - loading vae from 'models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors'
[INFO ] model.cpp:1043 - load models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors using safetensors format
[DEBUG] model.cpp:1150 - init from 'models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:267 - Version: SDXL
[INFO ] stable-diffusion.cpp:298 - Weight type: f32
[INFO ] stable-diffusion.cpp:299 - Conditioner weight type: f32
[INFO ] stable-diffusion.cpp:300 - Diffusion model weight type: f32
[INFO ] stable-diffusion.cpp:301 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:303 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1759 - clip params backend buffer size = 469.44 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1759 - clip params backend buffer size = 2649.92 MB(VRAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1759 - unet params backend buffer size = 9113.19 MB(VRAM) (1680 tensors)
[DEBUG] ggml_extend.hpp:1759 - vae params backend buffer size = 94.47 MB(VRAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:561 - loading weights
[DEBUG] model.cpp:2030 - loading tensors from models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors
|================================================> | 2557/2641 - 830.19it/s
[DEBUG] model.cpp:2030 - loading tensors from models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors
|==================================================| 2641/2641 - 801.03it/s
[INFO ] model.cpp:2274 - loading tensors completed, taking 3.31s (process: 0.01s, read: 2.42s, memcpy: 0.00s, convert: 0.07s, copy_to_backend: 0.51s)
[INFO ] stable-diffusion.cpp:657 - total params memory size = 12327.02MB (VRAM 12327.02MB, RAM 0.00MB): text_encoders 3119.36MB(VRAM), diffusion_model 9113.19MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:710 - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:721 - finished loaded file
[DEBUG] stable-diffusion.cpp:2245 - generate_image 1024x1024
[INFO ] stable-diffusion.cpp:2366 - TXT2IMG
[INFO ] stable-diffusion.cpp:860 - attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:880 - apply_loras completed, taking 0.00s
[DEBUG] stable-diffusion.cpp:881 - prompt after extract and remove lora: "a lovely cat"
[DEBUG] conditioner.hpp:345 - parse 'a lovely cat' to [['a lovely cat', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] conditioner.hpp:479 - computing condition graph completed, taking 115 ms
[DEBUG] conditioner.hpp:345 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] conditioner.hpp:479 - computing condition graph completed, taking 28 ms
[INFO ] stable-diffusion.cpp:2033 - get_learned_condition completed, taking 146 ms
[INFO ] stable-diffusion.cpp:2056 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:2105 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1583 - unet compute buffer size: 830.86 MB(VRAM)
|==================================================| 20/20 - 2.26it/s
[INFO ] stable-diffusion.cpp:2141 - sampling completed, taking 10.24s
[INFO ] stable-diffusion.cpp:2149 - generating 1 latent images completed, taking 10.29s
[INFO ] stable-diffusion.cpp:2152 - decoding 1 latents
[DEBUG] stable-diffusion.cpp:1475 - VAE Tile size: 32x32
[DEBUG] ggml_extend.hpp:1583 - vae compute buffer size: 6656.25 MB(VRAM)
[DEBUG] stable-diffusion.cpp:1504 - computing vae decode graph completed, taking 0.57s
[INFO ] stable-diffusion.cpp:2162 - latent 1 decoded, taking 0.57s
[INFO ] stable-diffusion.cpp:2166 - decode_first_stage completed, taking 0.57s
[INFO ] stable-diffusion.cpp:2443 - generate_image completed in 11.01s
save result PNG image to 'output.png'
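
Apart from shifted model.cpp line numbers and timings, the one difference I can spot between the two logs is the first tensor-loading progress bar: 52a97b3 prints 2641/2641, while 55c2e05 stops at 2557/2641. That may only be a display artifact (the last progress update not being printed), so treat it as a lead rather than a diagnosis. Below is a minimal sketch for flagging such lines in a saved log; the function name `incomplete_progress` and the regex are my own, not part of stable-diffusion.cpp.

```python
import re

# Matches progress lines such as "|=====> | 2557/2641 - 830.19it/s".
# The pattern is an assumption based on the log format pasted above.
PROGRESS = re.compile(r"\|\s*(\d+)/(\d+)\s+-\s+[\d.]+it/s")


def incomplete_progress(log_text):
    """Return (line_number, done, total) for every progress bar whose
    last printed count is lower than its total."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        match = PROGRESS.search(line)
        if match:
            done, total = int(match.group(1)), int(match.group(2))
            if done < total:
                hits.append((number, done, total))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python check_progress.py saved_log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for number, done, total in incomplete_progress(handle.read()):
            print(f"line {number}: progress stopped at {done}/{total}")
```

Running it on both saved logs should list only the 2557/2641 line from the 55c2e05 run, if the pasted output is complete.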