SDXL black output after PR 55c2e05 #838

@pedroCabrera

Hi there, I found that after PR 55c2e05 my SDXL outputs are pure black. It works fine at 52a97b3.
I'm running an RTX 4090, built with -DSD_CUDA=ON.

The same PR works fine when loading FLUX.

WORKING LOG 52a97b3
sd.exe -m models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors --vae models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors -H 1024 -W 1024 -p "a lovely cat" -v
Option:
n_threads: 12
mode: img_gen
model_path: models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors
wtype: unspecified
clip_l_path:
clip_g_path:
clip_vision_path:
t5xxl_path:
diffusion_model_path:
high_noise_diffusion_model_path:
vae_path: models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors
taesd_path:
esrgan_path:
control_net_path:
embedding_dir:
stacked_id_embed_dir:
input_id_images_path:
style ratio: 20.00
normalize input image: false
output_path: output.png
init_image_path:
end_image_path:
mask_image_path:
control_image_path:
ref_images_paths:
control_video_path:
increase_ref_index: false
offload_params_to_cpu: false
clip_on_cpu: false
control_net_cpu: false
vae_on_cpu: false
diffusion flash attention: false
diffusion Conv2d direct: false
vae_conv_direct: false
control_strength: 0.90
prompt: a lovely cat
negative_prompt:
clip_skip: -1
width: 1024
height: 1024
sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: 20, eta: 0.00)
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: -1, eta: 0.00)
moe_boundary: 0.875
flow_shift: inf
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
upscale_repeats: 1
chroma_use_dit_mask: true
chroma_use_t5_mask: false
chroma_t5_mask_pad: 1
video_frames: 1
vace_strength: 1.00
fps: 16
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:144 - Using CUDA backend
[INFO ] ggml_extend.hpp:65 - ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
[INFO ] ggml_extend.hpp:65 - ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
[INFO ] ggml_extend.hpp:65 - ggml_cuda_init: found 1 CUDA devices:
[INFO ] ggml_extend.hpp:65 - Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
[INFO ] stable-diffusion.cpp:201 - loading model from 'models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors'
[INFO ] model.cpp:1038 - load models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors using safetensors format
[DEBUG] model.cpp:1145 - init from 'models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors', prefix = ''
[INFO ] stable-diffusion.cpp:255 - loading vae from 'models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors'
[INFO ] model.cpp:1038 - load models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors using safetensors format
[DEBUG] model.cpp:1145 - init from 'models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:267 - Version: SDXL
[INFO ] stable-diffusion.cpp:298 - Weight type: f32
[INFO ] stable-diffusion.cpp:299 - Conditioner weight type: f32
[INFO ] stable-diffusion.cpp:300 - Diffusion model weight type: f32
[INFO ] stable-diffusion.cpp:301 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:303 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1759 - clip params backend buffer size = 469.44 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1759 - clip params backend buffer size = 2649.92 MB(VRAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1759 - unet params backend buffer size = 9113.19 MB(VRAM) (1680 tensors)
[DEBUG] ggml_extend.hpp:1759 - vae params backend buffer size = 94.47 MB(VRAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:561 - loading weights
[DEBUG] model.cpp:1998 - loading tensors from models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors
|==================================================| 2641/2641 - 201.14it/s
[DEBUG] model.cpp:1998 - loading tensors from models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors
|==================================================| 2641/2641 - 7377.10it/s
[INFO ] model.cpp:2222 - loading tensors completed, taking 13.52s (process: 0.02s, read: 9.75s, memcpy: 0.00s, convert: 1.37s, copy_to_backend: 1.71s)
[INFO ] stable-diffusion.cpp:657 - total params memory size = 12327.02MB (VRAM 12327.02MB, RAM 0.00MB): text_encoders 3119.36MB(VRAM), diffusion_model 9113.19MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:710 - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:721 - finished loaded file
[DEBUG] stable-diffusion.cpp:2245 - generate_image 1024x1024
[INFO ] stable-diffusion.cpp:2366 - TXT2IMG
[INFO ] stable-diffusion.cpp:860 - attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:880 - apply_loras completed, taking 0.00s
[DEBUG] stable-diffusion.cpp:881 - prompt after extract and remove lora: "a lovely cat"
[DEBUG] conditioner.hpp:345 - parse 'a lovely cat' to [['a lovely cat', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] conditioner.hpp:479 - computing condition graph completed, taking 102 ms
[DEBUG] conditioner.hpp:345 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] conditioner.hpp:479 - computing condition graph completed, taking 25 ms
[INFO ] stable-diffusion.cpp:2033 - get_learned_condition completed, taking 129 ms
[INFO ] stable-diffusion.cpp:2056 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:2105 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1583 - unet compute buffer size: 830.86 MB(VRAM)
|==================================================| 20/20 - 2.00it/s
[INFO ] stable-diffusion.cpp:2141 - sampling completed, taking 10.31s
[INFO ] stable-diffusion.cpp:2149 - generating 1 latent images completed, taking 10.36s
[INFO ] stable-diffusion.cpp:2152 - decoding 1 latents
[DEBUG] stable-diffusion.cpp:1475 - VAE Tile size: 32x32
[DEBUG] ggml_extend.hpp:1583 - vae compute buffer size: 6656.25 MB(VRAM)
[DEBUG] stable-diffusion.cpp:1504 - computing vae decode graph completed, taking 0.60s
[INFO ] stable-diffusion.cpp:2162 - latent 1 decoded, taking 0.60s
[INFO ] stable-diffusion.cpp:2166 - decode_first_stage completed, taking 0.60s
[INFO ] stable-diffusion.cpp:2443 - generate_image completed in 11.10s
save result PNG image to 'output.png'

NOT WORKING LOG 55c2e05

sd_n.exe -m models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors --vae models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors -H 1024 -W 1024 -p "a lovely cat" -v
Option:
n_threads: 12
mode: img_gen
model_path: models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors
wtype: unspecified
clip_l_path:
clip_g_path:
clip_vision_path:
t5xxl_path:
diffusion_model_path:
high_noise_diffusion_model_path:
vae_path: models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors
taesd_path:
esrgan_path:
control_net_path:
embedding_dir:
stacked_id_embed_dir:
input_id_images_path:
style ratio: 20.00
normalize input image: false
output_path: output.png
init_image_path:
end_image_path:
mask_image_path:
control_image_path:
ref_images_paths:
control_video_path:
increase_ref_index: false
offload_params_to_cpu: false
clip_on_cpu: false
control_net_cpu: false
vae_on_cpu: false
diffusion flash attention: false
diffusion Conv2d direct: false
vae_conv_direct: false
control_strength: 0.90
prompt: a lovely cat
negative_prompt:
clip_skip: -1
width: 1024
height: 1024
sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: 20, eta: 0.00)
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: -1, eta: 0.00)
moe_boundary: 0.875
flow_shift: inf
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
upscale_repeats: 1
chroma_use_dit_mask: true
chroma_use_t5_mask: false
chroma_t5_mask_pad: 1
video_frames: 1
vace_strength: 1.00
fps: 16
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:144 - Using CUDA backend
[INFO ] ggml_extend.hpp:65 - ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
[INFO ] ggml_extend.hpp:65 - ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
[INFO ] ggml_extend.hpp:65 - ggml_cuda_init: found 1 CUDA devices:
[INFO ] ggml_extend.hpp:65 - Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
[INFO ] stable-diffusion.cpp:201 - loading model from 'models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors'
[INFO ] model.cpp:1043 - load models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors using safetensors format
[DEBUG] model.cpp:1150 - init from 'models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors', prefix = ''
[INFO ] stable-diffusion.cpp:255 - loading vae from 'models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors'
[INFO ] model.cpp:1043 - load models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors using safetensors format
[DEBUG] model.cpp:1150 - init from 'models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:267 - Version: SDXL
[INFO ] stable-diffusion.cpp:298 - Weight type: f32
[INFO ] stable-diffusion.cpp:299 - Conditioner weight type: f32
[INFO ] stable-diffusion.cpp:300 - Diffusion model weight type: f32
[INFO ] stable-diffusion.cpp:301 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:303 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1759 - clip params backend buffer size = 469.44 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1759 - clip params backend buffer size = 2649.92 MB(VRAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1759 - unet params backend buffer size = 9113.19 MB(VRAM) (1680 tensors)
[DEBUG] ggml_extend.hpp:1759 - vae params backend buffer size = 94.47 MB(VRAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:561 - loading weights
[DEBUG] model.cpp:2030 - loading tensors from models\SDXL_TURBO\sd_xl_turbo_1.0.safetensors
|================================================> | 2557/2641 - 830.19it/s
[DEBUG] model.cpp:2030 - loading tensors from models\SDXL_TURBO\sdxl-vae-fp16-fix.safetensors
|==================================================| 2641/2641 - 801.03it/s
[INFO ] model.cpp:2274 - loading tensors completed, taking 3.31s (process: 0.01s, read: 2.42s, memcpy: 0.00s, convert: 0.07s, copy_to_backend: 0.51s)
[INFO ] stable-diffusion.cpp:657 - total params memory size = 12327.02MB (VRAM 12327.02MB, RAM 0.00MB): text_encoders 3119.36MB(VRAM), diffusion_model 9113.19MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:710 - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:721 - finished loaded file
[DEBUG] stable-diffusion.cpp:2245 - generate_image 1024x1024
[INFO ] stable-diffusion.cpp:2366 - TXT2IMG
[INFO ] stable-diffusion.cpp:860 - attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:880 - apply_loras completed, taking 0.00s
[DEBUG] stable-diffusion.cpp:881 - prompt after extract and remove lora: "a lovely cat"
[DEBUG] conditioner.hpp:345 - parse 'a lovely cat' to [['a lovely cat', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] conditioner.hpp:479 - computing condition graph completed, taking 115 ms
[DEBUG] conditioner.hpp:345 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] ggml_extend.hpp:1583 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] conditioner.hpp:479 - computing condition graph completed, taking 28 ms
[INFO ] stable-diffusion.cpp:2033 - get_learned_condition completed, taking 146 ms
[INFO ] stable-diffusion.cpp:2056 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:2105 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1583 - unet compute buffer size: 830.86 MB(VRAM)
|==================================================| 20/20 - 2.26it/s
[INFO ] stable-diffusion.cpp:2141 - sampling completed, taking 10.24s
[INFO ] stable-diffusion.cpp:2149 - generating 1 latent images completed, taking 10.29s
[INFO ] stable-diffusion.cpp:2152 - decoding 1 latents
[DEBUG] stable-diffusion.cpp:1475 - VAE Tile size: 32x32
[DEBUG] ggml_extend.hpp:1583 - vae compute buffer size: 6656.25 MB(VRAM)
[DEBUG] stable-diffusion.cpp:1504 - computing vae decode graph completed, taking 0.57s
[INFO ] stable-diffusion.cpp:2162 - latent 1 decoded, taking 0.57s
[INFO ] stable-diffusion.cpp:2166 - decode_first_stage completed, taking 0.57s
[INFO ] stable-diffusion.cpp:2443 - generate_image completed in 11.01s
save result PNG image to 'output.png'
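
To make the two logs above easier to compare, here is a minimal Python sketch that masks the details expected to change between builds (source line numbers, durations, progress rates) and prints only what is left. The helper names `normalize` and `meaningful_diff` are made up for this example and are not part of sd.exe. Applied to the two logs, the main remaining difference should be the first tensor-loading progress line (2641/2641 in the working run, 2557/2641 in the black-output run). Whether that is related to the black output is not confirmed here.

```python
import difflib
import re

# Details that are expected to differ between two builds or two runs.
_SRC_LINE = re.compile(r"(\.(?:cpp|hpp)):\d+")         # e.g. model.cpp:1038
_RATE = re.compile(r"\d+(?:\.\d+)?it/s")                # e.g. 201.14it/s
_DURATION = re.compile(r"\d+(?:\.\d+)?\s?(?:ms|s)\b")   # e.g. 13.52s, 102 ms


def normalize(line: str) -> str:
    """Mask source line numbers, progress rates and durations in one log line."""
    line = _SRC_LINE.sub(r"\1:N", line)
    line = _RATE.sub("<rate>", line)
    line = _DURATION.sub("<time>", line)
    return line.strip()


def meaningful_diff(working: str, broken: str) -> list[str]:
    """Return the added/removed lines that remain after normalization."""
    a = [normalize(l) for l in working.splitlines()]
    b = [normalize(l) for l in broken.splitlines()]
    diff = list(difflib.unified_diff(a, b, "working", "broken", lineterm="", n=0))
    # Drop the two file-header lines and the "@@" hunk markers.
    return [d for d in diff[2:] if not d.startswith("@@")]


if __name__ == "__main__":
    import sys

    # Usage (file names are placeholders): python logdiff.py working.log broken.log
    with open(sys.argv[1], encoding="utf-8") as f_ok, \
         open(sys.argv[2], encoding="utf-8") as f_bad:
        for entry in meaningful_diff(f_ok.read(), f_bad.read()):
            print(entry)
```

The command line at the top of each log will also show up in the output, because the executable names (sd.exe and sd_n.exe) differ.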
