Support for Flux #323
There's a reference diffusers config in a PR as well: https://huggingface.co/black-forest-labs/FLUX.1-dev/discussions/3/files
That's probably the best open model so far, but it's pretty big. I'd love to be able to use it quantized and with partial GPU offload. The "schnell" distillation seems like a good candidate, though.
12B is massive; quants below f16 could become more popular here, f8 or even q5.
No, 12B parameters at fp32 is 12B × 4 bytes = 48 GB of memory, not including clip/t5/vae/etc.
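For a back-of-the-envelope sense of scale (ggml's q8_0 stores roughly 8.5 bits per weight and q4_0 roughly 4.5 bits per weight), the 12B transformer alone works out to approximately:

```
12e9 params × 4 bytes     (fp32) ≈ 48   GB
12e9 params × 2 bytes     (fp16) ≈ 24   GB
12e9 params × ~1.06 bytes (q8_0) ≈ 12.8 GB
12e9 params × ~0.56 bytes (q4_0) ≈  6.8 GB
```

These figures ignore the text encoders and VAE, and actual peak memory during inference is higher.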
Can SD.cpp partially offload a model to GPU? I was unable to do this. Can you give a hint how to do this?
Can image generation models be quantized down to 4-5 bits? I saw it done to LLMs with mixed results, but I've never seen it working with SD and co.
stable-diffusion.cpp can quantize models using […]
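For reference, a minimal sketch of that conversion step, assuming a recent build (exact flags can differ between versions, so check `sd --help`; file names here are placeholders):

```sh
# Convert a safetensors checkpoint to a q8_0-quantized GGUF with the convert mode
sd -M convert -m flux1-schnell.safetensors -o flux1-schnell-q8_0.gguf --type q8_0
```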
Different stuff. It's more like the LCM models for SD: it takes fewer steps to make a picture, with minor quality loss compared to the standard version.
There's already an unofficial FP8 version which works in ComfyUI: https://huggingface.co/Kijai/flux-fp8. But I don't know if Comfy is able to do inference on more aggressive quantizations. Anyway, +1 for Flux support, or for any new big image model. As these models become bigger and bigger, running a quantized version could be the only way to run them on consumer-grade GPUs, and stable-diffusion.cpp could become a good solution.
Indeed, Flux is amazing; after working with it I want to use it with cpp too if possible. @FSSRepo, if I remember correctly you were working on q2 and the k-quant variants; please work on it if you can. We could quantize it down to something like 3-4 GB, maybe.
This repo has q4 Flux schnell: https://huggingface.co/city96/FLUX.1-schnell-gguf/tree/main. I agree it would be great to get support; my computer is not even close to being able to run it non-quantized.
There are quantized files already (GGUF), claimed to have been made with stable-diffusion.cpp: https://huggingface.co/aifoundry-org/FLUX.1-schnell-Quantized/tree/main. However, the ability to have a small REST API like llama.cpp would be amazing for hosting this kind of model!
Indeed, in the last few weeks lots of developers have made several attempts at using a quantized version of Flux in both ComfyUI and Stable Diffusion WebUI Forge.
Both ComfyUI and Stable Diffusion WebUI (and of course Forge too) already have an API, and apart from some plugins which try to integrate them with programs like Photoshop or Krita, I haven't seen many projects using them. It can still be an interesting feature, but I don't think it should be the priority.
The GGUF experiments aren't using proper ggml though. They are just using GGUF as a way to compress the weights, and they are dequantizing on the fly during inference, which is very inefficient.
Hi all! I am an author of this repo. Since SD3 is already supported and FLUX has a close architecture (as far as I know, at least), I hope it'll not be too complicated.
Flux support has been added. #356
Although the architecture is similar to SD3, Flux actually has a lot of additional things to implement, so adding Flux support took me a bit longer.
Hi there, thanks for the Flux support. Has there been a noticeable difference in speed in your tests, in comparison with the compressed GGUF versions for other UIs?
Nice! Finally able to run it; I don't have enough VRAM, so I really appreciate it. I imagine it will probably take forever to generate one image, but it's something at least.
Support has been merged to master, so grab the latest release and give it a spin.
Is anyone else getting "error: unknown argument: --clip_1"? I'm using sd.exe for CUDA 12 and Windows x64. EDIT: LOL, it's --clip_l (lowercase L, not a 1).
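For anyone hitting the same thing, here is a hedged example invocation (`sd.exe` on the Windows builds; file names are placeholders and flags may vary by build; Flux needs the diffusion model, VAE, clip_l and t5xxl passed separately):

```sh
# Placeholder file names; adjust steps/cfg for dev vs. schnell
sd --diffusion-model flux1-schnell-q8_0.gguf --vae ae.safetensors --clip_l clip_l.safetensors --t5xxl t5xxl_fp16.safetensors -p "a photo of a cat" --cfg-scale 1.0 --sampling-method euler --steps 4
```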
Ok, sorry, this is probably going to sound stupid, but does running the exe directly each time take longer than some other method? It seems like a lot of the steps do not pertain to processing the specific prompt, so I was wondering if there is a way to keep the models "in memory", so to speak, in between prompts, so that after the first generation, later generations during the same session go faster because they are not doing all the steps from scratch each time. I don't know if I'm making any sense, but I'm comparing to, say, koboldcpp, where all the loading of a llama model happens up front (which takes some extra time), and once it's loaded, all generations after that are pretty quick.
I don't think so, sadly. You can do multiple renders with the same prompt by adding the […] option.
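The option referenced above was cut off in this thread; assuming it is the batch-count flag present in recent builds, it would look something like:

```sh
# -b / --batch-count renders several images from a single model load (assumed flag; check sd --help)
sd --diffusion-model flux1-schnell-q8_0.gguf --vae ae.safetensors --clip_l clip_l.safetensors --t5xxl t5xxl_fp16.safetensors -p "a photo of a cat" -b 4
```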
koboldcpp is a user program built on top of the library llama.cpp. You are looking for a user program built on top of this library, stable-diffusion.cpp. That's not the point of this repo, but someone could build this (or maybe already has?).
kobold.cpp is a user program built on top of this library.
I'm very sorry for my stupid question, but can someone explain why it's so slow when q8_0 or q4_k is used? About 18-19 sec per iteration, when the fp8 model in ComfyUI was giving about 6-7 sec/it on the same GPU (RX 7600 XT)?
Using stable-diffusion.cpp should be much faster than ComfyUI when it comes to GGUF.
I thought that comment was related to Comfy+GGUF, which I didn't try; I tried Comfy with the fp8 model.
Ah, I misunderstood what you meant. You're getting worse performance with stable-diffusion.cpp+GGUF compared to Comfy+fp8? Both using ROCm?
Yes, on the same GPU.
Weird. For me this is faster, but I'm comparing Vulkan to DirectML, not ROCm to ROCm.
New diffusion model - https://blackforestlabs.ai/announcing-black-forest-labs/
Reference implementation: comfyanonymous/ComfyUI@1589b58