Add InstanceDiffusion implementation #10079
|
Hi! @yiyixuxu @asomoza @sayakpaul |
|
Could you also accompany the PR with a short description and perhaps some results? |
|
GLIGEN inserts a fuser called gated self-attention between the UNet's self-attention and cross-attention layers so that extra conditions beyond the text prompt are reflected in the generated image. The network that produces the embedding fed to the fuser differs for each type of extra condition. InstanceDiffusion is a layout-to-image model and uses object-specific positions and phrases as extra conditions. Unlike GLIGEN, however, it proposes a network called UniFusion that handles every type of object-specific position information, whether box, point, scribble, or mask. It also proposes a ScaleU block that rescales the embeddings in the UNet's up-blocks based on the Fourier transform to better handle small objects. Furthermore, to keep overlapping objects from being blurred during inference, it proposes multi-instance sampling: denoising is performed independently for each object during the early steps, the per-object results are aggregated at a certain point, and the remaining denoising steps are then performed on the aggregated result. I validated the implementation using the config and example prompt from the official repository. The example usage with diffusers and the resulting images can be checked in the Hugging Face model card. |
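To make the multi-instance sampling schedule above more concrete, here is a rough, framework-free sketch of the three phases (independent per-object denoising, aggregation, joint denoising). It is not the code in this PR: `toy_denoise_step`, `average_latents`, `multi_instance_sample`, and the `num_instance_steps` parameter are hypothetical names, the denoiser is a toy stand-in for a UNet plus scheduler step, and aggregating by a plain average of the per-object latents and one globally conditioned latent is an assumption made for illustration.

```python
from typing import Callable, List, Sequence

Latent = List[float]


def toy_denoise_step(latent: Latent, condition: float, step: int) -> Latent:
    """Hypothetical stand-in for one UNet + scheduler step.

    It simply moves every latent value halfway toward the condition value.
    """
    return [x + 0.5 * (condition - x) for x in latent]


def average_latents(latents: Sequence[Latent]) -> Latent:
    """Element-wise mean of several latents (assumed aggregation rule)."""
    count = len(latents)
    return [sum(values) / count for values in zip(*latents)]


def multi_instance_sample(
    init_latent: Latent,
    instance_conditions: Sequence[float],
    global_condition: float,
    num_steps: int,
    num_instance_steps: int,
    denoise_step: Callable[[Latent, float, int], Latent] = toy_denoise_step,
) -> Latent:
    if not 0 <= num_instance_steps <= num_steps:
        raise ValueError("num_instance_steps must be in [0, num_steps]")

    # Phase 1: denoise one latent per object, plus one global latent,
    # all starting from the same initial noise.
    instance_latents = [list(init_latent) for _ in instance_conditions]
    global_latent = list(init_latent)
    for step in range(num_instance_steps):
        instance_latents = [
            denoise_step(latent, condition, step)
            for latent, condition in zip(instance_latents, instance_conditions)
        ]
        global_latent = denoise_step(global_latent, global_condition, step)

    # Phase 2: aggregate the per-object latents with the global latent.
    if num_instance_steps > 0 and instance_latents:
        latent = average_latents(list(instance_latents) + [global_latent])
    else:
        latent = global_latent

    # Phase 3: run the remaining steps on the single aggregated latent.
    for step in range(num_instance_steps, num_steps):
        latent = denoise_step(latent, global_condition, step)
    return latent


result = multi_instance_sample(
    init_latent=[0.0, 0.0],
    instance_conditions=[1.0, 3.0],
    global_condition=2.0,
    num_steps=4,
    num_instance_steps=2,
)
print(result)
```

In a real pipeline the latents would be tensors, each per-object pass would be conditioned on that object's phrase and location, and the switch point would be a tunable fraction of the total steps; the sketch only shows the control flow.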
|
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
|
any updates on this? |
What does this PR do?
InstanceDiffusion: Instance-level Control for Image Generation (CVPR 2024)
Note: The port of InstanceDiffusion to the diffusers library was largely based on the GLIGEN PRs (link1, link2).
Fixes # (issue)
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@yiyixuxu @asomoza