@gokyeongryeol gokyeongryeol commented Dec 2, 2024

What does this PR do?

InstanceDiffusion: Instance-level Control for Image Generation (CVPR 2024)

note: The process of porting InstanceDiffusion to the diffusers library was largely based on the PRs about GLIGEN (link1, link2).

Fixes # (issue)

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@yiyixuxu @asomoza

@gokyeongryeol
Author

Hi! @yiyixuxu @asomoza @sayakpaul
Would you review this PR?

@sayakpaul
Member

Could you also accompany the PR with a short description and perhaps some results?

@gokyeongryeol
Author

@sayakpaul

GLIGEN inserts a fuser, called gated self-attention, between the UNet's self-attention and cross-attention layers so that extra conditions beyond the text prompt are reflected in the generated image. The network that produces the embedding fed into the fuser is different for each type of extra condition.
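To make the fuser concrete, here is a minimal pure-Python sketch of gated self-attention as described in the GLIGEN paper: the visual tokens attend over the concatenation of visual and grounding tokens, and the result is added back through a `tanh` gate. The identity Q/K/V projections, the single head, and the function names are simplifications assumed for this sketch, not the code in this PR.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (a simplification)."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out

def gated_self_attention(visual, grounding, gamma):
    """visual + tanh(gamma) * SelfAttn([visual; grounding]), keeping only the visual tokens."""
    attended = self_attention(visual + grounding)[: len(visual)]
    gate = math.tanh(gamma)
    return [
        [x + gate * a for x, a in zip(v_row, a_row)]
        for v_row, a_row in zip(visual, attended)
    ]

visual = [[1.0, 0.0], [0.0, 1.0]]   # two visual tokens
grounding = [[0.5, 0.5]]            # one grounding token (e.g. box + phrase embedding)
unchanged = gated_self_attention(visual, grounding, gamma=0.0)  # gate closed
fused = gated_self_attention(visual, grounding, gamma=1.0)      # gate partly open
```

With `gamma=0.0` the gate is closed and the visual tokens pass through untouched, which is why a zero-initialized gate lets the fuser be added to a pretrained UNet without changing its output at the start of training.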

InstanceDiffusion is a layout-to-image model that uses per-object positions and phrases as extra conditions. Unlike GLIGEN, however, it proposes a network called UniFusion that handles every type of per-object position information, whether it is a box, point, scribble, or mask.
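One way to picture a single network covering all four location types is to reduce each of them to a small set of 2D points first. The sketch below does that under my own assumptions: the `kind` tags, the `to_points` helper, and the subsampling rule are hypothetical and only illustrate the idea, they are not the interface of this PR or of the official repository.

```python
def to_points(kind, data, num_points=4):
    """Reduce a location condition to a list of (x, y) points.

    kind is a hypothetical tag for this sketch: "point", "box", "scribble", or "mask".
    """
    if kind == "point":
        return [tuple(data)]
    if kind == "box":
        # A box is represented by its top-left and bottom-right corners.
        x0, y0, x1, y1 = data
        return [(x0, y0), (x1, y1)]
    if kind == "scribble":
        # data is a list of (x, y) vertices; keep an evenly spaced subsample.
        step = max(1, len(data) // num_points)
        return [tuple(p) for p in data[::step]][:num_points]
    if kind == "mask":
        # data is a 2D list of 0/1 values; subsample the foreground pixel coordinates.
        pts = [(x, y) for y, row in enumerate(data) for x, v in enumerate(row) if v]
        step = max(1, len(pts) // num_points)
        return pts[::step][:num_points]
    raise ValueError(f"unknown location type: {kind}")

box_pts = to_points("box", (0.1, 0.2, 0.5, 0.6))
mask_pts = to_points("mask", [[0, 1], [1, 1]])
```

Once every condition is a point list, the same embedding network can consume all of them, which is the property that separates this design from one network per condition type.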

In addition, it proposes a ScaleU block that rescales the features in the UNet's up-blocks using a Fourier transform, so that small objects are handled better.
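As a rough illustration of Fourier-based rescaling, the sketch below scales only the low-frequency components of a 1D signal and leaves the rest untouched. The `tanh(s) + 1` factor, the `r_thresh` cutoff, and the 1D naive DFT are assumptions chosen to keep the example short; the real block operates on 2D feature maps with learned parameters.

```python
import cmath
import math

def dft(signal):
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]

def scale_low_frequencies(signal, s, r_thresh=1):
    """Multiply bins whose frequency is <= r_thresh by tanh(s) + 1; leave the others as-is."""
    n = len(signal)
    factor = math.tanh(s) + 1.0
    spectrum = dft(signal)
    for k in range(n):
        freq = min(k, n - k)  # distance from DC, so negative frequencies are treated symmetrically
        if freq <= r_thresh:
            spectrum[k] *= factor
    return idft(spectrum)

flat = scale_low_frequencies([1.0, 1.0, 1.0, 1.0], s=0.5)    # pure low-frequency signal
high = scale_low_frequencies([1.0, -1.0, 1.0, -1.0], s=0.5)  # pure high-frequency signal
```

The constant signal is scaled by `tanh(0.5) + 1`, while the alternating signal sits entirely above the cutoff and comes back unchanged. With `s = 0` the factor is 1 and the operation is an identity.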

Furthermore, to keep overlapping objects from being blurred during inference, it proposes multi-instance sampling: denoising is first performed independently for each object, the per-object results are aggregated at a chosen step, and the remaining denoising steps are then run on the aggregated latent.
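The control flow of multi-instance sampling can be sketched as follows. The toy denoising step, the extra global latent, and plain averaging as the aggregation rule are assumptions made for this illustration; in the pipeline each step would be a conditioned UNet call plus a scheduler update.

```python
def toy_denoise_step(latent, target):
    """Stand-in for one scheduler step: move the latent halfway toward a target."""
    return [x + 0.5 * (t - x) for x, t in zip(latent, target)]

def multi_instance_sample(init_latent, instance_targets, global_target, num_steps, split_step):
    # Phase 1: every instance gets its own copy of the latent and is denoised independently.
    instance_latents = [list(init_latent) for _ in instance_targets]
    global_latent = list(init_latent)
    for _ in range(split_step):
        instance_latents = [
            toy_denoise_step(lat, tgt) for lat, tgt in zip(instance_latents, instance_targets)
        ]
        global_latent = toy_denoise_step(global_latent, global_target)

    # Aggregation: average the per-instance latents together with the global latent.
    stacked = instance_latents + [global_latent]
    latent = [sum(vals) / len(stacked) for vals in zip(*stacked)]

    # Phase 2: the remaining steps run once on the aggregated latent.
    for _ in range(num_steps - split_step):
        latent = toy_denoise_step(latent, global_target)
    return latent

result = multi_instance_sample(
    init_latent=[0.0],
    instance_targets=[[2.0], [4.0]],
    global_target=[0.0],
    num_steps=2,
    split_step=1,
)
```

Setting `split_step=0` skips the per-instance phase entirely, so the function falls back to ordinary sampling on a single latent.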

I validated the implementation using the config and example prompt from the official repository. Example usage with diffusers and the resulting image can be found in the Hugging Face model card.

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Jan 13, 2025
@j-min

j-min commented Feb 23, 2025

any updates on this?

@github-actions github-actions bot removed the stale Issues that haven't received updates label Feb 24, 2025
@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Mar 20, 2025