Add InstanceDiffusion implementation #10079

gokyeongryeol · 2024-12-02T10:41:06Z

What does this PR do?

InstanceDiffusion: Instance-level Control for Image Generation (CVPR 2024)

project page: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/
paper: https://arxiv.org/abs/2402.03290
official code: https://github.com/frank-xwang/InstanceDiffusion
HF model card: https://huggingface.co/kyeongry/instancediffusion_sd15

Note: the port of InstanceDiffusion to the diffusers library largely follows the GLIGEN PRs (link1, link2).

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@yiyixuxu @asomoza

gokyeongryeol · 2024-12-05T07:06:11Z

Hi! @yiyixuxu @asomoza @sayakpaul
Would you review this PR?

sayakpaul · 2024-12-05T09:42:56Z

Could you also accompany the PR with a short description and perhaps some results?

gokyeongryeol · 2024-12-05T12:48:27Z

@sayakpaul

GLIGEN inserts a fuser, called gated self-attention, between the UNet's self-attention and cross-attention layers so that extra conditions beyond the text prompt are reflected in the generated image. The network that produces the embedding fed to the fuser differs for each type of extra condition.
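To make the gating idea concrete, here is a minimal NumPy sketch of a gated self-attention update. It is not the code in this PR: the attention has a single head and no learned projections, and the names `gated_self_attention`, `visual`, `grounding`, and `gamma` are chosen for illustration only. The point it shows is that the visual tokens attend jointly with the grounding tokens, and a `tanh` gate controls how much of the result is added back.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    # Simplification: single head, no learned Q/K/V projections.
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    return softmax(scores) @ tokens


def gated_self_attention(visual, grounding, gamma):
    # Attend over visual and grounding tokens together,
    # then keep only the rows that belong to the visual tokens.
    n_visual = visual.shape[0]
    joint = np.concatenate([visual, grounding], axis=0)
    attended = self_attention(joint)[:n_visual]
    # tanh(gamma) is the gate: gamma = 0 leaves the visual tokens unchanged.
    return visual + np.tanh(gamma) * attended


rng = np.random.default_rng(0)
visual = rng.normal(size=(16, 8))    # 16 visual tokens, 8 channels (toy sizes)
grounding = rng.normal(size=(3, 8))  # 3 grounding tokens (toy sizes)

closed = gated_self_attention(visual, grounding, gamma=0.0)
opened = gated_self_attention(visual, grounding, gamma=1.0)
```

With `gamma=0.0` the gate is closed and the visual tokens pass through unchanged, which is why such a layer can be added to a pretrained UNet without disturbing it at initialization.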

InstanceDiffusion is a layout-to-image model that uses per-object positions and phrases as extra conditions. Unlike GLIGEN, however, it proposes a network called UniFusion that can handle every type of per-object position information, whether it is a box, point, scribble, or mask.
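One way to picture "one network for all location types" is to reduce every format to a short list of 2D points before embedding it. The sketch below does only that reduction, in plain Python. It is an assumption made for illustration, not the UniFusion code from this PR; `to_points`, `resample`, and the choice of `n` are hypothetical.

```python
def resample(points, n):
    """Pick n points at evenly spaced indices along a list of points (n >= 2)."""
    if not points:
        raise ValueError("no points to resample")
    last = len(points) - 1
    return [points[round(i * last / (n - 1))] for i in range(n)]


def to_points(kind, data, n=4):
    """Reduce a location condition to a list of (x, y) points."""
    if kind == "point":
        return [tuple(data)]
    if kind == "box":
        # A box is described by its top-left and bottom-right corners.
        x0, y0, x1, y1 = data
        return [(x0, y0), (x1, y1)]
    if kind == "scribble":
        # A scribble is a polyline; keep n evenly spaced samples.
        return resample([tuple(p) for p in data], n)
    if kind == "mask":
        # A mask is a 2D grid; keep n samples of its foreground pixels.
        foreground = [
            (x, y)
            for y, row in enumerate(data)
            for x, value in enumerate(row)
            if value
        ]
        return resample(foreground, n)
    raise ValueError(f"unknown location type: {kind}")


box_points = to_points("box", (0.1, 0.2, 0.5, 0.6))
scribble_points = to_points("scribble", [(i, i) for i in range(7)], n=4)
mask_points = to_points("mask", [[0, 1], [1, 1]], n=2)
```

Once every condition is a point list, a single embedding network can consume all of them, which is the property the description above attributes to UniFusion.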

In addition, it proposes a ScaleU block that rescales the features in the UNet's up-blocks using a Fourier transform, so that small objects are handled better.
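As a rough illustration of Fourier-based feature scaling, the NumPy sketch below multiplies only the low-frequency band of a feature map by a scale factor. It is a sketch under assumptions, not the ScaleU block in this PR: the function name `scale_low_frequencies`, the square low-frequency window, the scalar `scale` (a learned block would more plausibly use per-channel values), and the `radius` parameter are all hypothetical.

```python
import numpy as np


def scale_low_frequencies(feature, scale, radius):
    """Scale the low-frequency band of a (C, H, W) feature map by `scale`."""
    _, h, w = feature.shape
    # Move the zero-frequency component to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(feature, axes=(-2, -1)), axes=(-2, -1))
    cy, cx = h // 2, w // 2
    # Mask is 1 everywhere except a square window around the center.
    mask = np.ones((h, w))
    mask[cy - radius:cy + radius + 1, cx - radius:cx + radius + 1] = scale
    spectrum = spectrum * mask
    # Undo the shift and return to the spatial domain.
    out = np.fft.ifft2(np.fft.ifftshift(spectrum, axes=(-2, -1)), axes=(-2, -1))
    return out.real


rng = np.random.default_rng(0)
features = rng.normal(size=(2, 8, 8))        # toy (C, H, W) feature map
unchanged = scale_low_frequencies(features, scale=1.0, radius=1)

constant = np.full((2, 8, 8), 3.0)           # contains only the zero frequency
doubled = scale_low_frequencies(constant, scale=2.0, radius=1)
```

A scale of 1.0 is the identity, and a constant map, which contains only the zero frequency, is multiplied by the scale directly.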

Furthermore, to keep overlapping objects from blurring together during inference, it proposes multi-instance sampling: denoising is first performed independently for each object, the results are aggregated at a chosen step, and the remaining denoising steps are run afterwards.
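The control flow of multi-instance sampling can be sketched with a toy denoiser. Everything specific here is an assumption for illustration: `toy_denoise_step` stands in for a real UNet plus scheduler step, aggregation is done by simple averaging, a separate global latent is kept alongside the per-instance latents, and `merge_step` is a hypothetical name for the aggregation point.

```python
import numpy as np


def toy_denoise_step(latent, target):
    # Stand-in for one denoising step: move halfway toward a target.
    return latent + 0.5 * (target - latent)


def multi_instance_sample(init_latent, instance_targets, global_target,
                          num_steps, merge_step):
    # Phase 1: one latent per instance plus a global latent,
    # each denoised independently for the first `merge_step` steps.
    instance_latents = [init_latent.copy() for _ in instance_targets]
    global_latent = init_latent.copy()
    for _ in range(merge_step):
        instance_latents = [
            toy_denoise_step(z, t)
            for z, t in zip(instance_latents, instance_targets)
        ]
        global_latent = toy_denoise_step(global_latent, global_target)

    # Aggregation: average all latents (an assumed aggregation rule).
    merged = np.mean([global_latent, *instance_latents], axis=0)

    # Phase 2: the remaining steps run on the single merged latent.
    for _ in range(num_steps - merge_step):
        merged = toy_denoise_step(merged, global_target)
    return merged


init = np.zeros(4)
result = multi_instance_sample(
    init_latent=init,
    instance_targets=[np.ones(4), 3.0 * np.ones(4)],
    global_target=2.0 * np.ones(4),
    num_steps=4,
    merge_step=2,
)
```

The cost of the first phase grows with the number of instances, since each instance carries its own latent until the merge, so the choice of merge point trades speed against separation of overlapping objects.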

I validated the implementation using the config and example prompt from the official repository. Example usage with diffusers and the resulting image are available on the Hugging Face model card.

github-actions · 2025-01-13T15:03:27Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

j-min · 2025-02-23T19:50:59Z

any updates on this?

github-actions · 2025-03-20T15:05:07Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

gokyeongryeol added 7 commits December 2, 2024 09:48

add InstanceDiffusion implementation

e67fb6e

Merge branch 'main' into instancediffusion

06be34b

fix dtype error

fc04e77

fix device assignment to quiet warning

688380f

comply to style guide

f2cea72

Merge branch 'main' into instancediffusion

93e82bc

modify pipeline to address empty layout

1ed1be3

add img2img pipeline

d377f57

frank-xwang mentioned this pull request Jan 6, 2025

Missing StableDiffusionINSTDIFFPipeline in the repository frank-xwang/InstanceDiffusion#43

Closed

github-actions bot added the stale Issues that haven't received updates label Jan 13, 2025

github-actions bot removed the stale Issues that haven't received updates label Feb 24, 2025

github-actions bot added the stale Issues that haven't received updates label Mar 20, 2025
