
[core] refactor attention_processor.py the easy way #10022


Open
sayakpaul opened this issue Nov 26, 2024 · 6 comments

@sayakpaul
Member

With @DN6 we have been discussing an idea about breaking up src/diffusers/models/attention_processor.py, as it is getting excruciatingly long. The idea is simple and very likely won't require multiple rounds of PRs.

  • Create a module named attention_processor.
  • Split the attention processor classes according to the models they are used with. This makes sense because most models (at least the widely used ones) have their own attention processors at this point. For example, FluxAttnProcessor2_0, FluxAttnProcessor2_0_NPU, and FusedFluxAttnProcessor2_0 could go to attention_processor/flux_attention_processors.py.
  • Other attention processors (such as AttnProcessor, AttnProcessor2_0, etc.) that are shared across different models could live in a common file, i.e., attention_processor/common.py.

Since attention_processor/ will have an __init__.py, I don't think there will be any breaking changes.
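
For illustration, a minimal sketch of what that package __init__.py could look like, re-exporting everything so existing imports keep resolving. The file names follow the examples above and are not final:

```python
# attention_processor/__init__.py -- illustrative sketch only; the split and
# file names mirror the examples in this issue and are assumptions.
from .common import AttnProcessor, AttnProcessor2_0
from .flux_attention_processors import (
    FluxAttnProcessor2_0,
    FluxAttnProcessor2_0_NPU,
    FusedFluxAttnProcessor2_0,
)
```

With a re-exporting init like this, an existing import such as `from diffusers.models.attention_processor import FluxAttnProcessor2_0` would keep working unchanged.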

@a-r-r-o-w
Member

Thanks for starting the discussion! Just throwing out an idea, which is also my personal preference: keep the attention processor implementation in the same file as the transformer implementation, instead of creating another directory for attention processors. Coupling them into a single file makes sense because an attention processor used by one model is not necessarily usable by another model (except perhaps the common ones like AttnProcessor2_0). A single file is also arguably easier to start modifying for research purposes, without having to figure out the control flow across files (which is what we see most folks do when releasing new diffusers-based models).

I too discussed this with Dhruv some time back, so I tried keeping the attention processor in the same modeling file as the transformer in #10021. If this is not how we want to do it, LMK, so that could perhaps become the first integration done the way we expect after refactoring.
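
For concreteness, the single-file layout being described might look roughly like this. A sketch only; all "Foo" names are hypothetical placeholders, not existing diffusers classes:

```python
# src/diffusers/models/transformers/transformer_foo.py  (hypothetical file)
# Everything a reader needs lives in one place; class bodies are elided.

class FooAttnProcessor2_0:
    # model-specific attention math, colocated with the model that uses it
    def __call__(self, attn, hidden_states, encoder_hidden_states=None, attention_mask=None):
        ...

class FooTransformer2DModel:
    # transformer blocks would instantiate Attention(processor=FooAttnProcessor2_0())
    ...
```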

@sayakpaul
Member Author

Coupling them into a single file makes sense because an attention processor used by one model is not necessarily usable by another model (except perhaps the common ones like AttnProcessor2_0). A single file is also arguably easier to start modifying for research purposes, without having to figure out the control flow across files (which is what we see most folks do when releasing new diffusers-based models).

I see that as catering to a particular set of users rather than to the general perspective of a library. While having everything in a single file does make things self-contained, a separate attention processor class helps manage the situation where another attention processor is needed for the same model (as mentioned above).

Additionally, I don't think the redirection needed to understand the control flow is too bad, either. It's essentially the same attention processor class; you just click the class and get redirected to a different (perhaps meaningful and logical) location. So, not too much baggage for understanding the control flow in the context of attention.
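
As a concrete example of that swap, here is a usage sketch against the current public set_attn_processor API; the checkpoint is just an example, any Flux checkpoint would do:

```python
import torch
from diffusers import FluxTransformer2DModel
from diffusers.models.attention_processor import FluxAttnProcessor2_0

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)
# Swap in a different processor for the same model without touching the
# modeling code. attn_processors maps module paths to processor instances.
transformer.set_attn_processor(FluxAttnProcessor2_0())
print(list(transformer.attn_processors)[:3])
```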

@yiyixuxu yiyixuxu assigned DN6 and unassigned DN6 and sayakpaul Nov 26, 2024
@sayakpaul
Member Author

Okay, so after discussing this more with Dhruv, I can see why having the attention processor in the modeling file isn't too bad an idea. Multiple attention processors per model is an exception and isn't applicable to all models. He also brought up the idea of breaking Attention into multiple model-specific attention classes and moving them to their respective modeling files.

Looking forward to the PRs.
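
For illustration, the kind of model-specific attention class being floated might look like this rough sketch. A standalone FluxAttention class does not exist at the time of this issue; all names and signatures here are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FluxAttention(nn.Module):
    # Hypothetical: a model-specific attention module that owns its own
    # projections and lives in the Flux modeling file, instead of routing
    # through the shared Attention class in attention_processor.py.
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        b, s, d = hidden_states.shape
        h = self.num_heads
        # project and reshape to (batch, heads, seq, head_dim)
        q = self.to_q(hidden_states).view(b, s, h, d // h).transpose(1, 2)
        k = self.to_k(hidden_states).view(b, s, h, d // h).transpose(1, 2)
        v = self.to_v(hidden_states).view(b, s, h, d // h).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v)
        return self.to_out(out.transpose(1, 2).reshape(b, s, d))
```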

@darshats-typeface

Hi,
I recently had to write a custom attention processor. Since it operates on self-attention, it conflicts with the PAG pipelines. I still wanted the benefits of PAG, so I ended up writing a composite processor where the self-attention is updated first and then passed on to an inner, contained PAG processor.
The issue I ran into is that the pipeline and the PAG processor are tied together in some way, e.g., set_pag_applied_layers is needed to correctly register the processor, followed by saving off the original unet attn_processors. When you refactor, could you look at whether this can be simplified, i.e., whether there is a more formal way to tie and untie attention processors to pipelines?

Many training-free approaches need direct access to attention manipulation in the pipeline. Thanks!
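
For reference, the composite pattern described above might look roughly like this. A sketch only: the custom update hook is a placeholder, and nothing beyond the standard processor __call__ contract is assumed:

```python
class CompositeSelfAttnProcessor:
    """Wraps whatever self-attention processor the PAG pipeline registered,
    applying a custom update first. Sketch; `inner` is the processor
    installed via set_pag_applied_layers."""

    def __init__(self, inner):
        self.inner = inner

    def _custom_update(self, hidden_states):
        # placeholder for the custom self-attention modification
        return hidden_states

    def __call__(self, attn, hidden_states, encoder_hidden_states=None,
                 attention_mask=None, **kwargs):
        # apply the custom tweak, then delegate to the contained PAG processor
        hidden_states = self._custom_update(hidden_states)
        return self.inner(
            attn,
            hidden_states,
            encoder_hidden_states=encoder_hidden_states,
            attention_mask=attention_mask,
            **kwargs,
        )
```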

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale label Dec 26, 2024
@a-r-r-o-w a-r-r-o-w added the wip label and removed the stale label Jan 12, 2025
@sayakpaul
Member Author

Worked on in #11368 by @a-r-r-o-w! It's massive stuff!
