[core] refactor attention_processor.py the easy way #10022
Comments
Thanks for starting the discussion! Just throwing out an idea and my personal preference: having the attention processor implementation in the same file as the transformer implementation, instead of in a separate directory for attention processors. Coupling them into a single file makes sense because an attention processor used by one model is not necessarily usable by another model (maybe except for the common ones like AttnProcessor2_0). A single file is also arguably easier to start modifying without having to figure out the control flow of things for research purposes (which is what we see most folks do when releasing new diffusers-based models). As I too discussed this with Dhruv some time back, I tried to keep the attention processor in the same modeling file as the transformer in #10021. If this is not how we want to do it, LMK, so that maybe this could be the first integration done in the way we expect after the refactoring.
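For illustration, a minimal sketch of the colocated layout that comment describes, assuming a hypothetical model (the `MyModel*` names are made up and attention-mask preparation is simplified):

```python
# hypothetical modeling_my_transformer.py -- processor defined next to the model it serves
import torch.nn.functional as F

from diffusers.models.attention_processor import Attention


class MyModelAttnProcessor:
    """Model-specific processor, kept in the same modeling file as the transformer."""

    def __call__(self, attn: Attention, hidden_states, encoder_hidden_states=None, attention_mask=None):
        # q/k/v projections come from the Attention module itself
        context = encoder_hidden_states if encoder_hidden_states is not None else hidden_states
        query, key, value = attn.to_q(hidden_states), attn.to_k(context), attn.to_v(context)

        # split heads: (batch, seq, dim) -> (batch, heads, seq, head_dim)
        batch_size, seq_len, _ = hidden_states.shape
        head_dim = query.shape[-1] // attn.heads
        query, key, value = (
            t.view(t.shape[0], -1, attn.heads, head_dim).transpose(1, 2) for t in (query, key, value)
        )

        # scaled dot-product attention (mask preparation omitted for brevity)
        out = F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)
        out = out.transpose(1, 2).reshape(batch_size, seq_len, attn.heads * head_dim)

        # output projection and dropout, as stored on the Attention module
        out = attn.to_out[0](out)
        out = attn.to_out[1](out)
        return out


# ...the transformer model that uses `MyModelAttnProcessor` as its default processor
# would live further down in this same file.
```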
I see that as catering to a particular set of users rather than to the general perspective of a library. While having everything in a single file does make things self-contained, having a separate attention processor class helps manage the situation where another attention processor is needed for the same model (like mentioned above). Additionally, I don't think the redirection needed to understand the control flow is too bad, either. It's essentially the same attention processor class; you just follow the class and get redirected to a different (perhaps meaningful and logical) location. So, not necessarily too much baggage for understanding the control flow in the context of attention.
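As a concrete example of that swap, a minimal sketch of installing a different processor on the same loaded model via `set_attn_processor` (the checkpoint id below is only illustrative):

```python
import torch

from diffusers import StableDiffusionPipeline
from diffusers.models.attention_processor import AttnProcessor, AttnProcessor2_0

# checkpoint id is illustrative; any UNet-based pipeline works the same way
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

# switch every attention layer to the torch SDPA-based processor...
pipe.unet.set_attn_processor(AttnProcessor2_0())

# ...or back to the vanilla implementation, without touching the modeling code
pipe.unet.set_attn_processor(AttnProcessor())

# the installed processors are inspectable per layer name
print(list(pipe.unet.attn_processors.keys())[:3])
```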
Okay, so after discussing this more with Dhruv, I can see why having the attention processor in the modeling file isn't too bad an idea. Multiple attention processors per model is an exception and isn't applicable to all models. He also brought up the idea of breaking ... Looking forward to the PRs.
Hi, many training-free approaches directly need access to attention manipulation in the pipeline. Thanks!
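For context, a minimal sketch of how such training-free access is commonly wired up: a custom processor that wraps the stock one and is installed with `set_attn_processor`. The `InterceptingAttnProcessor` name and the rescaling are made-up placeholders for whatever manipulation a given method needs:

```python
from diffusers.models.attention_processor import Attention, AttnProcessor2_0


class InterceptingAttnProcessor:
    """Delegates to the stock processor, giving a training-free method a hook point."""

    def __init__(self, scale: float = 1.0):
        self.inner = AttnProcessor2_0()
        self.scale = scale  # placeholder knob for the manipulation

    def __call__(self, attn: Attention, hidden_states, encoder_hidden_states=None,
                 attention_mask=None, **kwargs):
        out = self.inner(
            attn,
            hidden_states,
            encoder_hidden_states=encoder_hidden_states,
            attention_mask=attention_mask,
            **kwargs,
        )
        # a real method would edit `out` (or the inputs) here; rescaling is just a stand-in
        return self.scale * out


# installed on a model loaded from a pipeline, e.g.:
# pipe.unet.set_attn_processor(InterceptingAttnProcessor(scale=1.1))
```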
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Worked on in #11368 by @a-r-r-o-w! It's massive stuff!
With @DN6 we have been discussing an idea about breaking up `src/diffusers/models/attention_processor.py` as it's getting excruciatingly long. The idea is simple and very likely won't require multiple rounds of PRs:

- Break `attention_processor.py` into an `attention_processor` directory. Model-specific processors like `FluxAttnProcessor2_0`, `FluxAttnProcessor2_0_NPU`, `FusedFluxAttnProcessor2_0` could go to `attention_processor/flux_attention_processors.py`.
- Processors (`AttnProcessor`, `AttnProcessor2_0`, etc.) which are shared across different models could live in a common file, i.e., `attention_processor/common.py`.

Since `attention_processor/` will have an init, I don't think there will be any breaking changes.
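As a sketch of why the init keeps imports stable, assuming a hypothetical split into `common.py` and `flux_attention_processors.py` (the exact file names are not decided here):

```python
# src/diffusers/models/attention_processor/__init__.py  (hypothetical layout)

# processors shared across models
from .common import AttnProcessor, AttnProcessor2_0

# model-specific processors, split out per model family
from .flux_attention_processors import (
    FluxAttnProcessor2_0,
    FluxAttnProcessor2_0_NPU,
    FusedFluxAttnProcessor2_0,
)

# existing imports such as
#   from diffusers.models.attention_processor import FluxAttnProcessor2_0
# keep resolving, because the package __init__ re-exports the same names.
```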