
[core] refactor attention_processor.py the easy way #10022


Open
sayakpaul opened this issue Nov 26, 2024 · 6 comments

@sayakpaul
Member

With @DN6 we have been discussing an idea about breaking up src/diffusers/models/attention_processor.py, as it is getting excruciatingly long. The idea is simple and very likely won't require multiple rounds of PRs.

  • Create a module named attention_processor.
  • Split the attention processor classes according to the models they are used with. This makes sense because most models (at least the widely used ones) have their own attention processors at this point. For example, FluxAttnProcessor2_0, FluxAttnProcessor2_0_NPU, and FusedFluxAttnProcessor2_0 could go to attention_processor/flux_attention_processors.py.
  • Other attention processors (such as AttnProcessor, AttnProcessor2_0, etc.) that are shared across different models could live in a common file, i.e., attention_processor/common.py.

Since attention_processor/ will have an __init__.py, I don't think there will be any breaking changes.
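
For illustration, a minimal sketch of what that package __init__.py could look like, re-exporting everything so existing imports keep resolving. The file names follow the examples above and are not final:

```python
# attention_processor/__init__.py -- illustrative sketch only; the split and
# file names mirror the examples in this issue and are assumptions.
from .common import AttnProcessor, AttnProcessor2_0
from .flux_attention_processors import (
    FluxAttnProcessor2_0,
    FluxAttnProcessor2_0_NPU,
    FusedFluxAttnProcessor2_0,
)
```

With a re-exporting init like this, an existing import such as `from diffusers.models.attention_processor import FluxAttnProcessor2_0` would keep working unchanged.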

@a-r-r-o-w
Member

Thanks for starting the discussion! Just throwing out an idea, which is also my personal preference: keep the attention processor implementation in the same file as the transformer implementation, instead of creating another directory for attention processors. Coupling them into a single file makes sense because an attention processor used by one model is not necessarily usable by another model (except perhaps the common ones like AttnProcessor2_0). A single file is also arguably easier to start modifying for research purposes, without having to figure out the control flow across files (which is what we see most folks do when releasing new diffusers-based models).

I too discussed this with Dhruv some time back, so I tried keeping the attention processor in the same modeling file as the transformer in #10021. If this is not how we want to do it, LMK, so that could perhaps become the first integration done the way we expect after refactoring.
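
For concreteness, the single-file layout being described might look roughly like this. A sketch only; all "Foo" names are hypothetical placeholders, not existing diffusers classes:

```python
# src/diffusers/models/transformers/transformer_foo.py  (hypothetical file)
# Everything a reader needs lives in one place; class bodies are elided.

class FooAttnProcessor2_0:
    # model-specific attention math, colocated with the model that uses it
    def __call__(self, attn, hidden_states, encoder_hidden_states=None, attention_mask=None):
        ...

class FooTransformer2DModel:
    # transformer blocks would instantiate Attention(processor=FooAttnProcessor2_0())
    ...
```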

@sayakpaul
Member Author

Coupling them into a single file makes sense because an attention processor used by one model is not necessarily usable by another model (except perhaps the common ones like AttnProcessor2_0). A single file is also arguably easier to start modifying for research purposes, without having to figure out the control flow across files (which is what we see most folks do when releasing new diffusers-based models).

I see that as catering to a particular set of users rather than to the general perspective of a library. While having everything in a single file does make things self-contained, a separate attention processor class helps manage the situation where another attention processor is needed for the same model (as mentioned above).

Additionally, I don't think the redirection needed to understand the control flow is too bad, either. It's essentially the same attention processor class; you just click the class and get redirected to a different (perhaps meaningful and logical) location. So, not too much baggage for understanding the control flow in the context of attention.
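
As a concrete example of that swap, here is a usage sketch against the current public set_attn_processor API; the checkpoint is just an example, any Flux checkpoint would do:

```python
import torch
from diffusers import FluxTransformer2DModel
from diffusers.models.attention_processor import FluxAttnProcessor2_0

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)
# Swap in a different processor for the same model without touching the
# modeling code. attn_processors maps module paths to processor instances.
transformer.set_attn_processor(FluxAttnProcessor2_0())
print(list(transformer.attn_processors)[:3])
```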

@yiyixuxu yiyixuxu assigned DN6 and unassigned DN6 and sayakpaul Nov 26, 2024
@sayakpaul
Member Author

Okay, so after discussing this more with Dhruv, I can see why having the attention processor in the modeling file isn't too bad an idea. Multiple attention processors per model is an exception and isn't applicable to all models. He also brought up the idea of breaking Attention into multiple model-specific attention classes and moving them to their respective modeling files.

Looking forward to the PRs.
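
For illustration, the kind of model-specific attention class being floated might look like this rough sketch. A standalone FluxAttention class does not exist at the time of this issue; all names and signatures here are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FluxAttention(nn.Module):
    # Hypothetical: a model-specific attention module that owns its own
    # projections and lives in the Flux modeling file, instead of routing
    # through the shared Attention class in attention_processor.py.
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        b, s, d = hidden_states.shape
        h = self.num_heads
        # project and reshape to (batch, heads, seq, head_dim)
        q = self.to_q(hidden_states).view(b, s, h, d // h).transpose(1, 2)
        k = self.to_k(hidden_states).view(b, s, h, d // h).transpose(1, 2)
        v = self.to_v(hidden_states).view(b, s, h, d // h).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v)
        return self.to_out(out.transpose(1, 2).reshape(b, s, d))
```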

@darshats-typeface

Hi,
I recently had to write a custom attention processor. Since it operates on self-attention, it conflicts with the PAG pipelines. I still wanted the benefits of PAG, so I ended up writing a composite processor where the self-attention is updated first and then passed on to an inner, contained PAG processor.
The issue I ran into is that the pipeline and the PAG processor are tied together in some way, e.g., set_pag_applied_layers is needed to correctly register the processor, followed by saving off the original unet attn_processors. When you refactor, could you look at whether this can be simplified, i.e., whether there is a more formal way to tie and untie attention processors to pipelines?

Many training-free approaches need direct access to attention manipulation in the pipeline. Thanks!
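
For reference, the composite pattern described above might look roughly like this. A sketch only: the custom update hook is a placeholder, and nothing beyond the standard processor __call__ contract is assumed:

```python
class CompositeSelfAttnProcessor:
    """Wraps whatever self-attention processor the PAG pipeline registered,
    applying a custom update first. Sketch; `inner` is the processor
    installed via set_pag_applied_layers."""

    def __init__(self, inner):
        self.inner = inner

    def _custom_update(self, hidden_states):
        # placeholder for the custom self-attention modification
        return hidden_states

    def __call__(self, attn, hidden_states, encoder_hidden_states=None,
                 attention_mask=None, **kwargs):
        # apply the custom tweak, then delegate to the contained PAG processor
        hidden_states = self._custom_update(hidden_states)
        return self.inner(
            attn,
            hidden_states,
            encoder_hidden_states=encoder_hidden_states,
            attention_mask=attention_mask,
            **kwargs,
        )
```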

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale label Dec 26, 2024
@a-r-r-o-w a-r-r-o-w added the wip label and removed the stale label Jan 12, 2025
@sayakpaul
Member Author

Worked on in #11368 by @a-r-r-o-w! It's massive stuff!
