[tests] add tests for framepack transformer model. #11520

sayakpaul · 2025-05-08T05:02:43Z

What does this PR do?

@a-r-r-o-w the following two model splitting tests are failing:

FAILED tests/models/transformers/test_models_transformer_hunyuan_video_framepack.py::HunyuanVideoTransformer3DTests::test_model_parallelism - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argumen...
FAILED tests/models/transformers/test_models_transformer_hunyuan_video_framepack.py::HunyuanVideoTransformer3DTests::test_sharded_checkpoints_device_map - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument

Could you take a look when you have time? There are similar failures in HunuyanVideo transformer model, too, just as an FYI. Also, cc: @SunMarc

HuggingFaceDocBuilderDev · 2025-05-08T05:09:18Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

a-r-r-o-w · 2025-05-08T12:49:32Z

It is expected that clean_x_embedder and x_embedder are put on the same device for this to pass. It is because accelerate performs the device allocation for different layers based on their initialization order. Moving the clean_x_embedder layer initialization right below x_embedder, and image_projection layer right below context_embedder, will fix the error for the tests.

diff --git a/src/diffusers/models/transformers/transformer_hunyuan_video_framepack.py b/src/diffusers/models/transformers/transformer_hunyuan_video_framepack.py
index 0331d9934..012a6e532 100644
--- a/src/diffusers/models/transformers/transformer_hunyuan_video_framepack.py
+++ b/src/diffusers/models/transformers/transformer_hunyuan_video_framepack.py
@@ -152,9 +152,14 @@ class HunyuanVideoFramepackTransformer3DModel(
 
         # 1. Latent and condition embedders
         self.x_embedder = HunyuanVideoPatchEmbed((patch_size_t, patch_size, patch_size), in_channels, inner_dim)
+        self.clean_x_embedder = None
+        if has_clean_x_embedder:
+            self.clean_x_embedder = HunyuanVideoHistoryPatchEmbed(in_channels, inner_dim)
         self.context_embedder = HunyuanVideoTokenRefiner(
             text_embed_dim, num_attention_heads, attention_head_dim, num_layers=num_refiner_layers
         )
+        # Framepack specific modules
+        self.image_projection = FramepackClipVisionProjection(image_proj_dim, inner_dim) if has_image_proj else None
         self.time_text_embed = HunyuanVideoConditionEmbedding(
             inner_dim, pooled_projection_dim, guidance_embeds, image_condition_type
         )
@@ -186,13 +191,6 @@ class HunyuanVideoFramepackTransformer3DModel(
         self.norm_out = AdaLayerNormContinuous(inner_dim, inner_dim, elementwise_affine=False, eps=1e-6)
         self.proj_out = nn.Linear(inner_dim, patch_size_t * patch_size * patch_size * out_channels)
 
-        # Framepack specific modules
-        self.image_projection = FramepackClipVisionProjection(image_proj_dim, inner_dim) if has_image_proj else None
-
-        self.clean_x_embedder = None
-        if has_clean_x_embedder:
-            self.clean_x_embedder = HunyuanVideoHistoryPatchEmbed(in_channels, inner_dim)
-
         self.gradient_checkpointing = False
 
     def forward(

But, this is not a "correct" fix in the general case. We need to put in device handling code in the concatenate statements for it to work as expected in the correct way. Something like:

hidden_states = torch.cat([latents_clean.to(hidden_states), hidden_states], dim=1)

It makes the code look unnecessarily complicated IMO since it is expected that these would already be on the correct device/dtype in the single GPU case. If we'd like to make these changes anyway, LMK and I'll open a PR.

sayakpaul · 2025-05-08T13:11:34Z

But, this is not a "correct" fix in the general case. We need to put in device handling code in the concatenate statements for it to work as expected in the correct way. Something like:

Exactly why I didn't make these changes because I strongly echo you opinions on it.

So, given that, I think it's still preferable to go with the other option you mentioned i.e., corresponding to the initialization order.

sayakpaul added 2 commits May 8, 2025 09:14

start.

2794029

add tests for framepack transformer model.

da07d86

Merge branch 'main' into framepack-transformer-tests

e5e7883

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[tests] add tests for framepack transformer model. #11520

[tests] add tests for framepack transformer model. #11520

sayakpaul commented May 8, 2025

HuggingFaceDocBuilderDev commented May 8, 2025

a-r-r-o-w commented May 8, 2025

sayakpaul commented May 8, 2025

[tests] add tests for framepack transformer model. #11520

Are you sure you want to change the base?

[tests] add tests for framepack transformer model. #11520

Conversation

sayakpaul commented May 8, 2025

What does this PR do?

HuggingFaceDocBuilderDev commented May 8, 2025

a-r-r-o-w commented May 8, 2025

sayakpaul commented May 8, 2025