Has anyone successfully run this on a Google Colab? I'm trying the test code from the README. The models are in place, but:
/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py:111: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
[2025-08-26 06:51:04,442] INFO: {'__name__': 'Config: VideoLDM Decoder', 'mean': [0.5, 0.5, 0.5], 'std': [0.5, 0.5, 0.5], 'max_words': 1000, 'num_workers': 8, 'prefetch_factor': 2, 'resolution': [512, 768], 'vit_out_dim': 1024, 'vit_resolution': [224, 224], 'depth_clamp': 10.0, 'misc_size': 384, 'depth_std': 20.0, 'save_fps': 8, 'frame_lens': [32, 32, 32, 1], 'sample_fps': [4], 'vid_dataset': {'type': 'VideoBaseDataset', 'data_list': [], 'max_words': 1000, 'resolution': [448, 256]}, 'img_dataset': {'type': 'ImageBaseDataset', 'data_list': ['laion_400m'], 'max_words': 1000, 'resolution': [448, 256]}, 'batch_sizes': {'1': 256, '4': 4, '8': 4, '16': 4}, 'Diffusion': {'type': 'DiffusionDDIM', 'schedule': 'linear_sd', 'schedule_param': {'num_timesteps': 1000, 'init_beta': 0.00085, 'last_beta': 0.012, 'zero_terminal_snr': True}, 'mean_type': 'v', 'loss_type': 'mse', 'var_type': 'fixed_small', 'rescale_timesteps': False, 'noise_strength': 0.1, 'ddim_timesteps': 50}, 'ddim_timesteps': 30, 'use_div_loss': False, 'p_zero': 0.9, 'guide_scale': 2.5, 'vit_mean': [0.48145466, 0.4578275, 0.40821073], 'vit_std': [0.26862954, 0.26130258, 0.27577711], 'sketch_mean': [0.485, 0.456, 0.406], 'sketch_std': [0.229, 0.224, 0.225], 'hist_sigma': 10.0, 'scale_factor': 0.18215, 'use_checkpoint': True, 'use_sharded_ddp': False, 'use_fsdp': False, 'use_fp16': True, 'temporal_attention': True, 'UNet': {'type': 'UNetSD_Animate_X', 'in_dim': 4, 'dim': 320, 'y_dim': 1024, 'context_dim': 1024, 'out_dim': 4, 'dim_mult': [1, 2, 4, 4], 'num_heads': 8, 'head_dim': 64, 'num_res_blocks': 2, 'attn_scales': [1.0, 0.5, 0.25], 'dropout': 0.1, 'temporal_attention': True, 'temporal_attn_times': 1, 'use_checkpoint': True, 'use_fps_condition': False, 'use_sim_mask': False, 'config': 'None', 'num': 0, 'no_hand': True, 'num_tokens': 4}, 'guidances': [], 'auto_encoder': {'type': 'AutoencoderKL', 'ddconfig': {'double_z': True, 'z_channels': 4, 'resolution': 256, 'in_channels': 3, 'out_ch': 3, 'ch': 128, 'ch_mult': [1, 2, 4, 4], 'num_res_blocks': 2, 'attn_resolutions': [], 'dropout': 0.0, 'video_kernel_size': [3, 1, 1]}, 'embed_dim': 4, 'pretrained': 'checkpoints/v2-1_512-ema-pruned.ckpt'}, 'embedder': {'type': 'FrozenOpenCLIPTextVisualEmbedder', 'layer': 'penultimate', 'pretrained': 'checkpoints/open_clip_pytorch_model.bin'}, 'ema_decay': 0.9999, 'num_steps': 600000, 'lr': 5e-05, 'weight_decay': 0.0, 'betas': (0.9, 0.999), 'eps': 1e-08, 'chunk_size': 2, 'decoder_bs': 2, 'alpha': 0.7, 'save_ckp_interval': 1000, 'warmup_steps': 10, 'decay_mode': 'cosine', 'use_ema': False, 'load_from': None, 'Pretrain': {'type': 'pretrain_specific_strategies', 'fix_weight': False, 'grad_scale': 0.2, 'resume_checkpoint': 'models/jiuniu_0267000.pth', 'sd_keys_path': 'models/stable_diffusion_image_key_temporal_attention_x1.json'}, 'viz_interval': 1000, 'resume_checkpoint': '', 'visual_train': {'type': 'VisualTrainTextImageToVideo'}, 'visual_inference': {'type': 'VisualGeneratedVideos'}, 'inference_list_path': '', 'log_interval': 100, 'log_dir': 'results/Animate_X_infer', 'seed': 13, 'negative_prompt': 'Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms', 'max_frames': 32, 'round': 5, 'test_list_path': [[2, 'data/images/1.jpg', 'data/saved_pose/dance_1', 'data/saved_frames/dance_1', 'data/saved_pkl/dance_1.pkl', 14], [2, 'data/images/4.png', 'data/saved_pose/dance_1', 'data/saved_frames/dance_1', 'data/saved_pkl/dance_1.pkl', 14], [2, 'data/images/3.jpeg', 
'data/saved_pose/dance_2', 'data/saved_frames/dance_2', 'data/saved_pkl/dance_2.pkl', 13], [2, 'data/images/10.jpeg', 'data/saved_pose/dance_2', 'data/saved_frames/dance_2', 'data/saved_pkl/dance_2.pkl', 14], [2, 'data/images/zhu.png', 'data/saved_pose/dance_3', 'data/saved_frames/dance_3', 'data/saved_pkl/dance_3.pkl', 14], [2, 'data/images/5.jpeg', 'data/saved_pose/dance_3', 'data/saved_frames/dance_3', 'data/saved_pkl/dance_3.pkl', 14], [1, 'data/images/2.jpeg', 'data/saved_pose/dance_3_right', 'data/saved_frames/dance_3_right', 'data/saved_pkl/dance_3_right.pkl', 13], [4, 'data/images/7.jpeg', 'data/saved_pose/ubc', 'data/saved_frames/ubc', 'data/saved_pkl/ubc.pkl', 14]], 'test_model': 'checkpoints/animate-x_ckpt.pth', 'partial_keys': [['image', 'local_image', 'dwpose', 'pose_embeddings']], 'TASK_TYPE': 'inference_animate_x_entrance', 'batch_size': 1, 'latent_random_ref': True, 'scale': 8, 'use_fps_condition': False, 'video_compositions': ['image', 'local_image', 'dwpose', 'randomref', 'randomref_pose', 'pose_embedding'], 'use_DiffusionDPM': False, 'CPU_CLIP_VAE': True, 'cfg_file': 'configs/Animate_X_infer.yaml', 'init_method': 'tcp://localhost:9999', 'debug': False, 'opts': [], 'pmi_rank': 0, 'pmi_world_size': 1, 'gpus_per_machine': 1, 'world_size': 1, 'gpu': 0, 'rank': 0, 'log_file': 'results/Animate_X_infer/log_00.txt'}
[2025-08-26 06:51:04,443] INFO: Running Animate-X inference on gpu 0
[2025-08-26 06:51:04,506] INFO: Parsing model identifier. Schema: None, Identifier: ViT-H-14
[2025-08-26 06:51:04,506] INFO: Loaded built-in ViT-H-14 model config.
[2025-08-26 06:51:04,506] INFO: `pretrained` specifies file path: checkpoints/open_clip_pytorch_model.bin
[2025-08-26 06:51:04,507] INFO: Instantiating model architecture: CLIP
[2025-08-26 06:51:13,695] INFO: Loading full pretrained weights from: checkpoints/open_clip_pytorch_model.bin
[2025-08-26 06:52:10,684] INFO: Final image preprocessing configuration set: {'size': (224, 224), 'mode': 'RGB', 'mean': (0.48145466, 0.4578275, 0.40821073), 'std': (0.26862954, 0.26130258, 0.27577711), 'interpolation': 'bicubic', 'resize_mode': 'shortest', 'fill_color': 0}
[2025-08-26 06:52:10,685] INFO: Model ViT-H-14 creation process complete.
[rank0]: Traceback (most recent call last):
[rank0]: File "/content/drive/MyDrive/AI/animate-x/utils/registry.py", line 64, in build_from_config
[rank0]: return req_type_entry(**cfg)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/content/drive/MyDrive/AI/animate-x/animatex/inference_animate_x_entrance.py", line 61, in inference_animate_x_entrance
[rank0]: worker(0, cfg, cfg_update)
[rank0]: File "/content/drive/MyDrive/AI/animate-x/animatex/inference_animate_x_entrance.py", line 344, in worker
[rank0]: _, _, zero_y = clip_encoder(text="")
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/content/drive/MyDrive/AI/animate-x/animatex/model/clip_embedder.py", line 184, in forward
[rank0]: xt, x = self.encode_with_transformer(tokens.to(self.device))
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/content/drive/MyDrive/AI/animate-x/animatex/model/clip_embedder.py", line 191, in encode_with_transformer
[rank0]: x = self.text_transformer_forward(x, attn_mask=self.model.attn_mask)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/content/drive/MyDrive/AI/animate-x/animatex/model/clip_embedder.py", line 208, in text_transformer_forward
[rank0]: x = r(x, attn_mask=attn_mask)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/open_clip/transformer.py", line 298, in forward
[rank0]: x = q_x + self.ls_1(self.attention(q_x=self.ln_1(q_x), k_x=k_x, v_x=v_x, attn_mask=attn_mask))
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/open_clip/transformer.py", line 283, in attention
[rank0]: return self.attn(
[rank0]: ^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/activation.py", line 1380, in forward
[rank0]: attn_output, attn_output_weights = F.multi_head_attention_forward(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/torch/nn/functional.py", line 6338, in multi_head_attention_forward
[rank0]: raise RuntimeError(
[rank0]: RuntimeError: The shape of the 2D attn_mask is torch.Size([77, 77]), but should be (1, 1).
[rank0]: During handling of the above exception, another exception occurred:
[rank0]: Traceback (most recent call last):
[rank0]: File "/content/drive/MyDrive/AI/animate-x/inference.py", line 16, in <module>
[rank0]: INFER_ENGINE.build(dict(type=cfg_update.TASK_TYPE), cfg_update=cfg_update.cfg_dict)
[rank0]: File "/content/drive/MyDrive/AI/animate-x/utils/registry.py", line 104, in build
[rank0]: return self.build_func(*args, **kwargs, registry=self)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/content/drive/MyDrive/AI/animate-x/utils/registry_class.py", line 9, in build_func
[rank0]: return build_from_config(cfg, registry, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/content/drive/MyDrive/AI/animate-x/utils/registry.py", line 66, in build_from_config
[rank0]: raise Exception(f"Failed to invoke function {req_type_entry}, with {e}")
[rank0]: Exception: Failed to invoke function <function inference_animate_x_entrance at 0x7d562d5122a0>, with The shape of the 2D attn_mask is torch.Size([77, 77]), but should be (1, 1).
[rank0]:[W826 06:52:12.471570618 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
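For what it's worth, the mask error looks like a layout mismatch rather than a missing model. My hypothesis (not confirmed in this log): newer open_clip builds the text tower's `nn.MultiheadAttention` with `batch_first=True`, while `clip_embedder.py` still permutes tokens to `[seq, batch, dim]` and calls the resblocks directly. A `[77, 1, dim]` tensor is then read as batch=77, seq=1, so the 77x77 causal mask no longer fits. A minimal sketch that reproduces the exact same RuntimeError:

```python
import torch
import torch.nn as nn

# Feed a [seq, batch, dim] tensor to an attention module built batch_first=True.
d_model, n_heads, seq_len = 64, 8, 77
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

x = torch.randn(seq_len, 1, d_model)  # old LND layout: [77, 1, 64]
causal_mask = torch.full((seq_len, seq_len), float("-inf")).triu(1)  # 77x77

try:
    # batch_first=True reads x as batch=77, seq=1, so a 2D mask must be (1, 1)
    attn(x, x, x, attn_mask=causal_mask)
except RuntimeError as e:
    print(e)  # The shape of the 2D attn_mask is torch.Size([77, 77]), but should be (1, 1).
```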
The only new installs needed for the Colab are:
```
!pip install oss2                    # oss2==2.18.4
!pip install onnxruntime             # onnxruntime==1.18.0
!pip install xformers                # xformers==0.0.20
!pip install rotary-embedding-torch  # rotary-embedding-torch==0.5.3
!pip install fairscale               # fairscale==0.4.13
!pip install open-clip-torch         # open-clip-torch==2.24.0
!pip install kornia                  # kornia==0.7.1
```
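If the batch_first diagnosis above is right, two hedged workarounds (untested on my side): pin an open-clip-torch release from before the text-tower layout change (e.g. `!pip install "open-clip-torch==2.20.0"`, if I have the version right), or flip the flag back on the loaded model so the old `[seq, batch, dim]` path in `clip_embedder.py` keeps working. A hypothetical patch, where `force_seq_first` is my own helper name:

```python
import torch.nn as nn

def force_seq_first(clip_model):
    """Hypothetical fix: make the open_clip text tower accept [seq, batch, dim]
    again. Assumes the batch_first hypothesis; apply right after the CLIP model
    is created, before the first clip_encoder(text="") call."""
    for block in clip_model.transformer.resblocks:
        if isinstance(block.attn, nn.MultiheadAttention):
            block.attn.batch_first = False  # the flag is only read at forward time
```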