Running on Google Colab? #34

@aduchon

Description

Has anyone successfully run this on Google Colab? I'm trying the test code from the README. The models are in place, but inference fails with the log below.
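For reference, the invocation is the README's inference entry point; I'm reconstructing the exact flags here from the cfg_file field in the log, so treat it as an assumption:

!python inference.py --cfg configs/Animate_X_infer.yaml

The full output: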

/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py:111: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
[2025-08-26 06:51:04,442] INFO: {'__name__': 'Config: VideoLDM Decoder', 'mean': [0.5, 0.5, 0.5], 'std': [0.5, 0.5, 0.5], 'max_words': 1000, 'num_workers': 8, 'prefetch_factor': 2, 'resolution': [512, 768], 'vit_out_dim': 1024, 'vit_resolution': [224, 224], 'depth_clamp': 10.0, 'misc_size': 384, 'depth_std': 20.0, 'save_fps': 8, 'frame_lens': [32, 32, 32, 1], 'sample_fps': [4], 'vid_dataset': {'type': 'VideoBaseDataset', 'data_list': [], 'max_words': 1000, 'resolution': [448, 256]}, 'img_dataset': {'type': 'ImageBaseDataset', 'data_list': ['laion_400m'], 'max_words': 1000, 'resolution': [448, 256]}, 'batch_sizes': {'1': 256, '4': 4, '8': 4, '16': 4}, 'Diffusion': {'type': 'DiffusionDDIM', 'schedule': 'linear_sd', 'schedule_param': {'num_timesteps': 1000, 'init_beta': 0.00085, 'last_beta': 0.012, 'zero_terminal_snr': True}, 'mean_type': 'v', 'loss_type': 'mse', 'var_type': 'fixed_small', 'rescale_timesteps': False, 'noise_strength': 0.1, 'ddim_timesteps': 50}, 'ddim_timesteps': 30, 'use_div_loss': False, 'p_zero': 0.9, 'guide_scale': 2.5, 'vit_mean': [0.48145466, 0.4578275, 0.40821073], 'vit_std': [0.26862954, 0.26130258, 0.27577711], 'sketch_mean': [0.485, 0.456, 0.406], 'sketch_std': [0.229, 0.224, 0.225], 'hist_sigma': 10.0, 'scale_factor': 0.18215, 'use_checkpoint': True, 'use_sharded_ddp': False, 'use_fsdp': False, 'use_fp16': True, 'temporal_attention': True, 'UNet': {'type': 'UNetSD_Animate_X', 'in_dim': 4, 'dim': 320, 'y_dim': 1024, 'context_dim': 1024, 'out_dim': 4, 'dim_mult': [1, 2, 4, 4], 'num_heads': 8, 'head_dim': 64, 'num_res_blocks': 2, 'attn_scales': [1.0, 0.5, 0.25], 'dropout': 0.1, 'temporal_attention': True, 'temporal_attn_times': 1, 'use_checkpoint': True, 'use_fps_condition': False, 'use_sim_mask': False, 'config': 'None', 'num': 0, 'no_hand': True, 'num_tokens': 4}, 'guidances': [], 'auto_encoder': {'type': 'AutoencoderKL', 'ddconfig': {'double_z': True, 'z_channels': 4, 'resolution': 256, 'in_channels': 3, 'out_ch': 3, 'ch': 128, 'ch_mult': [1, 2, 4, 4], 'num_res_blocks': 2, 'attn_resolutions': [], 'dropout': 0.0, 'video_kernel_size': [3, 1, 1]}, 'embed_dim': 4, 'pretrained': 'checkpoints/v2-1_512-ema-pruned.ckpt'}, 'embedder': {'type': 'FrozenOpenCLIPTextVisualEmbedder', 'layer': 'penultimate', 'pretrained': 'checkpoints/open_clip_pytorch_model.bin'}, 'ema_decay': 0.9999, 'num_steps': 600000, 'lr': 5e-05, 'weight_decay': 0.0, 'betas': (0.9, 0.999), 'eps': 1e-08, 'chunk_size': 2, 'decoder_bs': 2, 'alpha': 0.7, 'save_ckp_interval': 1000, 'warmup_steps': 10, 'decay_mode': 'cosine', 'use_ema': False, 'load_from': None, 'Pretrain': {'type': 'pretrain_specific_strategies', 'fix_weight': False, 'grad_scale': 0.2, 'resume_checkpoint': 'models/jiuniu_0267000.pth', 'sd_keys_path': 'models/stable_diffusion_image_key_temporal_attention_x1.json'}, 'viz_interval': 1000, 'resume_checkpoint': '', 'visual_train': {'type': 'VisualTrainTextImageToVideo'}, 'visual_inference': {'type': 'VisualGeneratedVideos'}, 'inference_list_path': '', 'log_interval': 100, 'log_dir': 'results/Animate_X_infer', 'seed': 13, 'negative_prompt': 'Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms', 'max_frames': 32, 'round': 5, 'test_list_path': [[2, 'data/images/1.jpg', 'data/saved_pose/dance_1', 'data/saved_frames/dance_1', 'data/saved_pkl/dance_1.pkl', 14], [2, 'data/images/4.png', 'data/saved_pose/dance_1', 'data/saved_frames/dance_1', 'data/saved_pkl/dance_1.pkl', 14], [2, 'data/images/3.jpeg', 
'data/saved_pose/dance_2', 'data/saved_frames/dance_2', 'data/saved_pkl/dance_2.pkl', 13], [2, 'data/images/10.jpeg', 'data/saved_pose/dance_2', 'data/saved_frames/dance_2', 'data/saved_pkl/dance_2.pkl', 14], [2, 'data/images/zhu.png', 'data/saved_pose/dance_3', 'data/saved_frames/dance_3', 'data/saved_pkl/dance_3.pkl', 14], [2, 'data/images/5.jpeg', 'data/saved_pose/dance_3', 'data/saved_frames/dance_3', 'data/saved_pkl/dance_3.pkl', 14], [1, 'data/images/2.jpeg', 'data/saved_pose/dance_3_right', 'data/saved_frames/dance_3_right', 'data/saved_pkl/dance_3_right.pkl', 13], [4, 'data/images/7.jpeg', 'data/saved_pose/ubc', 'data/saved_frames/ubc', 'data/saved_pkl/ubc.pkl', 14]], 'test_model': 'checkpoints/animate-x_ckpt.pth', 'partial_keys': [['image', 'local_image', 'dwpose', 'pose_embeddings']], 'TASK_TYPE': 'inference_animate_x_entrance', 'batch_size': 1, 'latent_random_ref': True, 'scale': 8, 'use_fps_condition': False, 'video_compositions': ['image', 'local_image', 'dwpose', 'randomref', 'randomref_pose', 'pose_embedding'], 'use_DiffusionDPM': False, 'CPU_CLIP_VAE': True, 'cfg_file': 'configs/Animate_X_infer.yaml', 'init_method': 'tcp://localhost:9999', 'debug': False, 'opts': [], 'pmi_rank': 0, 'pmi_world_size': 1, 'gpus_per_machine': 1, 'world_size': 1, 'gpu': 0, 'rank': 0, 'log_file': 'results/Animate_X_infer/log_00.txt'}
[2025-08-26 06:51:04,443] INFO: Running Animate-X inference on gpu 0
[2025-08-26 06:51:04,506] INFO: Parsing model identifier. Schema: None, Identifier: ViT-H-14
[2025-08-26 06:51:04,506] INFO: Loaded built-in ViT-H-14 model config.
[2025-08-26 06:51:04,506] INFO: `pretrained` specifies file path: checkpoints/open_clip_pytorch_model.bin
[2025-08-26 06:51:04,507] INFO: Instantiating model architecture: CLIP
[2025-08-26 06:51:13,695] INFO: Loading full pretrained weights from: checkpoints/open_clip_pytorch_model.bin
[2025-08-26 06:52:10,684] INFO: Final image preprocessing configuration set: {'size': (224, 224), 'mode': 'RGB', 'mean': (0.48145466, 0.4578275, 0.40821073), 'std': (0.26862954, 0.26130258, 0.27577711), 'interpolation': 'bicubic', 'resize_mode': 'shortest', 'fill_color': 0}
[2025-08-26 06:52:10,685] INFO: Model ViT-H-14 creation process complete.
[rank0]: Traceback (most recent call last):
[rank0]:   File "/content/drive/MyDrive/AI/animate-x/utils/registry.py", line 64, in build_from_config
[rank0]:     return req_type_entry(**cfg)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/content/drive/MyDrive/AI/animate-x/animatex/inference_animate_x_entrance.py", line 61, in inference_animate_x_entrance
[rank0]:     worker(0, cfg, cfg_update)
[rank0]:   File "/content/drive/MyDrive/AI/animate-x/animatex/inference_animate_x_entrance.py", line 344, in worker
[rank0]:     _, _, zero_y = clip_encoder(text="")
[rank0]:                    ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/content/drive/MyDrive/AI/animate-x/animatex/model/clip_embedder.py", line 184, in forward
[rank0]:     xt, x = self.encode_with_transformer(tokens.to(self.device))
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/content/drive/MyDrive/AI/animate-x/animatex/model/clip_embedder.py", line 191, in encode_with_transformer
[rank0]:     x = self.text_transformer_forward(x, attn_mask=self.model.attn_mask)
[rank0]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/content/drive/MyDrive/AI/animate-x/animatex/model/clip_embedder.py", line 208, in text_transformer_forward
[rank0]:     x = r(x, attn_mask=attn_mask)
[rank0]:         ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/open_clip/transformer.py", line 298, in forward
[rank0]:     x = q_x + self.ls_1(self.attention(q_x=self.ln_1(q_x), k_x=k_x, v_x=v_x, attn_mask=attn_mask))
[rank0]:                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/open_clip/transformer.py", line 283, in attention
[rank0]:     return self.attn(
[rank0]:            ^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/activation.py", line 1380, in forward
[rank0]:     attn_output, attn_output_weights = F.multi_head_attention_forward(
[rank0]:                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/functional.py", line 6338, in multi_head_attention_forward
[rank0]:     raise RuntimeError(
[rank0]: RuntimeError: The shape of the 2D attn_mask is torch.Size([77, 77]), but should be (1, 1).

[rank0]: During handling of the above exception, another exception occurred:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/content/drive/MyDrive/AI/animate-x/inference.py", line 16, in <module>
[rank0]:     INFER_ENGINE.build(dict(type=cfg_update.TASK_TYPE), cfg_update=cfg_update.cfg_dict)
[rank0]:   File "/content/drive/MyDrive/AI/animate-x/utils/registry.py", line 104, in build
[rank0]:     return self.build_func(*args, **kwargs, registry=self)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/content/drive/MyDrive/AI/animate-x/utils/registry_class.py", line 9, in build_func
[rank0]:     return build_from_config(cfg, registry, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/content/drive/MyDrive/AI/animate-x/utils/registry.py", line 66, in build_from_config
[rank0]:     raise Exception(f"Failed to invoke function {req_type_entry}, with {e}")
[rank0]: Exception: Failed to invoke function <function inference_animate_x_entrance at 0x7d562d5122a0>, with The shape of the 2D attn_mask is torch.Size([77, 77]), but should be (1, 1).
[rank0]:[W826 06:52:12.471570618 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
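My best guess at the cause, from reading the traceback: animatex/model/clip_embedder.py permutes the token embeddings to the legacy (seq, batch, dim) layout and then iterates the resblocks itself, but recent open-clip-torch releases build the text transformer with batch_first=True MultiheadAttention, which reads that tensor as batch=77, seq=1 and therefore expects a (1, 1) mask instead of the (77, 77) one. (The TRANSFORMERS_CACHE FutureWarning at the top looks like unrelated noise.) If that reading is right, downgrading open-clip-torch to whatever version the repo was developed against should fix it, or the flag can be flipped back at runtime. A monkey-patch sketch, untested, assuming the embedder exposes the underlying CLIP model as clip_encoder.model:

# Hypothetical workaround: restore the legacy sequence-first layout that
# clip_embedder.py assumes. nn.MultiheadAttention consults self.batch_first
# at forward time, so flipping it makes each block accept (seq, batch, dim)
# input again. Run this after the model is built, before clip_encoder(text="").
for block in clip_encoder.model.transformer.resblocks:
    block.attn.batch_first = False

If anyone knows the exact open-clip-torch version this repo was tested with, pinning it is probably the cleaner fix.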

The only new installs needed on Colab are:

!pip install oss2                    # oss2==2.18.4
!pip install onnxruntime             # onnxruntime==1.18.0
!pip install xformers                # xformers==0.0.20
!pip install rotary-embedding-torch  # rotary-embedding-torch==0.5.3
!pip install fairscale               # fairscale==0.4.13
!pip install open-clip-torch         # open-clip-torch==2.24.0
!pip install kornia                  # kornia==0.7.1
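Since a bare pip install resolves whatever is current, the version comments above are what I expected rather than what necessarily got installed. A quick stdlib-only check of what Colab actually resolved:

import importlib.metadata as md

# Print the resolved version of each extra dependency, plus torch as the
# other likely suspect, to compare against known-good pins.
for pkg in ["oss2", "onnxruntime", "xformers", "rotary-embedding-torch",
            "fairscale", "open-clip-torch", "kornia", "torch"]:
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")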
