
[Suggestion] Removing (1 - t) clipping during sampling might perform better #24

@Akihisa-Watanabe

Hi, thank you for releasing the code and the inspiring paper!

I’ve been experimenting with applying JiT-style x-prediction with a velocity loss to text-to-motion diffusion models, and I ran into an interesting behavior around clipping (1 - t) during sampling.

I wrote up the details in this blog post:

JiT for Motion Diffusion Models

Very briefly, I found that clipping (1 - t) is not necessary during sampling with the 50-step Heun2 sampler. Keeping the clamp only during training is enough for stability, and in my experiments it actually gives better sample quality (FID) for text-to-motion diffusion.
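
To make the suggestion concrete, here is a minimal PyTorch sketch of the sampling-side change. It assumes a flow-matching convention x_t = t * x + (1 - t) * noise (so the velocity implied by an x-prediction is (x_pred - x_t) / (1 - t)) and a scalar timestep; all names (`x_pred_to_velocity`, `heun2_step`, the `model(x_t, t)` signature) are made up for illustration and are not from the JiT codebase.

```python
import torch

# Rough sketch of the suggested change, not the repository's actual code.
# Assumed convention: x_t = t * x + (1 - t) * noise, so the velocity implied
# by an x-prediction is v = (x_pred - x_t) / (1 - t), which is what makes
# the (1 - t) clamp relevant.

def x_pred_to_velocity(x_pred, x_t, t, clip_eps=None):
    """Turn an x-prediction into a velocity estimate.

    clip_eps: lower bound on (1 - t). Keep it during training for a stable
    loss; pass None at sampling time, which is the change suggested here.
    """
    one_minus_t = 1.0 - t
    if clip_eps is not None:
        one_minus_t = max(one_minus_t, clip_eps)
    return (x_pred - x_t) / one_minus_t


@torch.no_grad()
def heun2_step(model, x_t, t, t_next):
    """One Heun (second-order) step from t to t_next with no (1 - t) clamp."""
    v1 = x_pred_to_velocity(model(x_t, t), x_t, t, clip_eps=None)
    x_euler = x_t + (t_next - t) * v1
    if t_next >= 1.0:
        # The x-to-velocity conversion is singular at t = 1, so the final
        # step falls back to plain Euler here (an assumption on my side,
        # not something taken from the paper).
        return x_euler
    v2 = x_pred_to_velocity(model(x_euler, t_next), x_euler, t_next, clip_eps=None)
    return x_t + (t_next - t) * 0.5 * (v1 + v2)
```

Training would keep calling the same conversion with a small clip_eps; only the sampler passes None.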

I have only verified this behavior on text-to-motion diffusion, but I suspect the same modification could also help image diffusion models. I'm sharing the observation in case it leads to improvements here as well, and it would be great if you could verify it on your end if you are interested.

Thanks again for the great work!
