Cosmos #10660
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
To match our sigmas to the original exactly, without any rounding errors, I had to use
Also, we only match the sigmas if we set our
Implementation looks good to me. Thanks @a-r-r-o-w!
@amolfasale Thanks! We can merge the PR soon. I'm waiting on YiYi for how we're going to improve guardrail support (because it's very difficult to run the guardrail models alongside the main transformer on consumer GPUs).
@a-r-r-o-w
Thanks @yiyixuxu! I'll take a look and update our implementation accordingly tomorrow.
@yiyixuxu I've updated the code to use the package. Could you take a look again? If everything looks good, let's try to get the 7B model weights merged, and I'll open the 14B model weight PRs soon.
Thanks! I left some questions, but the PR looks good to me.
```
@@ -568,5 +568,10 @@ def add_noise(
        noisy_samples = original_samples + noise * sigma
        return noisy_samples

    # Copied from diffusers.schedulers.scheduling_edm_euler.EDMEulerScheduler._get_conditioning_c_in
```
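For context, the copied helper implements the standard EDM input-preconditioning factor from Karras et al. Here is a minimal sketch of the formula it computes, not the exact diffusers source; the free-function signature and the `sigma_data = 0.5` default are assumptions:

```python
import torch

def get_conditioning_c_in(sigma: torch.Tensor, sigma_data: float = 0.5) -> torch.Tensor:
    # EDM input scaling: c_in = 1 / sqrt(sigma^2 + sigma_data^2)
    # sigma_data = 0.5 is the common EDM default (an assumption here).
    return 1.0 / (sigma**2 + sigma_data**2) ** 0.5
```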
Is this just to make the code more organized?
```
        if sigma_schedule == "karras":
            sigmas = self._compute_karras_sigmas(sigmas)
        elif sigma_schedule == "exponential":
            sigmas = self._compute_exponential_sigmas(sigmas)
        sigmas = sigmas.to(torch.float32)
```
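For reference, hedged sketches of what the two schedule helpers compute, using the standard Karras and exponential sigma formulas; the function bodies below are assumptions, not the exact diffusers implementations:

```python
import math
import torch

def compute_karras_sigmas(sigmas: torch.Tensor, rho: float = 7.0) -> torch.Tensor:
    # Karras et al. (2022) schedule: interpolate between sigma_max and
    # sigma_min in rho-space, then raise back to the rho power.
    sigma_min, sigma_max = sigmas[-1].item(), sigmas[0].item()
    ramp = torch.linspace(0, 1, len(sigmas))
    min_inv_rho = sigma_min ** (1 / rho)
    max_inv_rho = sigma_max ** (1 / rho)
    return (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho

def compute_exponential_sigmas(sigmas: torch.Tensor) -> torch.Tensor:
    # Linear interpolation in log-space between sigma_max and sigma_min.
    sigma_min, sigma_max = sigmas[-1].item(), sigmas[0].item()
    return torch.linspace(math.log(sigma_max), math.log(sigma_min), len(sigmas)).exp()
```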
Was this float64 before? Just curious.
It was float32 before. But to match the sigma values from the original implementation (which is correct; ours was slightly off*), you need the arange and the division by num_train_timesteps to happen in float64. This was done to match the final outputs exactly (without it, the diff is on the order of 1e-4 to 1e-6, so the change isn't strictly required, but I think we should keep it).
* If done in float32 (ours), the sigmas start from 79.998 IIRC due to precision issues. If done in float64 and then converted to float32 (current), the sigmas start at 80.0 as expected.
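To illustrate the dtype effect, a minimal sketch using a Karras-style ramp with common EDM defaults (rho = 7, sigma range 0.002 to 80; these constants are assumptions, not values taken from this PR):

```python
import torch

num_train_timesteps = 1000
rho, sigma_min, sigma_max = 7.0, 0.002, 80.0

def sigmas_in(dtype):
    # The arange and the division by num_train_timesteps happen in `dtype`,
    # which is exactly the part discussed above.
    ramp = torch.arange(num_train_timesteps, dtype=dtype) / num_train_timesteps
    min_inv_rho = sigma_min ** (1 / rho)
    max_inv_rho = sigma_max ** (1 / rho)
    return ((max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho).to(torch.float32)

# The two results agree only approximately; the small per-sigma error is
# what produced the ~1e-4 to 1e-6 output diff mentioned above.
print((sigmas_in(torch.float32) - sigmas_in(torch.float64)).abs().max())
```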
@pjannaty @amolfasale @asfiyab-nvidia Hey, could you take a look at the following PRs and let me know if the changes look alright? If all looks good, I can open PRs to the other Cosmos repos with similar README and weight updates.
Additionally, in a follow-up PR, we will add support for loading the original-format weights directly too.
Changes look good to me. What a major lift! Let's merge!
@pjannaty @asfiyab-nvidia @amolfasale Here's the list of all the weight PRs:
We should be good to merge this code PR already, but users will not be able to download the weights or use the example code snippets until the weight PRs are merged (unless they add
Thank you for the major lift, team! Let's merge!
@pjannaty The weight PRs cannot be merged by us since we don't have access to the nvidia org. If you or someone with access to the repositories could take a look and merge them, that would be great.
Thank you for merging the PRs @pjannaty!
The cosmos is within us. We are made of star-stuff. We are a way for the universe to know itself.
Models

- Transformer
  - test attention
  - test ff
  - test timesteps
  - test patch embed
  - test positional embed
  - test transformer block
  - test transformer
  - test transformer video
- VAE
  - test vae attention
  - test vae
- Text-to-World:
- Video-to-World (image-conditioning):
- Video-to-World (video-conditioning):

Note that the model repos are not yet compatible with Diffusers-loading. I'll open PRs for weights once the nvidia team gives the thumbs up.
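For reference, a hedged sketch of what text-to-world inference could look like once Diffusers-format weights are available; the pipeline class follows this PR's naming, and the repo id is an assumption, not a verified checkpoint path:

```python
import torch
from diffusers import CosmosTextToWorldPipeline
from diffusers.utils import export_to_video

# Repo id is a placeholder; Diffusers-format weights were still pending
# merge at the time of this thread.
pipe = CosmosTextToWorldPipeline.from_pretrained(
    "nvidia/Cosmos-1.0-Diffusion-7B-Text2World", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

prompt = "A sleek robotic arm assembles components in a futuristic factory."
video = pipe(prompt=prompt).frames[0]
export_to_video(video, "output.mp4", fps=30)
```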
Inference code (old)