
Conversation

rattus128 (Contributor) commented Oct 1, 2025

When the VAE catches a VRAM OOM during the regular (untiled) pass, it launches the tiler fallback logic straight from the exception context.

Python, however, keeps a reference to the entire call stack that raised the exception, including all local variables, for the sake of exception reporting and debugging. When those locals are tensors, this can pin GBs of VRAM and prevent the allocator from freeing them.
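As a minimal CPU-side illustration of the mechanism (a plain list stands in for a GB-scale CUDA tensor; all names here are made up for the demo):

import sys

def failing_encode():
    big = [0] * 10_000_000  # stands in for a multi-GB CUDA tensor
    raise RuntimeError("CUDA out of memory")

try:
    failing_encode()
except RuntimeError:
    # While inside the except block, the in-flight exception's traceback
    # keeps every frame alive, including failing_encode()'s locals:
    tb = sys.exc_info()[2]
    print("big" in tb.tb_next.tb_frame.f_locals)  # True -- still referenced
# Once the except block is exited, the exception and its traceback are
# dropped, and "big" becomes collectable.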

So drop the except context completely before going back into the VAE via the tiler: exit the except block carrying nothing but a flag, and launch the tiled retry from outside it. A simplified sketch of the pattern follows.
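Sketch of the shape of the fix (the untiled helper name is a placeholder, not the exact comfy/sd.py code; model_management.OOM_EXCEPTION is ComfyUI's OOM exception type):

def encode(self, pixel_samples):
    try:
        samples = self.encode_untiled_(pixel_samples)  # placeholder for the regular path
        oom = False
    except model_management.OOM_EXCEPTION:
        # Leave the except block carrying nothing but this flag, so the
        # traceback (and every tensor it pins) is dropped immediately.
        oom = True
    if oom:
        # By this point the failed attempt's VRAM is freeable again,
        # and the tiler gets the full headroom.
        samples = self.encode_tiled_(pixel_samples)
    return samples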

This greatly increases the reliability of the tiler fallback, especially on low-VRAM cards: with the bug, if the leak happened to pin more VRAM than the headroom needed for a single tile, the tiler fallback itself would OOM and fail the whole flow.

Test conditions:

- 768x768x13f WAN 2.1 VAE encode using the regular VAE Encode node (latent saved to file to terminate the flow)
- NVIDIA GeForce GTX 1660 SUPER (6GB)
- python main.py --novram --disable-cuda-malloc (--disable-cuda-malloc is needed for the VRAM tracing)

Here is the VRAM usage over time before the fix:

[Image: full-leak — VRAM usage trace before the fix]

The first big peak on the left is the attempt to do it untiled, which OOMs. The repeating clusters of 4 little peaks thereafter are the individual tiles; each peak is a latent frame (4 latent frames for a 13-frame encode). The giant horizontal bars under the little peaks are the leak.

With this change:

[Image: fixed — VRAM usage trace after the fix]

No more giant bars and the tiler has the full GPU VRAM to work with.

NOTE: Printing torch's VRAM usage counters confirms the bug is independent of the --disable-cuda-malloc flag.

Test instrumentation diff:

--- a/comfy/sd.py
+++ b/comfy/sd.py
@@ -702,6 +702,7 @@ class VAE:
         return output.movedim(1, -1)
 
     def encode(self, pixel_samples):
+        torch.cuda.memory._record_memory_history()
         self.throw_exception_if_invalid()
         pixel_samples = self.vae_encode_crop_pixels(pixel_samples)
         pixel_samples = pixel_samples.movedim(-1, 1)
@@ -743,6 +744,7 @@ class VAE:
             else:
                 samples = self.encode_tiled_(pixel_samples)
 
+        torch.cuda.memory._dump_snapshot("memory_trace.pickle")
         return samples
 
     def encode_tiled(self, pixel_samples, tile_x=None, tile_y=None, overlap=None, tile_t=None, overlap_t=None):
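For anyone reproducing the trace: the snapshot written by torch.cuda.memory._dump_snapshot() can be inspected by dragging memory_trace.pickle into PyTorch's viewer at https://pytorch.org/memory_viz, or (in recent torch versions) rendered to a standalone HTML timeline from the command line:

python -m torch.cuda._memory_viz trace_plot memory_trace.pickle -o memory_trace.html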

Kosinkadink (Collaborator) commented:

Nice! Will try to get this reviewed and merged Wednesday afternoon PST.

@rattus128 requested a review from chaObserv — Oct 1, 2025 10:51
@Kosinkadink added the "Good PR" label (This PR looks good to go, it needs comfy's final review.) — Oct 1, 2025
@comfyanonymous merged commit 911331c into comfyanonymous:master — Oct 1, 2025
13 of 14 checks passed