sd: fix VAE tiled fallback VRAM leak #10139
Merged
+16
−0
When the VAE catches a VRAM OOM, it launches the tiler fallback logic straight from inside the except block.
However, Python keeps references to the entire call stack that raised the exception, including any local variables, for the sake of exception reporting and debugging. In the case of tensors, this can hold on to references to GBs of VRAM and prevent the VRAM allocator from freeing them.
So drop the exception context completely before going back into the VAE via the tiler, by leaving the except block with nothing but a flag.
This greatly increases the reliability of the tiler fallback, especially on low-VRAM cards. With the bug, if the leak happened to exceed the headroom needed for a single tile, the tiler fallback would itself OOM and fail the flow.
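To illustrate the mechanism, here is a minimal sketch in plain Python with no torch dependency. It is not the code from this PR: `FakeTensor`, `encode_untiled`, and both fallback functions are hypothetical names, and a weak reference stands in for "is the VRAM still held?". It assumes CPython, where reference counting frees objects as soon as the last reference goes away.

```python
import weakref


class FakeTensor:
    """Hypothetical stand-in for a large GPU tensor."""


_tracked = []  # weak references to tensors created by encode_untiled()


def encode_untiled():
    # The local 'big' lives in this frame. When the exception is raised,
    # its traceback keeps this frame (and therefore 'big') alive.
    big = FakeTensor()
    _tracked.append(weakref.ref(big))
    raise MemoryError("simulated VRAM OOM")


def tensor_alive():
    # True if the most recently created FakeTensor has not been freed yet.
    return _tracked[-1]() is not None


def fallback_inside_except():
    # Buggy pattern: the fallback runs while the exception is still being
    # handled, so the failed call's locals are still referenced.
    try:
        encode_untiled()
    except MemoryError:
        return tensor_alive()  # the tiled fallback would run here


def fallback_after_except():
    # Fixed pattern: leave the except block carrying only a flag, then
    # run the fallback once the exception context has been released.
    do_tiled = False
    try:
        encode_untiled()
    except MemoryError:
        do_tiled = True
    if do_tiled:
        return tensor_alive()  # the tiled fallback would run here
    return False


print(fallback_inside_except())  # tensor still referenced by the traceback
print(fallback_after_except())   # tensor already freed
```

In the first function the tensor from the failed attempt is still alive when the fallback starts; in the second it is gone before the fallback runs, which is the behaviour this change is after.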
Test conditions:
768x768x13f WAN 2.1 VAE Encode using regular VAE encode (latent saved to file to terminate the flow)
NVIDIA GeForce GTX 1660 SUPER (6GB)
python main.py --novram --disable-cuda-malloc
(--disable-cuda-malloc is needed for VRAM tracing)
Here is the VRAM usage over time before the fix:
The first big peak on the left is the untiled attempt, which OOMs. The repeating clusters of 4 little peaks after it are the individual tiles. Each peak is a latent frame (4 latent frames for a 13f encode). The giant horizontal bars under the little peaks are the leak.
With this change:
No more giant bars and the tiler has the full GPU VRAM to work with.
NOTE: Prints of the torch VRAM usage confirm the bug is independent of the --disable-cuda-malloc flag.
Test instrumentation diff: