Commit 36a4b88
Mosaic memory profiling tutorial (#3744)
## Description
This PR adds a comprehensive tutorial on using
[Mosaic](https://github.com/facebookresearch/mosaic) for GPU memory
profiling in PyTorch.
Mosaic is a post-hoc analysis tool for memory usage that was instrumental in
debugging out-of-memory (OOM) issues during training of the 405B LLaMA model.
## What users will learn
1. **Categorical Memory Profiling** - Breaking down memory by category
(activations, gradients, optimizer state, parameters)
2. **Debugging Unexpected Memory** - Using stack trace analysis to find
abandoned debug code causing memory bloat
3. **Pipeline Integration** - Using Mosaic's Python API for automated
memory monitoring and CI/CD regression testing
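
As a rough illustration of the CI/CD regression check in item 3, the sketch below compares per-category peak memory against a stored baseline and flags categories that grew beyond a tolerance. It does not use Mosaic's Python API, which this description does not show. The `baseline` and `current` dictionaries, the `MemoryRegression` class, the `find_regressions` helper, and the 5% default tolerance are all hypothetical stand-ins for values a real pipeline would obtain from a profiling run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRegression:
    """One category whose peak memory grew past the allowed tolerance (hypothetical type)."""
    category: str
    baseline_gib: float
    current_gib: float

    @property
    def increase_pct(self) -> float:
        # Relative growth over the baseline, in percent.
        return 100.0 * (self.current_gib - self.baseline_gib) / self.baseline_gib


def find_regressions(baseline, current, tolerance_pct=5.0):
    """Return categories in `baseline` whose peak memory in `current`
    exceeds the baseline by more than `tolerance_pct` percent.

    Categories that appear only in `current` are not compared.
    """
    regressions = []
    for category, base_gib in baseline.items():
        cur_gib = current.get(category, 0.0)
        if base_gib > 0 and cur_gib > base_gib * (1 + tolerance_pct / 100.0):
            regressions.append(MemoryRegression(category, base_gib, cur_gib))
    return regressions


# Hypothetical per-category peak memory in GiB; in practice these numbers
# would come from a memory profile of the baseline and candidate runs.
baseline = {"activation": 10.0, "gradient": 4.0, "optimizer": 8.0, "parameter": 4.0}
current = {"activation": 12.5, "gradient": 4.1, "optimizer": 8.0, "parameter": 4.0}

for reg in find_regressions(baseline, current):
    print(f"{reg.category}: {reg.baseline_gib} GiB -> {reg.current_gib} GiB "
          f"(+{reg.increase_pct:.1f}%)")
```

A CI job could fail the build whenever `find_regressions` returns a non-empty list. The per-category comparison mirrors the categorical breakdown in item 1, so a regression report can point at the kind of memory that grew rather than only at the total.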
## Tutorial structure
- Introduction to Mosaic and installation
- Simple usage examples (CLI commands)
- Real-World Case 1: Activation Checkpointing Analysis
- Real-World Case 2: Debugging Unexpected Memory Usage
- Real-World Case 3: Pipeline Integration with Python API
## Requirements
- PyTorch with CUDA support
- `pip install git+https://github.com/facebookresearch/mosaic.git`
- GPU required to run the examples
## Related Links
- Mosaic Repository: https://github.com/facebookresearch/mosaic
---------
Co-authored-by: Svetlana Karslioglu <[email protected]>
1 parent ce954c6 commit 36a4b88
7 files changed
Lines changed: 1196 additions & 0 deletions