Utilities related to PyTorch. The code is quite messy and was originally written for my personal use, but it is open-sourced here because someone wanted to use it.
Currently contains:
- Universal Memory Profiler: Like the PyTorch memory profiler, but it also captures lower-level memory allocations, such as NCCL internal buffers. I used it to debug NCCL-related memory issues.
- Python GIL Detector: Shows which thread is holding the Python GIL (code is in https://github.com/fzyzcjy/py_gil_spy).
- Merge PyTorch Profiler traces from multiple ranks into one combined trace (useful for checking how ranks cooperate).
- When PDL (Programmatic Dependent Launch) is enabled, Perfetto does not render some overlapping events; convert_to_perfetto_compatible.py fixes this.
- PDL detector: Shows whether each kernel has PDL enabled.
- Extract per-kernel time breakdown statistics (mean, std, etc.) from profiles.
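The idea behind merging per-rank traces can be sketched in Chrome trace format as follows. This is a minimal illustration, not the repository's implementation: the function name `merge_rank_traces` is hypothetical, it assumes each rank's trace is already loaded as a dict with a `"traceEvents"` list, and it assumes all ranks' timestamps already share a common clock (no time alignment is attempted). Each rank's processes get fresh, non-colliding pids and a `[rank N]` prefix on their process names, so the ranks show up as separate groups in Perfetto.

```python
from typing import Dict, List, Set


def merge_rank_traces(traces: Dict[int, dict]) -> dict:
    """Merge Chrome-trace dicts (one per rank) into a single trace.

    Hypothetical helper. Assumes the ranks' "ts" values share a common clock.
    """
    merged: List[dict] = []
    next_pid = 0
    for rank in sorted(traces):
        pid_map: Dict[object, int] = {}  # original pid on this rank -> new global pid
        named: Set[object] = set()       # new pids that already have a process_name
        for event in traces[rank].get("traceEvents", []):
            event = dict(event)  # shallow copy so the caller's trace is not mutated
            if "pid" in event:
                old_pid = event["pid"]
                if old_pid not in pid_map:
                    pid_map[old_pid] = next_pid
                    next_pid += 1
                event["pid"] = pid_map[old_pid]
            # Prefix existing process names so each rank is identifiable in the UI.
            if event.get("ph") == "M" and event.get("name") == "process_name":
                args = dict(event.get("args", {}))
                args["name"] = f"[rank {rank}] {args.get('name', '')}"
                event["args"] = args
                named.add(event.get("pid"))
            merged.append(event)
        # Label any process of this rank that had no process_name metadata.
        for old_pid, new_pid in pid_map.items():
            if new_pid not in named:
                merged.append({"ph": "M", "name": "process_name", "pid": new_pid,
                               "args": {"name": f"[rank {rank}] {old_pid}"}})
    return {"traceEvents": merged}
```

To use it, `json.load` each rank's trace file into a `{rank: trace}` dict, call `merge_rank_traces`, and `json.dump` the result to a new file that can be opened in Perfetto.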
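A per-kernel time breakdown can be sketched as below, assuming the profile was exported by `torch.profiler` in Chrome trace format, where GPU kernels appear as complete (`"ph": "X"`) events with `"cat": "kernel"` and `"dur"` in microseconds. The names `kernel_time_stats` and `print_breakdown` are hypothetical; this is an illustration, not the repository's code.

```python
import statistics
from collections import defaultdict
from typing import Dict, List


def kernel_time_stats(trace: dict) -> Dict[str, dict]:
    """Per-kernel duration statistics, sorted by total time (largest first)."""
    durations: Dict[str, List[float]] = defaultdict(list)
    for event in trace.get("traceEvents", []):
        # Assumption: GPU kernels are "X" events with category "kernel".
        if event.get("ph") == "X" and event.get("cat") == "kernel":
            durations[event["name"]].append(float(event["dur"]))

    grand_total = sum(sum(d) for d in durations.values())
    stats = {}
    for name, d in durations.items():
        total = sum(d)
        stats[name] = {
            "count": len(d),
            "total_us": total,
            "mean_us": statistics.mean(d),
            # Population std dev, so a kernel launched once reports 0.0.
            "std_us": statistics.pstdev(d),
            "fraction": total / grand_total if grand_total else 0.0,
        }
    return dict(sorted(stats.items(), key=lambda kv: kv[1]["total_us"], reverse=True))


def print_breakdown(stats: Dict[str, dict]) -> None:
    """Print one line per kernel: total time, share of all kernel time, count, mean, std."""
    for name, s in stats.items():
        print(f"{s['total_us']:>12.1f} us {s['fraction']:>7.1%}  n={s['count']:<6} "
              f"mean={s['mean_us']:.2f} std={s['std_us']:.2f}  {name}")
```

The population standard deviation (`statistics.pstdev`) is used so that kernels launched only once do not raise an error; `statistics.stdev` gives the sample estimate if that is preferred.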
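One way to work around the Perfetto rendering issue with PDL is sketched below. It assumes the root cause is that Perfetto expects slices on one track to be properly nested, while PDL lets a kernel start before the previous kernel on the same stream has finished, producing partially overlapping slices. This is not necessarily what convert_to_perfetto_compatible.py does: the function name `split_overlapping_into_lanes`, the `lane_tid_stride` offset scheme, and the assumption of integer tids are all hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def split_overlapping_into_lanes(trace: dict, category: str = "kernel",
                                 lane_tid_stride: int = 1_000_000) -> dict:
    """Move overlapping slices of one track onto extra "lane" tracks.

    Only complete ("X") events whose "cat" equals `category` are touched;
    integer tids are assumed. Lane k of track `tid` gets tid + k * lane_tid_stride.
    """
    events = trace.get("traceEvents", [])

    def is_target(e: dict) -> bool:
        return e.get("ph") == "X" and e.get("cat") == category

    out: List[dict] = [e for e in events if not is_target(e)]
    tracks: Dict[Tuple[object, int], List[dict]] = defaultdict(list)
    for e in events:
        if is_target(e):
            tracks[(e.get("pid"), e["tid"])].append(e)

    for (pid, tid), track_events in tracks.items():
        lane_ends: List[float] = []  # end time of the last slice placed in each lane
        for e in sorted(track_events, key=lambda ev: ev["ts"]):
            start, end = e["ts"], e["ts"] + e.get("dur", 0)
            # First fit: reuse the first lane that is already free at `start`.
            for lane, lane_end in enumerate(lane_ends):
                if lane_end <= start:
                    lane_ends[lane] = end
                    break
            else:
                # Every lane is still busy: open a new one and name its track.
                lane = len(lane_ends)
                lane_ends.append(end)
                if lane > 0:
                    out.append({"ph": "M", "name": "thread_name", "pid": pid,
                                "tid": tid + lane * lane_tid_stride,
                                "args": {"name": f"{tid} (overlap lane {lane})"}})
            moved = dict(e)  # copy so the input trace is left unchanged
            moved["tid"] = tid + lane * lane_tid_stride
            out.append(moved)
    return {**trace, "traceEvents": out}
```

Greedy first-fit over start-sorted slices uses the minimum number of lanes, equal to the maximum number of slices overlapping at any instant. It would also split properly nested slices, which Perfetto does not require, so only the kernel category is processed by default.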