Tags: hammad45/torchrec
refactoring logging enablement logic (meta-pytorch#3655)

Summary: Pull Request resolved: meta-pytorch#3655

# context
* Excited to try out the torchrec logger
* consolidate the logger and handler files, and merge the util functions
* add docstrings to the logger.py file for better understanding

Reviewed By: nipung90
Differential Revision: D90429199
fbshipit-source-id: 71ce47cf27eba6987bd4ca1e8da8b38c04e48849
remove unnecessary @seed_and_log in unittest.setUp (meta-pytorch#3672)

Summary: Pull Request resolved: meta-pytorch#3672

# context
* remove the redundant `seed_and_log` decorator on the unittest.setUp method
* the decorator initializes the random seed before each test case
```
import random
import time
from typing import Callable

import numpy as np
import torch


def seed_and_log(wrapped_func: Callable) -> Callable:
    # pyre-ignore [2, 3]
    def _wrapper(*args, **kwargs):
        # derive a fresh seed from the wall clock and apply it to all RNGs
        seed = int(time.time() * 1000) % (1 << 31)
        print(f"Using random seed: {seed}")
        torch.manual_seed(seed)
        random.seed(seed)
        np.random.seed(seed)
        return wrapped_func(*args, **kwargs)

    return _wrapper
```
* A typical pattern is that this setUp method calls super().setUp, which is already decorated by `seed_and_log`:
```
class MultiProcessTestBase(unittest.TestCase):
    @seed_and_log
    def setUp(self) -> None:
        ...


class ModelParallelTestShared(MultiProcessTestBase):
    @seed_and_log  # redundant: super().setUp() is already decorated
    def setUp(self, backend: str = "nccl") -> None:
        super().setUp()
        ...
```

Reviewed By: spmex
Differential Revision: D90950699
fbshipit-source-id: 1eb89a3bf0b6283659a14dd8f0dce7642072c36b
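The redundancy described in this commit can be illustrated with a minimal, framework-free sketch. It assumes a simplified `seed_and_log` that seeds only Python's `random` module (the original also seeds `torch` and `numpy`). `SEED_CALLS`, `Base`, `DoublyDecorated`, `SinglyDecorated`, and `count_seedings` are hypothetical names introduced for illustration and are not part of TorchRec.

```python
import random
import time
import unittest
from typing import Any, Callable, List

# Records every seed that gets applied (illustration only).
SEED_CALLS: List[int] = []


def seed_and_log(wrapped_func: Callable[..., Any]) -> Callable[..., Any]:
    """Simplified stand-in: seeds only the stdlib RNG and records the seed."""

    def _wrapper(*args: Any, **kwargs: Any) -> Any:
        seed = int(time.time() * 1000) % (1 << 31)
        SEED_CALLS.append(seed)
        random.seed(seed)
        return wrapped_func(*args, **kwargs)

    return _wrapper


class Base(unittest.TestCase):
    @seed_and_log
    def setUp(self) -> None:
        self.ready = True

    def test_noop(self) -> None:
        pass


class DoublyDecorated(Base):
    @seed_and_log  # redundant: Base.setUp already seeds
    def setUp(self) -> None:
        super().setUp()


class SinglyDecorated(Base):
    def setUp(self) -> None:
        super().setUp()


def count_seedings(case_cls: type) -> int:
    """Run setUp once and report how many times the RNG was seeded."""
    SEED_CALLS.clear()
    case = case_cls("test_noop")
    case.setUp()
    return len(SEED_CALLS)
```

In this sketch the doubly decorated subclass seeds twice per test case, and only the last seed applied before the test body takes effect. Dropping the subclass decorator leaves exactly one seeding per test.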
apply Black 25.11.0 style in fbcode (70/92) Summary: Formats the covered files with pyfmt. paintitblack Reviewed By: itamaro Differential Revision: D90476295 fbshipit-source-id: 5101d4aae980a9f8955a4cb10bae23997c48837f
refactor TrainPipelineBase to clean input batch after the forward pass (meta-pytorch#3530)

Summary: Pull Request resolved: meta-pytorch#3530

# context
* previously in the TrainPipelineBase, the `cur_batch` (model input) was not released until after `loss.backward()` was called.
* however, the `cur_batch` is only needed during the forward pass.
* this diff changes the order of clearing the current batch so that it is cleared right after the forward pass

NOTE: the peak memory usage usually happens at the beginning of the backward pass, so clearing the unused input batch can reduce the peak memory usage.

* benchmark comparison indicates a memory saving of roughly 1~1.5x the input batch size (input batch ~1 GB)

|name|GPU Peak Memory alloc|GPU Peak Memory reserved|
|--|--|--|
|before|35.94 GB|56.72 GB|
|after|34.33 GB|54.00 GB|
|before-inplace|35.94 GB|53.91 GB|
|after-inplace|34.33 GB|**51.35 GB**|

NOTE: copying the batch to the GPU in place does not change the GPU peak memory allocation, but it can reduce the peak memory reservation.

Reviewed By: aporialiao
Differential Revision: D85483966
fbshipit-source-id: 4d49ba92530a65a4730806341c2eaec8b19a2e08
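The reordering described in this commit can be sketched without any GPU or TorchRec dependency. The following is a toy illustration, not the actual `TrainPipelineBase` code: `Batch`, `ToyPipeline`, the `release_early` flag, and `batch_alive_during_backward` are hypothetical names, and the check relies on CPython freeing an object as soon as its last reference is dropped.

```python
import gc
import weakref
from typing import Callable, Iterator, List, Optional


class Batch:
    """Stand-in for a model input batch (hypothetical, not a TorchRec type)."""

    def __init__(self, size: int) -> None:
        self.data = [0.0] * size


class ToyPipeline:
    def __init__(
        self,
        forward: Callable[[Batch], float],
        backward: Callable[[float], None],
        release_early: bool,
    ) -> None:
        self._forward = forward
        self._backward = backward
        self._release_early = release_early
        self._cur_batch: Optional[Batch] = None

    def progress(self, dataloader_iter: Iterator[Batch]) -> float:
        self._cur_batch = next(dataloader_iter)
        loss = self._forward(self._cur_batch)
        if self._release_early:
            # New order: drop the input right after the forward pass.
            self._cur_batch = None
        self._backward(loss)
        # Old order: the input is only dropped once backward has finished.
        self._cur_batch = None
        return loss


def batch_alive_during_backward(release_early: bool) -> bool:
    """Report whether the input batch still exists while backward runs."""
    refs: List["weakref.ref[Batch]"] = []
    observed: List[bool] = []

    def forward(batch: Batch) -> float:
        refs.append(weakref.ref(batch))
        return float(len(batch.data))

    def backward(loss: float) -> None:
        gc.collect()
        observed.append(refs[0]() is not None)

    pipeline = ToyPipeline(forward, backward, release_early)
    # A generator is used so that no container keeps the batch alive.
    pipeline.progress(Batch(8) for _ in range(1))
    return observed[0]
```

With `release_early=False` the batch is still referenced while the backward step runs. With `release_early=True` it has already been freed, which is the effect the commit relies on to lower peak memory at the start of the backward pass.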
create github benchmark workflow (meta-pytorch#3631)

Summary: Pull Request resolved: meta-pytorch#3631

# context
* add a script to run train pipeline benchmark
* add a github workflow which can run benchmark script nightly
* the workflow can also be triggered manually
* trace and memory snapshot will be uploaded to github artifacts
* also fix some github workflow naming conventions and typos.

NOTE: github runner `linux.g5.12xlarge.nvidia.gpu` only has 4 gpus with 20GB HBM, so can only support the *-light.yml benchmarks

Reviewed By: spmex
Differential Revision: D89629829
fbshipit-source-id: 8fcf5381117a2f52b44219c904f5889a78a8c05e
refactor debug_embedding_modules.py (meta-pytorch#3614) Summary: Pull Request resolved: meta-pytorch#3614 refactor the debug_embedding_modules.py, fix the return format of debug_EC. Reviewed By: spmex Differential Revision: D89102508 fbshipit-source-id: a25632b3c37b69aff0030e48e0bab50936e64ec2
Clean up KJT validator killswitch (meta-pytorch#3615) Summary: Pull Request resolved: meta-pytorch#3615 The killswitch `pytorch/torchrec:enable_kjt_validation` has been switched ON for a couple of months and it is working normally, so it should be safe to clean it up. Reviewed By: TroyGarden Differential Revision: D89088119 fbshipit-source-id: 9f339867f192d76224155bceb14619e98f35ff0e
Reland D87662877: Generate 1 acc graph for esr mb5 by removing fx wrapper for kjt (meta-pytorch#3582) Summary: Pull Request resolved: meta-pytorch#3582 D87662877 was reverted in D87747630 due to an incompatibility between the publish package and the lowering package. For example, the test uses the prod lowering package, which does not include this diff's change. Reviewed By: yingufan Differential Revision: D87848633 fbshipit-source-id: 1a45e5a79491873523e206829820d2b771a93013
Enable logging for the plan() function, ShardEstimators and TrainingPipeline class constructors (meta-pytorch#3565)

Summary: Pull Request resolved: meta-pytorch#3565

This diff enables the static logging functionality to collect data for:

* plan() - This will allow us to log the inputs and outputs to the planner to help with user issue debugging
* ShardEstimators - This will allow us to log the inputs and outputs to the ShardEstimators, which gives us the bandwidth inputs to verify whether the planner is generating expected values, and helps with debugging OOMs
* TrainingPipeline - The class type here will be an indicator of which pipeline was used by the training job. The training pipeline has implications for memory usage and is an important data point to collect when investigating OOMs.

Reviewed By: kausv, nimaelyasi
Differential Revision: D87488015
fbshipit-source-id: db8754e5e4c7b5e5a2b2e6bf7c6a4e0c6171cf71
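As a rough illustration of logging the inputs and outputs of a function and the class type of a constructor, a decorator along the following lines could be used. This is a minimal sketch built on the standard `logging` module and is not the TorchRec logger API: `log_io`, `RECORDS`, `ToyTrainPipeline`, and the toy `plan` function below are all hypothetical names.

```python
import functools
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger("static_logging_sketch")

# In-memory copy of what was logged, so the sketch can be inspected.
RECORDS: List[Dict[str, str]] = []


def log_io(func: Callable[..., Any]) -> Callable[..., Any]:
    """Log the qualified name, inputs, and output of each call."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = func(*args, **kwargs)
        record = {
            # For a constructor the qualified name includes the class,
            # which identifies which pipeline type was instantiated.
            "name": func.__qualname__,
            "args": repr(args),
            "kwargs": repr(kwargs),
            "output": repr(result),
        }
        RECORDS.append(record)
        logger.info("%s", record)
        return result

    return wrapper


class ToyTrainPipeline:
    """Hypothetical pipeline class whose construction is logged."""

    @log_io
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name


@log_io
def plan(num_tables: int, world_size: int) -> Dict[str, int]:
    """Toy planner: assign tables to ranks round-robin."""
    return {f"table_{i}": i % world_size for i in range(num_tables)}
```

Calling `plan(3, 2)` or constructing `ToyTrainPipeline("dlrm")` appends one record each, so a later investigation can see which inputs the planner received and which pipeline class the job used.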