Tags: hammad45/torchrec

v2026.01.26.00

refactoring logging enablement logic (meta-pytorch#3655)

Summary:
Pull Request resolved: meta-pytorch#3655

# context
* Excited to try out the torchrec logger
* consolidate logger and handler files, merge the util functions.
* add docstrings to the logger.py file for better understanding

Reviewed By: nipung90

Differential Revision: D90429199

fbshipit-source-id: 71ce47cf27eba6987bd4ca1e8da8b38c04e48849

v2026.01.19.00

remove unnecessary @seed_and_log in unittest.setUp (meta-pytorch#3672)

Summary:
Pull Request resolved: meta-pytorch#3672

# context
* remove the redundant `seed_and_log` decorator from the unittest.setUp method
* the decorator initializes the random seed before each test case
```
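import random
import time
from typing import Callable

import numpy as np
import torch


def seed_and_log(wrapped_func: Callable) -> Callable:
    # pyre-ignore [2, 3]
    def _wrapper(*args, **kwargs):
        # Derive a fresh seed from the clock and apply it to every RNG in use.
        seed = int(time.time() * 1000) % (1 << 31)
        print(f"Using random seed: {seed}")
        torch.manual_seed(seed)
        random.seed(seed)
        np.random.seed(seed)
        return wrapped_func(*args, **kwargs)

    return _wrapper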
```

* A typical pattern is that this setUp method calls super().setUp, which is already decorated with `seed_and_log`, so the seed gets set twice:
```
class ModelParallelTestShared(MultiProcessTestBase):
    @seed_and_log
    def setUp(self, backend: str = "nccl") -> None:
        super().setUp()
        ...

class MultiProcessTestBase(unittest.TestCase):
    @seed_and_log
    def setUp(self) -> None:
        ...
```
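
The redundancy can be reproduced with a small standalone sketch. Everything here is illustrative: the decorator is a simplified stand-in that only seeds Python's `random` and records each call, and the class names `Base`, `RedundantChild`, `CleanChild` and the helper `count_reseeds` are hypothetical, not TorchRec code.

```python
import random
import time
import unittest
from typing import Callable, List

SEED_CALLS: List[str] = []  # which setUp triggered a reseed (illustration only)


def seed_and_log(wrapped_func: Callable) -> Callable:
    """Simplified stand-in: seeds `random` only and records the call."""

    def _wrapper(*args, **kwargs):
        seed = int(time.time() * 1000) % (1 << 31)
        SEED_CALLS.append(wrapped_func.__qualname__)
        random.seed(seed)
        return wrapped_func(*args, **kwargs)

    return _wrapper


class Base(unittest.TestCase):
    @seed_and_log
    def setUp(self) -> None:
        self.ready = True


class RedundantChild(Base):
    @seed_and_log  # redundant: Base.setUp reseeds again via super().setUp()
    def setUp(self) -> None:
        super().setUp()


class CleanChild(Base):
    def setUp(self) -> None:  # seeding is inherited from Base.setUp
        super().setUp()


def count_reseeds(case_cls: type) -> int:
    """Run setUp once and report how many times the seed was set."""
    SEED_CALLS.clear()
    case_cls().setUp()
    return len(SEED_CALLS)
```

With the decorator on both levels the seed is set twice per test case. Dropping it from the subclass leaves a single reseed coming from the base class.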

Reviewed By: spmex

Differential Revision: D90950699

fbshipit-source-id: 1eb89a3bf0b6283659a14dd8f0dce7642072c36b

v2026.01.12.00

apply Black 25.11.0 style in fbcode (70/92)

Summary:
Formats the covered files with pyfmt.

paintitblack

Reviewed By: itamaro

Differential Revision: D90476295

fbshipit-source-id: 5101d4aae980a9f8955a4cb10bae23997c48837f

v2026.01.05.00

refactor TrainPipelineBase to clean input batch after the forward pass (meta-pytorch#3530)

Summary:
Pull Request resolved: meta-pytorch#3530

# context
* previously, in TrainPipelineBase, the `cur_batch` (model input) was not released until after `loss.backward()` had been called.
* however, the `cur_batch` is only needed during the forward pass.
* this diff changes the order so that the current batch is cleared right after the forward pass

NOTE: usually the peak memory usage happens at the beginning of the backward pass, so clearing the unused input batch can reduce the peak memory usage.

* the benchmark comparison indicates a memory saving of roughly 1 to 1.5x the input batch size (input batch ~ 1 GB); in the table below, peak allocated memory drops by 1.61 GB

|name|GPU Peak Memory alloc|GPU Peak Memory reserved|
|--|--|--|
|before|35.94 GB|56.72 GB|
|after|34.33 GB|54.00 GB|
|before-inplace|35.94 GB|53.91 GB|
|after-inplace|34.33 GB|**51.35 GB**|

NOTE: copying the batch to the GPU in place does not change the peak GPU memory allocation, but it can reduce the peak memory reservation.
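
The reordering described above can be illustrated with a toy sketch. It is not the TrainPipelineBase implementation: `ToyPipeline`, its methods, and the `release_after_forward` flag are hypothetical names, and a plain list stands in for the input batch. It only records whether the pipeline still holds the batch when the backward step starts.

```python
from typing import Iterator, List, Optional


class ToyPipeline:
    """Toy stand-in for a train pipeline; NOT TorchRec's TrainPipelineBase."""

    def __init__(self, release_after_forward: bool) -> None:
        self.release_after_forward = release_after_forward
        self.cur_batch: Optional[List[float]] = None
        self.events: List[str] = []

    def _forward(self, batch: List[float]) -> float:
        self.events.append("forward")
        return sum(batch)  # pretend this is the loss

    def _backward(self, loss: float) -> None:
        # Record whether the input batch is still held at the start of
        # backward, which is where peak memory usually occurs.
        held = self.cur_batch is not None
        self.events.append(f"backward(batch_held={held})")

    def progress(self, dataloader_iter: Iterator[List[float]]) -> float:
        self.cur_batch = next(dataloader_iter)
        loss = self._forward(self.cur_batch)
        if self.release_after_forward:
            self.cur_batch = None  # new order: drop the input after forward
        self._backward(loss)
        self.cur_batch = None  # old order: input dropped only after backward
        return loss


def run(release_after_forward: bool) -> List[str]:
    pipeline = ToyPipeline(release_after_forward)
    pipeline.progress(iter([[1.0, 2.0, 3.0]]))
    return pipeline.events
```

In a real pipeline, dropping the reference only frees memory if nothing else still refers to the batch tensors, so the actual saving depends on the model and the data loader.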

Reviewed By: aporialiao

Differential Revision: D85483966

fbshipit-source-id: 4d49ba92530a65a4730806341c2eaec8b19a2e08

v2025.12.29.00

create github benchmark workflow (meta-pytorch#3631)

Summary:
Pull Request resolved: meta-pytorch#3631

# context
* add a script to run train pipeline benchmark
* add a github workflow which can run benchmark script nightly
* the workflow can also be triggered manually
* trace and memory snapshot will be uploaded to github artifacts
* also fix some github workflow naming conventions and typos.

NOTE: github runner `linux.g5.12xlarge.nvidia.gpu` only has 4 gpus with 20GB HBM, so can only support the *-light.yml benchmarks

Reviewed By: spmex

Differential Revision: D89629829

fbshipit-source-id: 8fcf5381117a2f52b44219c904f5889a78a8c05e

v2025.12.22.00

refactor debug_embedding_modules.py (meta-pytorch#3614)

Summary:
Pull Request resolved: meta-pytorch#3614

Refactor debug_embedding_modules.py and fix the return format of debug_EC.

Reviewed By: spmex

Differential Revision: D89102508

fbshipit-source-id: a25632b3c37b69aff0030e48e0bab50936e64ec2

v2025.12.15.00

Clean up KJT validator killswitch (meta-pytorch#3615)

Summary:
Pull Request resolved: meta-pytorch#3615

The killswitch `pytorch/torchrec:enable_kjt_validation` has been switched ON for a couple of months and it is working normally, so it should be safe to clean it up.

Reviewed By: TroyGarden

Differential Revision: D89088119

fbshipit-source-id: 9f339867f192d76224155bceb14619e98f35ff0e

v1.4.0-rc2

v2025.12.01.00

Reland D87662877: Generate 1 acc graph for esr mb5 by removing fx wrapper for kjt (meta-pytorch#3582)

Summary:
Pull Request resolved: meta-pytorch#3582

D87662877 was reverted in D87747630 due to an incompatibility between the publish package and the lowering package. For example, the test uses the prod lowering package, which does not include this diff's change.

Reviewed By: yingufan

Differential Revision: D87848633

fbshipit-source-id: 1a45e5a79491873523e206829820d2b771a93013

v2025.11.24.00

Enable logging for the plan() function, ShardEstimators and TrainingPipeline class constructors (meta-pytorch#3565)

Summary:
Pull Request resolved: meta-pytorch#3565

This diff enables the static logging functionality to collect data for:
* plan(): logs the inputs and outputs of the planner to help with debugging user issues.
* ShardEstimators: logs the inputs and outputs of the ShardEstimators, which provides the bandwidth inputs needed to verify that the planner is generating expected values and helps with debugging OOMs.
* TrainingPipeline: the class type indicates which pipeline the training job used. The training pipeline affects memory usage, so it is an important data point to collect when investigating OOMs.
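
As a rough illustration of this kind of input/output logging, here is a sketch built on the standard `logging` module. It is not the TorchRec static logger: the `log_io` decorator, the logger name, and the `ToyPlanner` and `ToyPipeline` classes are all hypothetical.

```python
import functools
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger("static_logging_sketch")  # hypothetical logger name


def log_io(func: Callable[..., Any]) -> Callable[..., Any]:
    """Log the arguments and return value of the wrapped callable."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        logger.info("%s input: args=%r kwargs=%r", func.__qualname__, args, kwargs)
        result = func(*args, **kwargs)
        logger.info("%s output: %r", func.__qualname__, result)
        return result

    return wrapper


class ToyPlanner:
    """Hypothetical planner; assigns one made-up sharding type per table."""

    @log_io
    def plan(self, tables: List[str]) -> Dict[str, str]:
        return {name: "table_wise" for name in tables}


class ToyPipeline:
    """Hypothetical pipeline; the logged qualified name reveals the class used."""

    @log_io
    def __init__(self, name: str) -> None:
        self.name = name
```

Decorating a constructor this way records the class name through `__qualname__`, which is the data point the TrainingPipeline item above is after.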

Reviewed By: kausv, nimaelyasi

Differential Revision: D87488015

fbshipit-source-id: db8754e5e4c7b5e5a2b2e6bf7c6a4e0c6171cf71