v2026.01.26.00

refactoring logging enablement logic (meta-pytorch#3655)

Summary:
Pull Request resolved: meta-pytorch#3655

# context
* Excited to try out the torchrec logger
* consolidate logger and handler files, merge the util functions.
* add docstrings to the logger.py file for better understanding

Reviewed By: nipung90

Differential Revision: D90429199

fbshipit-source-id: 71ce47cf27eba6987bd4ca1e8da8b38c04e48849

Jan 26, 2026
764bed2
zip
tar.gz

v2026.01.19.00

remove unnecessary @seed_and_log in unittest.setUp (meta-pytorch#3672)

Summary:
Pull Request resolved: meta-pytorch#3672

# context
* remove the redundant `seed_and_log` decorator from the unittest.setUp method
* the decorator initializes the random seed before each test case
```
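import random
import time
from typing import Callable

import numpy as np
import torch


def seed_and_log(wrapped_func: Callable) -> Callable:
    # pyre-ignore [2, 3]
    def _wrapper(*args, **kwargs):
        # derive a time-based seed and apply it to torch, random, and numpy
        seed = int(time.time() * 1000) % (1 << 31)
        print(f"Using random seed: {seed}")
        torch.manual_seed(seed)
        random.seed(seed)
        np.random.seed(seed)
        return wrapped_func(*args, **kwargs)

    return _wrapper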
```

* A typical pattern is that this setUp method calls super().setUp, which is already decorated with `seed_and_log`:
```
class ModelParallelTestShared(MultiProcessTestBase):
    @seed_and_log
    def setUp(self, backend: str = "nccl") -> None:
        super().setUp()
        ...

class MultiProcessTestBase(unittest.TestCase):
    @seed_and_log
    def setUp(self) -> None:
        ...
```
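The redundancy can be reproduced with a minimal standard-library sketch. It rests on several assumptions: this `seed_and_log` is simplified (it seeds only `random` and records each seeding in a hypothetical `SEED_CALLS` list instead of printing), and `Base`, `DoubleDecorated`, `SingleDecorated`, and `count_seedings` are illustrative stand-ins, not the TorchRec test classes.

```python
import random
import time
import unittest
from functools import wraps
from typing import Any, Callable, List

SEED_CALLS: List[int] = []  # hypothetical: records every seeding for inspection


def seed_and_log(wrapped_func: Callable[..., Any]) -> Callable[..., Any]:
    """Simplified stand-in: seeds `random` only and records the seed."""

    @wraps(wrapped_func)
    def _wrapper(*args: Any, **kwargs: Any) -> Any:
        seed = int(time.time() * 1000) % (1 << 31)
        SEED_CALLS.append(seed)
        random.seed(seed)
        return wrapped_func(*args, **kwargs)

    return _wrapper


class Base(unittest.TestCase):
    @seed_and_log
    def setUp(self) -> None:
        pass

    def runTest(self) -> None:
        pass


class DoubleDecorated(Base):
    @seed_and_log  # redundant: super().setUp() seeds again right after
    def setUp(self) -> None:
        super().setUp()


class SingleDecorated(Base):
    def setUp(self) -> None:
        super().setUp()  # seeding happens once, in the base class


def count_seedings(case: unittest.TestCase) -> int:
    """Run setUp once and report how many times the seed was set."""
    SEED_CALLS.clear()
    case.setUp()
    return len(SEED_CALLS)
```

In this sketch, `count_seedings(DoubleDecorated())` reports two seedings for a single `setUp` call and `count_seedings(SingleDecorated())` reports one. In the double-decorated case the seed that takes effect is the one set last, by the base class.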

Reviewed By: spmex

Differential Revision: D90950699

fbshipit-source-id: 1eb89a3bf0b6283659a14dd8f0dce7642072c36b

Jan 19, 2026
2f7828c
zip
tar.gz

v2026.01.12.00

apply Black 25.11.0 style in fbcode (70/92)

Summary:
Formats the covered files with pyfmt.

paintitblack

Reviewed By: itamaro

Differential Revision: D90476295

fbshipit-source-id: 5101d4aae980a9f8955a4cb10bae23997c48837f

Jan 12, 2026
0ec0d62
zip
tar.gz

v2026.01.05.00

refactor TrainPipelineBase to clean input batch after the forward pass (meta-pytorch#3530)

Summary:
Pull Request resolved: meta-pytorch#3530

# context
* previously, in TrainPipelineBase, the `cur_batch` (model input) was not released until after `loss.backward()` was called.
* however, the `cur_batch` is only needed during the forward pass.
* this diff reorders the cleanup so that the current batch is cleared right after the forward pass.

NOTE: usually the peak memory usage happens at the beginning of the backward pass, so clearing the unused input batch can reduce the peak memory usage.

* benchmark comparison indicates a peak memory saving of roughly 1 to 1.5 times the input batch size (input batch ~ 1 GB)

|name|GPU Peak Memory alloc|GPU Peak Memory reserved|
|--|--|--|
|before|35.94 GB|56.72 GB|
|after|34.33 GB|54.00 GB|
|before-inplace|35.94 GB|53.91 GB|
|after-inplace|34.33 GB|**51.35 GB**|

NOTE: copying the batch to the GPU in place does not change the GPU peak memory allocation, but it can reduce the peak memory reservation.
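The reordering can be illustrated with a small, framework-free sketch. The names here (`Batch`, `ToyPipeline`, `release_early`, `batch_alive_at_backward`) are hypothetical and do not reflect the actual `TrainPipelineBase` code. The sketch only shows that dropping the last reference to the input batch before the backward step lets it be freed earlier, and it assumes CPython's reference counting, where an object is freed as soon as its last reference goes away.

```python
import weakref
from typing import List, Optional


class Batch:
    """Stand-in for a model input batch held on the device."""

    def __init__(self, data: List[float]) -> None:
        self.data = data


class ToyPipeline:
    """Hypothetical pipeline; `release_early` toggles the new ordering."""

    def __init__(self, release_early: bool) -> None:
        self.release_early = release_early
        self._cur_batch: Optional[Batch] = None
        self.batch_alive_at_backward: Optional[bool] = None

    def progress(self, data: List[float]) -> float:
        self._cur_batch = Batch(data)
        probe = weakref.ref(self._cur_batch)  # observes the batch without keeping it alive

        # Forward pass: the only step that reads the input batch.
        loss = sum(self._cur_batch.data)

        if self.release_early:
            # New ordering: drop the batch right after the forward pass.
            self._cur_batch = None

        # Placeholder for the backward pass: record whether the batch
        # still occupies memory at the point where peak usage tends to occur.
        self.batch_alive_at_backward = probe() is not None

        # Old ordering: the batch is dropped only after the backward pass.
        self._cur_batch = None
        return loss
```

With `ToyPipeline(release_early=True)`, the batch is already gone when the backward placeholder runs; with `release_early=False`, it is still alive at that point.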

Reviewed By: aporialiao

Differential Revision: D85483966

fbshipit-source-id: 4d49ba92530a65a4730806341c2eaec8b19a2e08

Jan 5, 2026
e244ce9
zip
tar.gz

v2025.12.29.00

create github benchmark workflow (meta-pytorch#3631)

Summary:
Pull Request resolved: meta-pytorch#3631

# context
* add a script to run train pipeline benchmark
* add a github workflow which can run benchmark script nightly
* the workflow can also be triggered manually
* trace and memory snapshot will be uploaded to github artifacts
* also fix some github workflow naming conventions and typos.

NOTE: github runner `linux.g5.12xlarge.nvidia.gpu` only has 4 gpus with 20GB HBM, so can only support the *-light.yml benchmarks

Reviewed By: spmex

Differential Revision: D89629829

fbshipit-source-id: 8fcf5381117a2f52b44219c904f5889a78a8c05e

Dec 28, 2025
217e97a
zip
tar.gz

v2025.12.22.00

refactor debug_embedding_modules.py (meta-pytorch#3614)

Summary:
Pull Request resolved: meta-pytorch#3614

refactor the debug_embedding_modules.py, fix the return format of debug_EC.

Reviewed By: spmex

Differential Revision: D89102508

fbshipit-source-id: a25632b3c37b69aff0030e48e0bab50936e64ec2

Dec 22, 2025
276b4ae
zip
tar.gz

v2025.12.15.00

Clean up KJT validator killswitch (meta-pytorch#3615)

Summary:
Pull Request resolved: meta-pytorch#3615

The killswitch `pytorch/torchrec:enable_kjt_validation` has been switched ON for a couple of months and it is working normally, so it should be safe to clean it up.

Reviewed By: TroyGarden

Differential Revision: D89088119

fbshipit-source-id: 9f339867f192d76224155bceb14619e98f35ff0e

Dec 13, 2025
312f4dd
zip
tar.gz

v1.4.0-rc2

Dec 7, 2025
da377ef
zip
tar.gz

v2025.12.01.00

Reland D87662877: Generate 1 acc graph for esr mb5 by removing fx wrapper for kjt (meta-pytorch#3582)

Summary:
Pull Request resolved: meta-pytorch#3582

D87662877 was reverted in D87747630 due to an incompatibility between the publish package and the lowering package. For example, the test uses the prod lowering package, which does not include this diff's change.

Reviewed By: yingufan

Differential Revision: D87848633

fbshipit-source-id: 1a45e5a79491873523e206829820d2b771a93013

Nov 29, 2025
ca2f687
zip
tar.gz

v2025.11.24.00

Enable logging for the plan() function, ShardEstimators and TrainingPipeline class constructors (meta-pytorch#3565)

Summary:
Pull Request resolved: meta-pytorch#3565

This diff enables the static logging functionality to collect data for:
* plan(): logs the inputs and outputs of the planner to help with debugging user issues.
* ShardEstimators: logs the inputs and outputs of the ShardEstimators, which gives us the bandwidth inputs to verify whether the planner is generating expected values, and helps with debugging OOMs.
* TrainingPipeline: the class type indicates which pipeline the training job used. The training pipeline affects memory usage, so it is an important data point when investigating OOMs.
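As a rough illustration of this kind of input/output logging, the sketch below wraps a function or constructor in a decorator that records its arguments and return value. Everything in it (`log_io`, `RECORDS`, `toy_plan`, `ToyPipeline`, and the record layout) is hypothetical and is not the TorchRec logging API.

```python
import functools
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger("static_logging_sketch")
RECORDS: List[Dict[str, Any]] = []  # hypothetical in-memory sink for inspection


def log_io(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record the inputs and output of each call to `func`."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = func(*args, **kwargs)
        record = {
            "name": func.__qualname__,  # includes the class name for methods
            "args": repr(args),
            "kwargs": repr(kwargs),
            "output": repr(result),
        }
        RECORDS.append(record)
        logger.info("call %s", record)
        return result

    return wrapper


@log_io
def toy_plan(tables: List[str], world_size: int) -> Dict[str, int]:
    """Toy planner: assign tables to ranks round-robin."""
    return {name: i % world_size for i, name in enumerate(tables)}


class ToyPipeline:
    @log_io  # logging the constructor records which pipeline class was used
    def __init__(self, name: str) -> None:
        self.name = name
```

Decorating a constructor this way stores the qualified name (for example `ToyPipeline.__init__`), which is enough to tell which pipeline class a job instantiated.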

Reviewed By: kausv, nimaelyasi

Differential Revision: D87488015

fbshipit-source-id: db8754e5e4c7b5e5a2b2e6bf7c6a4e0c6171cf71

Nov 24, 2025
791373f
zip
tar.gz

Tags: hammad45/torchrec