v2025.05.26.00

Fix empty sharding constraints in `test_model_parallel.py` (meta-pytorch#2998)

Summary:
Pull Request resolved: meta-pytorch#2998

#### Context

Several unit tests in `test_model_parallel.py` passed **empty constraints** into `self._test_sharding` because the constraints were generated from a still-empty `self.tables`, before `self._build_tables_and_groups` had been invoked.

Impacted tests are:
* `test_sharding_twcw`
* `test_sharding_variable_batch`
* `test_sharding_multiple_kernels`

#### Changes

* Constraints depend only on table names. A new list, `self.table_names`, is created in `setUp()` and used to construct the constraints.
* Updates `self._build_tables_and_groups` to use the generated table names.
* Increases `max_examples` for `test_sharding_multiple_kernels` to cover both FP32 and FP16 cases.
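
The ordering fix described above can be sketched roughly as follows. This is a minimal illustration, not the actual test code: the class name, the `_make_constraints` helper, and the plain dicts standing in for table configs and planner constraints are all hypothetical.

```python
import unittest


class ShardingTestSketch(unittest.TestCase):
    """Hypothetical stand-in for the real test class."""

    def setUp(self):
        self.num_features = 4
        # Table names are known up front, before any table is built.
        self.table_names = [f"table_{i}" for i in range(self.num_features)]
        self.tables = []  # populated later by _build_tables_and_groups

    def _build_tables_and_groups(self):
        # Reuse the pre-generated names so tables and constraints agree.
        self.tables = [
            {"name": name, "embedding_dim": 16} for name in self.table_names
        ]

    def _make_constraints(self, sharding_type):
        # Hypothetical helper: keyed by table name only, so it does not
        # depend on self.tables having been populated yet.
        return {
            name: {"sharding_types": [sharding_type]}
            for name in self.table_names
        }

    def test_constraints_not_empty(self):
        constraints = self._make_constraints("table_wise")
        self.assertEqual(len(constraints), self.num_features)
        self._build_tables_and_groups()
        self.assertEqual([t["name"] for t in self.tables], self.table_names)
```

Building the constraints from `self.table_names` rather than `self.tables` means they are non-empty regardless of when the tables themselves are created.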

Reviewed By: TroyGarden

Differential Revision: D75306149

fbshipit-source-id: b93f7656e45a8c79393a1c347437f757aac07557

May 24, 2025
b0919ce

v2025.05.19.00

split train_pipeline.utils - pipeline_context (meta-pytorch#2978)

Summary:
Pull Request resolved: meta-pytorch#2978

# context
* the train_pipeline.utils file is overloaded
* split its functions, classes, etc. into three files, each under roughly 1000 lines
* this diff: pipeline_context.py

Reviewed By: malaybag

Differential Revision: D73906059

fbshipit-source-id: 7b3e59279a5b27b1953d0e24cc206c8a395bbd8e

May 19, 2025
2fa7ea7

v2025.05.12.00

Add raw embedding streaming needed params in trec and mvai (meta-pytorch#2935)

Summary:
Pull Request resolved: meta-pytorch#2935

Add the variables needed in D73792631 to mvai and torch rec to be able to control them via config.

Reviewed By: aliafzal

Differential Revision: D74086201

fbshipit-source-id: 53fb269c17f08d87589a837d2049b733db0d665e

May 9, 2025
d6031f9

v2025.05.05.00

support zero collision tables in ssd tbe (meta-pytorch#2919)

Summary:
X-link: pytorch/FBGEMM#4033

X-link: facebookresearch/FBGEMM#1117

Pull Request resolved: meta-pytorch#2919

# What is Key-Value Zero-Collision-Hash
Details can be found [here](https://fburl.com/oni52nmh).
In short, we want to introduce a 1-to-1 mapping between embedding lookup ids (the values in a KJT) and embeddings. To do that, we use an extremely large embedding space, e.g. 2^50, and rely on the KV embedding capability already provided by SSD TBE. Unlike regular tables, we don't preallocate all the embeddings; we allocate and deallocate them during training, aka dynamic embedding.

The major functionality is already provided by SSD TBE; we need to add extra support as follows:
1. optimizer offloading (taken care of by Benson and Sarunya), since we cannot pre-allocate the optimizer state anymore
2. update split_embedding_weights so that it returns not only weights but also weight_ids and bucket (these two are described in the note below)
3. DRAM KV, a new backend solution in addition to SSD KV; this is needed for smaller models whose size can be handled by inference.

NOTE: weight id is needed because the embedding id (aka the embedding offset originally) is not contiguous anymore; bucket is a new concept introduced specifically to tackle the checkpoint/publish resharding issue. Both are generated every time split_embedding_weights is called, rather than stored as member variables.

# change list
1. add bucket concept into ssd tbe
2. update split_embedding_weights to make it return a tuple of 3 tensors (weight, weight_id, id_cnt_per_bucket)
3. add new unit tests for the key-value embedding cases
4. modify debug_optimizer_split to make it return only the optimizer state that is valid for the given weight ids
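
To make the (weight, weight_id, id_cnt_per_bucket) triple more concrete, here is a toy sketch in plain Python. It is not the SSD TBE implementation: the function name, the dict-backed store, and the rule that each bucket covers a contiguous range of the id space are all assumptions made for illustration.

```python
def split_weights_sketch(store, id_space_size, num_buckets):
    """Toy model of a sparse key-value embedding table.

    store: dict mapping embedding id -> embedding row (list of floats);
           only ids seen so far are allocated.
    Returns (weights, weight_ids, id_cnt_per_bucket).
    """
    # Assumption: buckets are equal-sized contiguous ranges of the id space.
    bucket_size = -(-id_space_size // num_buckets)  # ceiling division

    # Ids are sparse and non-contiguous, so they must be returned
    # alongside the rows to say which row belongs to which id.
    weight_ids = sorted(store)
    weights = [store[i] for i in weight_ids]

    # Count how many allocated ids fall into each bucket.
    id_cnt_per_bucket = [0] * num_buckets
    for i in weight_ids:
        id_cnt_per_bucket[i // bucket_size] += 1

    return weights, weight_ids, id_cnt_per_bucket


# Three allocated rows in an id space of 1024 split into 4 buckets.
store = {5: [0.1], 1000: [0.2], 7: [0.3]}
weights, weight_ids, counts = split_weights_sketch(store, 1024, 4)
```

All three values are recomputed on every call, which mirrors the note above that they are generated per call rather than kept as member variables.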

Reviewed By: q10

Differential Revision: D73274786

fbshipit-source-id: c3c37bdd306f2a542c7d90e14ffdb7f96594b4df

May 5, 2025
7011587

v2025.04.28.00

Util for getting bucket offsets (meta-pytorch#2917)

Summary:
Pull Request resolved: meta-pytorch#2917

This util converts the shards of a bucketized row-wise sharded table to their equivalent bucket offsets.
It divides the table into equal sized buckets and determines how many buckets are placed in each shard.
It returns the bucket offset of the first bucket in each shard.
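
A rough sketch of that computation, under stated assumptions: the function name and signature are hypothetical, the bucket size is taken as the ceiling of rows over buckets, and every shard is assumed to start on a bucket boundary.

```python
import math


def bucket_offsets_per_shard(shard_row_offsets, table_rows, total_num_buckets):
    """Return the index of the first bucket held by each shard.

    shard_row_offsets: starting row of each shard, in shard order.
    """
    # Divide the table into equal-sized buckets.
    bucket_size = math.ceil(table_rows / total_num_buckets)

    offsets = []
    for row_offset in shard_row_offsets:
        # Assumption: shards never split a bucket.
        if row_offset % bucket_size != 0:
            raise ValueError(
                f"row offset {row_offset} is not aligned to bucket size {bucket_size}"
            )
        offsets.append(row_offset // bucket_size)
    return offsets


# 100 rows, 10 buckets of 10 rows; shards start at rows 0, 40 and 80.
print(bucket_offsets_per_shard([0, 40, 80], 100, 10))  # [0, 4, 8]
```

The difference between consecutive offsets gives the number of buckets placed in each shard.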

Reviewed By: faran928

Differential Revision: D73397207

fbshipit-source-id: dfd83e98f8abcd60992b43ccfb9b1363ff5f5ec3

Apr 25, 2025
a5de563

v2025.04.21.00

Fix GAUC not calculated with weights (meta-pytorch#2895)

Summary:
Pull Request resolved: meta-pytorch#2895

The gAUC score is lower than expected, e.g. https://fburl.com/mlhub/vljz497c. In ig, if a label's presence is false, the corresponding weight is set to 0, and that sample should not be considered when calculating gAUC.
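
The effect of ignoring zero-weight samples can be illustrated with a small pure-Python sketch. It is not the TorchRec metric: the function names are hypothetical, and averaging per-group AUC with equal weight per group is an assumption.

```python
from collections import defaultdict


def auc(labels, preds):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for l, p in zip(labels, preds) if l == 1]
    neg = [p for l, p in zip(labels, preds) if l == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return correct / (len(pos) * len(neg))


def gauc(group_ids, labels, preds, weights):
    """Mean per-group AUC over samples whose weight is non-zero."""
    groups = defaultdict(list)
    for g, l, p, w in zip(group_ids, labels, preds, weights):
        if w == 0:
            continue  # label not present: exclude from the metric
        groups[g].append((l, p))

    scores = []
    for rows in groups.values():
        group_labels, group_preds = zip(*rows)
        score = auc(group_labels, group_preds)
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0
```

In a group with labels `[1, 0, 0]`, predictions `[0.9, 0.1, 0.95]` and weights `[1, 1, 0]`, counting the zero-weight sample as a negative would lower the group's AUC to 0.5, while excluding it gives 1.0.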

Reviewed By: yunjiangster

Differential Revision: D73231152

fbshipit-source-id: 3a83269948db27341cd8b6ad5d5f7b553195aa75

Apr 21, 2025
b1b7e01

v2025.04.14.00

Update authors and publish date (meta-pytorch#2886)

Summary:
Pull Request resolved: meta-pytorch#2886

as title

Reviewed By: kausv

Differential Revision: D72924022

fbshipit-source-id: 7f8b1dcd01084ff09d3689f4582c083695ceb3fe

Apr 14, 2025
05aea06

v2025.04.07.00

reland D70126859 (meta-pytorch#2787)

Summary:
Pull Request resolved: meta-pytorch#2787

# context
* previous diff triggered S495021
* the error message is like
```
ModelGenerationPlatformError("AttributeError: '_EmbeddingBagProxy' object has no attribute 'weight'")
```
* this is because in some flows the EBC module is fx-traced, so there is no actual EBC but a Proxy. Without full context it's risky to push this change.
* as a workaround, we'll just convert the unsharded EBC back to float32 so it's compatible with the float32 input KJT.weights

NOTE: this hacky change (unsharded EBC float16 ==> float32) is only needed in the tests, where we want to compare the results from sharded EBC.

WARNING: We make a strong assumption here that in any unsharded EBC (with dtype=float16) use case, the input KJT.weights should never be float32.

Reviewed By: basilwong

Differential Revision: D70712348

fbshipit-source-id: f2abaa601adf3052ea322cf326363da8bfef96c3

Apr 4, 2025
35b14b0

v2025.03.31.00

Reset padding to default if not matching the qcoom type (meta-pytorch#2772)

Summary:
Pull Request resolved: meta-pytorch#2772

Reset padding to default if not matching the qcoom type

Reviewed By: qchip

Differential Revision: D70343017

fbshipit-source-id: 0817c2a1cbb9a8edd0decde6ec73e1088d4dd114

Mar 29, 2025
71154ba

v2025.03.24.00

Fix Pyre test on OSS (meta-pytorch#2842)

Summary:
Pull Request resolved: meta-pytorch#2842

Fixing pyre error:
```
torchrec/ir/utils.py:178:4 Incompatible return type [7]: Expected `DIM` but got `Dim`.
```

Reviewed By: TroyGarden

Differential Revision: D71656776

fbshipit-source-id: 3eb552d33c3be3e3fe54e2d55ad043d750cb2cc7

Mar 22, 2025
f18854f


Tags: Raahul46/torchrec