This repository contains the official implementation of the paper:
Unity: Fully Self-Supervised Pretraining with Transformers for Recommendation
Shuang Yang, Yang Yang, Tao Liu, Feng Qi, Kaushik Rangadurai, Luke Simon, Sandeep Pandey
We present Unity, a fully self-supervised learning framework for recommendation. Unity integrates three key components: 1) Unity tokenization, an event-level tokenizer that converts heterogeneous engagement features into a single sequence of compact latent tokens; 2) Pollen, a Transformer architecture designed to program arbitrary interactions in the exponential-polynomial family; and 3) a masked-language model-style self-supervised learning paradigm. We evaluated the framework extensively in both production and public settings and report promising results including multiple production launches that have landed significant topline business metric gains in both organic and ads recommendation. Our early scaling experiments also demonstrate 2.3x-4.9x higher scaling efficiency compared to previously established scaling laws.
Unity is a plug-in self-supervised pretraining module for recommendation models. It attaches a Masked Autoencoder (MAE) to the intermediate representations of any backbone model, adding a self-supervised reconstruction objective alongside the standard task loss. The framework includes:
- Unity MAE: A 1D masked autoencoder that patchifies intermediate representations, masks a subset, and reconstructs them through an encoder-decoder Transformer.
- Pollen: A geometric mean attention mechanism that replaces standard softmax attention, enabling interactions in the exponential-polynomial family.
- Backbone integration: Demonstrated with two backbone models (WuKong and DCNv2) on public CTR prediction benchmarks.
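The patchify-and-mask step described for Unity MAE can be sketched as follows. This is a minimal NumPy illustration, not the code in common/unity_mae.py: the function names `patchify_1d` and `random_mask`, the zero-padding of a trailing partial patch, and the use of one mask shared across the batch are all assumptions made for the example.

```python
import numpy as np


def patchify_1d(x: np.ndarray, patch_size: int) -> np.ndarray:
    """Split (batch, dim) features into (batch, num_patches, patch_size)."""
    batch, dim = x.shape
    pad = (-dim) % patch_size  # assumption: zero-pad so dim divides evenly
    if pad:
        x = np.pad(x, ((0, 0), (0, pad)))
    return x.reshape(batch, -1, patch_size)


def random_mask(num_patches: int, mask_ratio: float, rng: np.random.Generator):
    """Return sorted (kept, masked) patch indices, shared by the whole batch."""
    num_masked = int(num_patches * mask_ratio)
    perm = rng.permutation(num_patches)
    return np.sort(perm[num_masked:]), np.sort(perm[:num_masked])


rng = np.random.default_rng(0)
features = rng.normal(size=(4, 256))  # hypothetical intermediate representation
patches = patchify_1d(features, patch_size=32)
kept, masked = random_mask(patches.shape[1], mask_ratio=0.5, rng=rng)
visible = patches[:, kept]  # the subset an encoder would see
```

With a 256-dimensional representation and patch_size=32 this yields 8 patches, of which mask_ratio=0.5 hides 4; a decoder would then be asked to reconstruct the hidden ones.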
Repository structure:
.
├── run_expid.py                 # Main training entry point
├── common/
│   ├── base_model.py            # Base model class with training loop
│   └── unity_mae.py             # Unity MAE, Pollen, and Transformer blocks
├── WuKong/
│   ├── config/
│   │   ├── model_config.yaml    # WuKong model hyperparameters
│   │   └── dataset_config.yaml  # Dataset paths and feature definitions
│   └── src/
│       └── WuKong.py            # WuKong backbone model
└── DCNv2/
    ├── config/
    │   ├── model_config.yaml    # DCNv2 model hyperparameters
    │   └── dataset_config.yaml  # Dataset paths and feature definitions
    └── src/
        └── DCNv2.py             # DCNv2 backbone model
Requirements:
- Python 3.8+
- PyTorch
- FuxiCTR (for data loading, feature processing, and metrics)
- NumPy
- tqdm
Install dependencies:
pip install torch numpy tqdm
pip install fuxictr
This codebase supports datasets in both CSV and NPZ formats. Download the datasets and place them under ~/datasets/ so the directory structure looks like:
~/datasets/
├── KuaiVideo_x1/
│   ├── train.csv
│   ├── test.csv
│   └── item_visual_emb_dim64.h5
├── TaobaoAd_x1/
│   ├── train.csv
│   └── test.csv
└── AmazonElectronics_x1/
    ├── train.csv
    └── test.csv
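Before launching a run, it can help to confirm that the files are where the listing says they should be. The check below is a small standard-library sketch; the helper name `missing_dataset_files` is hypothetical, and the file list is copied from the CSV layout shown here, so it does not cover the NPZ variant.

```python
from pathlib import Path

# Expected files, copied from the directory listing in this README.
EXPECTED_FILES = {
    "KuaiVideo_x1": ["train.csv", "test.csv", "item_visual_emb_dim64.h5"],
    "TaobaoAd_x1": ["train.csv", "test.csv"],
    "AmazonElectronics_x1": ["train.csv", "test.csv"],
}


def missing_dataset_files(root: Path) -> list:
    """Return the expected files that are not present under `root`."""
    missing = []
    for dataset, files in EXPECTED_FILES.items():
        for name in files:
            path = root / dataset / name
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Report anything absent from ~/datasets/.
    for path in missing_dataset_files(Path.home() / "datasets"):
        print(f"missing: {path}")
```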
The config files use /home/USER/datasets/ as a placeholder. At runtime, USER is automatically replaced with your system username ($USER), so no manual path editing is needed as long as datasets are placed under ~/datasets/.
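The substitution described above can be pictured with the following sketch. The repository's actual replacement logic is not shown in this README, so the function name `resolve_user_placeholder` and the choice to replace only path components that are exactly `USER` are assumptions; `getpass.getuser()` is used as a stand-in for reading `$USER`.

```python
import getpass


def resolve_user_placeholder(path: str, username: str = "") -> str:
    """Replace any path component that is exactly 'USER' with the username."""
    # Fall back to the login name (getpass checks LOGNAME, USER, and others).
    username = username or getpass.getuser()
    parts = [username if part == "USER" else part for part in path.split("/")]
    return "/".join(parts)


print(resolve_user_placeholder("/home/USER/datasets/TaobaoAd_x1/train.csv", "alice"))
# prints /home/alice/datasets/TaobaoAd_x1/train.csv
```

Matching whole path components rather than the raw substring avoids rewriting a directory whose name merely contains the letters USER.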
Public benchmark datasets used in the paper, as named in the directory layout above: KuaiVideo_x1, TaobaoAd_x1, and AmazonElectronics_x1.
Train with the WuKong backbone:
PYTHONPATH=.:$PYTHONPATH python run_expid.py \
    --config WuKong/config \
    --src WuKong.src \
    --expid WuKong_test \
    --gpu 0
Train with the DCNv2 backbone:
PYTHONPATH=.:$PYTHONPATH python run_expid.py \
    --config DCNv2/config \
    --src DCNv2.src \
    --expid DCNv2_test \
    --gpu 0
Model hyperparameters are defined in model_config.yaml. Key Unity-specific parameters:
| Parameter | Description | Default |
|---|---|---|
| enabled | Enable the Unity MAE module | True |
| output_mode | How encoder output is combined with input (0=input, 1=encoder, 2=concat, 3=add) | 1 |
| patch_size | Size of each patch for the 1D patchification | 32 |
| embed_dim | Embedding dimension of the MAE encoder | 16 |
| depth | Number of encoder Transformer blocks | 2 |
| num_heads | Number of attention heads in the encoder | 2 |
| decoder_embed_dim | Embedding dimension of the MAE decoder | 128 |
| decoder_depth | Number of decoder Transformer blocks | 2 |
| decoder_num_heads | Number of attention heads in the decoder | 4 |
| mask_ratio | Fraction of patches to mask during training | 0.5 |
| loss_weight | Weight of the MAE reconstruction loss | 0.05 |
| pollen_attn_type | Set to "pollen" to use Pollen attention (null for standard MHA) | "pollen" |
| pollen_use_value_sign | Enable value sign in geometric mean attention | True |
| pollen_use_rmsnorm | Use RMSNorm instead of LayerNorm in Pollen blocks | False |
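The four output_mode values are only named in the table, so the sketch below shows one plausible reading of them. It is an illustration under stated assumptions, not the module's code: the function name `combine_output` is hypothetical, concatenation is assumed to happen along the last axis, and addition is assumed to require matching shapes.

```python
import numpy as np


def combine_output(x: np.ndarray, encoded: np.ndarray, output_mode: int) -> np.ndarray:
    """Combine the module input `x` with the encoder output `encoded`."""
    if output_mode == 0:  # pass the input through unchanged
        return x
    if output_mode == 1:  # use the encoder output only (the listed default)
        return encoded
    if output_mode == 2:  # concatenate along the feature axis (assumed)
        return np.concatenate([x, encoded], axis=-1)
    if output_mode == 3:  # element-wise sum; shapes must match (assumed)
        return x + encoded
    raise ValueError(f"unknown output_mode: {output_mode}")


x = np.ones((2, 4))
encoded = np.full((2, 4), 2.0)
concat = combine_output(x, encoded, output_mode=2)  # shape (2, 8)
added = combine_output(x, encoded, output_mode=3)   # every entry is 3.0
```

Under this reading, mode 2 changes the width of what the backbone receives, so any layer consuming the result would need to be sized accordingly.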
Unity is designed as a drop-in module. To add it to a new backbone:
- Have your model inherit from BaseModel in common/base_model.py.
- Create a UnityMAEConfig and instantiate UnityMAE in your model's __init__.
- In forward(), pass intermediate representations through the Unity module:
from common.unity_mae import UnityMAE, UnityMAEConfig
# In __init__:
self.unity_config = UnityMAEConfig(
    raw_input_size=feature_dim,
    input_size=feature_dim,
    output_mode=1,
    patch_size=32,
    embed_dim=16,
    depth=2,
    num_heads=2,
    decoder_embed_dim=128,
    decoder_depth=2,
    decoder_num_heads=4,
    mlp_ratio=4.0,
    mask_ratio=0.5,
    pollen_attn_type="pollen",
    pollen_use_value_sign=True,
)
self.unity = UnityMAE(self.unity_config)
# In forward():
final_out, unity_loss = self.unity(final_out)
unity_loss = self.aggregate_unity_loss(self.unity_config, unity_loss)
The unity_loss is automatically combined with the task loss in BaseModel.train_step().
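The README does not spell out how the two losses are combined. Given that loss_weight is described as the weight of the MAE reconstruction loss, a weighted sum is the natural reading, and the sketch below assumes exactly that. The function name `total_loss` is hypothetical and is not part of BaseModel.

```python
from typing import Optional


def total_loss(task_loss: float, unity_loss: Optional[float],
               loss_weight: float = 0.05) -> float:
    """Assumed combination: task loss plus a weighted reconstruction loss."""
    if unity_loss is None:  # e.g. the Unity module is disabled
        return task_loss
    return task_loss + loss_weight * unity_loss


combined = total_loss(task_loss=0.7, unity_loss=2.0)  # 0.7 + 0.05 * 2.0
```

With the default weight of 0.05, the reconstruction term acts as a small auxiliary signal next to the task loss rather than dominating it.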
@article{yang2025unity,
  title={Unity: Fully Self-Supervised Pretraining with Transformers for Recommendation},
  author={Yang, Shuang and Yang, Yang and Liu, Tao and Qi, Feng and Rangadurai, Kaushik and Simon, Luke and Pandey, Sandeep},
  year={2025}
}
This project is licensed under the Apache License 2.0. See individual source files for details.