This is the PyTorch implementation of the following NeurIPS 2023 paper:
Recommender Systems with Generative Retrieval
Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan H. Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Q. Tran, Jonah Samost, Maciej Kula, Ed H. Chi, Maheswaran Sathiamoorthy.
The experimental datasets should be preprocessed into JSON format. You may refer to this example data for guidance.
First, train the RQ-VAE model:
python run_gr_id.py
Once the RQ-VAE model is trained, you can train the T5 model with online tokenization (i.e., items are tokenized during training rather than precomputed and stored offline):
python run_gr_rec.py
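To illustrate what online tokenization means here, the toy sketch below maps items to code tuples inside a collate step at batch time, rather than reading a precomputed item-to-token table. It is pure Python and rests on assumptions: `nearest`, `residual_quantize`, `collate_online`, the codebooks, and the item embeddings are hypothetical stand-ins, not the functions or data used by `run_gr_rec.py`. The residual quantization shown is the generic technique behind RQ-VAE tokenizers, without any learned encoder.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Return the index of the codebook entry closest to x (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], x)),
    )


def residual_quantize(
    x: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[int, ...]:
    """Quantize x level by level; each level encodes the previous residual."""
    residual = list(x)
    codes = []
    for codebook in codebooks:
        k = nearest(codebook, residual)
        codes.append(k)
        # Subtract the chosen entry so the next level sees what is left over.
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return tuple(codes)


def collate_online(
    batch: List[List[str]],
    item_emb: Dict[str, Vector],
    codebooks: Sequence[Sequence[Vector]],
) -> List[List[Tuple[int, ...]]]:
    """Hypothetical collate step: tokenize every item history at batch time."""
    return [
        [residual_quantize(item_emb[item], codebooks) for item in history]
        for history in batch
    ]


# Toy data (made up for illustration): two levels, two entries per level.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],  # level 0
    [(0.0, 0.2), (0.2, 0.0)],  # level 1
]
item_emb = {"a": (0.1, 0.2), "b": (1.2, 1.0)}

tokens = collate_online([["a", "b"]], item_emb, codebooks)
print(tokens)  # [[(0, 0), (1, 1)]]
```

Because the codes are computed when the batch is built, a change to the tokenizer takes effect without regenerating any stored token files; the trade-off is extra computation per batch.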
This project is based on the LETTER repository and is compatible with using LETTER as a tokenizer. However, unlike LETTER, which removes duplicates through post-processing, our implementation deduplicates directly via suffix tokens during token generation.
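As a rough illustration of deduplication with suffix tokens, the sketch below appends one extra token to each item's code tuple, counting how many earlier items already share the same codes. This is an assumption about how such a scheme can work, not a copy of this repository's logic: `add_suffix_tokens` and the example codes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple


def add_suffix_tokens(
    codes: Dict[Hashable, Tuple[int, ...]]
) -> Dict[Hashable, Tuple[int, ...]]:
    """Append a collision counter so every item ends up with a unique ID."""
    seen = defaultdict(int)  # code tuple -> number of items seen with it
    unique = {}
    for item, code in codes.items():
        # First item with this code gets suffix 0, the next gets 1, and so on.
        unique[item] = code + (seen[code],)
        seen[code] += 1
    return unique


# Hypothetical codes: items "a" and "b" collide, "c" does not.
codes = {"a": (3, 1, 4), "b": (3, 1, 4), "c": (2, 7, 1)}
unique = add_suffix_tokens(codes)
print(unique)  # {'a': (3, 1, 4, 0), 'b': (3, 1, 4, 1), 'c': (2, 7, 1, 0)}
```

Every item receives a suffix, including items without collisions, so all IDs have the same length.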