This project introduces WeTok, a powerful discrete visual tokenizer designed to resolve the long-standing conflict between compression efficiency and reconstruction fidelity. WeTok achieves state-of-the-art reconstruction quality, surpassing previous leading discrete and continuous tokenizers.
WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
Shaobin Zhuang, Yiwei Guo, Canmiao Fu, Zhipeng Huang, Zeyue Tian, Fangyikang Wang, Ying Zhang, Chen Li, Yali Wang
Shanghai Jiao Tong University, WeChat Vision (Tencent Inc.), Shenzhen Institutes of Advanced Technology (Chinese Academy of Sciences), Hong Kong University of Science and Technology, Zhejiang University, Shanghai AI Laboratory
@article{zhuang2026wetok,
  title={WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction},
  author={Zhuang, Shaobin and Guo, Yiwei and Fu, Canmiao and Huang, Zhipeng and Tian, Zeyue and Wang, Fangyikang and Zhang, Ying and Li, Chen and Wang, Yali},
  journal={arXiv preprint arXiv:2508.05599},
  year={2025}
}
WeTok achieves a new state-of-the-art in reconstruction fidelity, surpassing both discrete and continuous tokenizers, while offering high compression ratios.
- [2025.08.31] 🚀 🚀 🚀 We have released a series of LlamaGen models that use WeTok as the tokenizer, achieving an FID of 2.31 on ImageNet and surpassing LlamaGen with Open-MAGVIT2 as the visual tokenizer.
- [2025.08.12] 🔥🔥🔥 We have released a series of WeTok models, achieving a record-low zero-shot rFID of 0.12 on ImageNet and surpassing top continuous tokenizers such as FLUX-VAE and SD-VAE 3.5.
- [2025.08.08] 🚀 🚀 🚀 We are excited to release WeTok, a powerful discrete tokenizer featuring our novel Group-Wise Lookup-Free Quantization (GQ) and a Generative Decoder (GD). Code and pretrained models are now available!
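To give a feel for the Group-Wise Lookup-Free Quantization (GQ) mentioned in the release notes, here is a toy sketch of the general idea: each latent channel is binarized by its sign (the lookup-free part), and the channels are split into groups that each yield their own token index (the group-wise part). This is an illustration only, not WeTok's implementation. The function name `group_lfq`, the plain-list input, and the bit ordering are all assumptions made for this example.

```python
def group_lfq(latent, num_groups):
    """Quantize one latent vector group by group (illustrative sketch).

    Returns the binarized vector (entries -1.0 or +1.0) and one integer
    token index per group.
    """
    if len(latent) % num_groups != 0:
        raise ValueError("latent length must be divisible by num_groups")
    bits_per_group = len(latent) // num_groups

    # Lookup-free step: binarize every channel by its sign, so no learned
    # codebook has to be stored or searched.
    quantized = [1.0 if value > 0 else -1.0 for value in latent]

    # Group-wise step: read each group's sign pattern as a binary number.
    # Assumption: the first channel of a group is the least significant bit.
    indices = []
    for start in range(0, len(latent), bits_per_group):
        group = quantized[start:start + bits_per_group]
        index = sum(1 << bit for bit, sign in enumerate(group) if sign > 0)
        indices.append(index)
    return quantized, indices


latent = [0.3, -1.2, 0.7, 0.1, -0.5, -0.2, 2.0, -0.9]
quantized, indices = group_lfq(latent, num_groups=2)
print(indices)  # [13, 4]
```

In this sketch, 8 channels split into 2 groups give two indices, each in a range of 2^4 = 16 values, while the full vector still distinguishes 2^8 = 256 sign patterns. Keeping the per-group index range small is the intuition behind grouping.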
- Dependencies:
bash env.sh
- Evaluation on ImageNet 50K Validation Set
The dataset should be organized as follows:
imagenet
└── val/
    ├── ...
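Before launching a long distributed evaluation, it can help to confirm that the dataset directory matches the layout above. The following is a minimal sketch, not part of the repository: the helper name `count_images` and the set of accepted file extensions are assumptions, and the path `imagenet/val` is taken relative to the current working directory.

```python
from pathlib import Path

# Assumption: validation images use one of these extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Count image files anywhere below root; return 0 if root is missing."""
    root = Path(root)
    if not root.is_dir():
        return 0
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


if __name__ == "__main__":
    found = count_images("imagenet/val")
    print(f"found {found} images under imagenet/val (expected 50000)")
```

The same helper can be pointed at `MSCOCO2017/val2017`, where the standard Val2017 split contains 5,000 images.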
Run the 256×256 resolution evaluation script:
bash scripts/evaluation/imagenet_evaluation_256_dist.sh
Run the original resolution evaluation script:
bash scripts/evaluation/imagenet_evaluation_original_dist.sh
- Evaluation on MS-COCO Val2017
The dataset should be organized as follows:
MSCOCO2017
└── val2017/
    ├── ...
Run the 256×256 resolution evaluation script:
bash scripts/evaluation/mscocoval_evaluation_256_dist.sh
Run the original resolution evaluation script:
bash scripts/evaluation/mscoco_evaluation_original_dist.sh
To quickly check the reconstruction quality of each model, run:
bash scripts/inference/reconstruct_image.sh
Qualitative comparison of 512 × 512 image reconstruction on TokBench.
WeTok-AR-XL generated samples at 256 × 256 resolution.
Our work builds upon the foundations laid by many excellent projects in the field. We would like to thank the authors of Open-MAGVIT2. We also drew inspiration from the methodologies presented in LFQ and BSQ. We are grateful for their contributions to the community.