🚀 WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction


This project introduces WeTok, a powerful discrete visual tokenizer designed to resolve the long-standing conflict between compression efficiency and reconstruction fidelity. WeTok achieves state-of-the-art reconstruction quality, surpassing previous leading discrete and continuous tokenizers.

WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
Shaobin Zhuang, Yiwei Guo, Canmiao Fu, Zhipeng Huang, Zeyue Tian, Fangyikang Wang, Ying Zhang, Chen Li, Yali Wang
Shanghai Jiao Tong University, WeChat Vision (Tencent Inc.), Shenzhen Institutes of Advanced Technology (Chinese Academy of Sciences), Hong Kong University of Science and Technology, Zhejiang University, Shanghai AI Laboratory

@article{zhuang2025wetok,
  title={WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction},
  author={Zhuang, Shaobin and Guo, Yiwei and Fu, Canmiao and Huang, Zhipeng and Tian, Zeyue and Wang, Fangyikang and Zhang, Ying and Li, Chen and Wang, Yali},
  journal={arXiv preprint arXiv:2508.05599},
  year={2025}
}


WeTok achieves a new state-of-the-art in reconstruction fidelity, surpassing both discrete and continuous tokenizers, while offering high compression ratios.
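To make the trade-off concrete, a discrete tokenizer's compression ratio can be computed as raw image bits divided by token-stream bits. A minimal sketch follows; the downsample factor and bits-per-token values are illustrative assumptions, not WeTok's actual configuration.

```python
def compression_ratio(height, width, downsample, bits_per_token, bits_per_pixel=24):
    """Raw image bits divided by token-stream bits.

    The downsample factor and bits per token are hypothetical example
    values, not taken from the WeTok paper.
    """
    raw_bits = height * width * bits_per_pixel            # 8-bit RGB input
    num_tokens = (height // downsample) * (width // downsample)
    return raw_bits / (num_tokens * bits_per_token)

# Example: a 256x256 RGB image, 16x spatial downsampling, 18-bit token indices.
ratio = compression_ratio(256, 256, downsample=16, bits_per_token=18)
```

Higher downsampling or fewer bits per token raise the ratio but make faithful reconstruction harder, which is the conflict WeTok targets.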

📰 News

  • [2025.08.31] 🚀🚀🚀 We released a series of LlamaGen models that use WeTok as the tokenizer, achieving an FID of 2.31 on ImageNet and surpassing LlamaGen with Open-MAGVIT2 as its visual tokenizer.
  • [2025.08.12] 🔥🔥🔥 We released a series of WeTok models, achieving a record-low zero-shot rFID of 0.12 on ImageNet and surpassing top continuous tokenizers such as FLUX-VAE and SD-VAE 3.5.
  • [2025.08.08] 🚀🚀🚀 We are excited to release WeTok, a powerful discrete tokenizer featuring our novel Group-Wise Lookup-Free Quantization (GQ) and a Generative Decoder (GD). Code and pretrained models are now available!
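The core idea behind lookup-free quantization (from the LFQ line of work WeTok builds on) is to binarize each latent channel by its sign, so no codebook lookup is needed; a group-wise variant packs the channels of each group into its own integer token. The sketch below is an illustrative NumPy rendering of that idea, not WeTok's actual GQ implementation.

```python
import numpy as np

def gq_quantize(z, num_groups):
    """Illustrative group-wise lookup-free quantization.

    Splits the channel dimension into groups; each channel is binarized
    by its sign (no codebook lookup), and each group's bits are packed
    into one integer token index. Hypothetical sketch, not WeTok's code.
    """
    batch, channels = z.shape
    assert channels % num_groups == 0
    bits_per_group = channels // num_groups
    # Sign binarization: positive -> 1, non-positive -> 0.
    bits = (z > 0).astype(np.int64).reshape(batch, num_groups, bits_per_group)
    # Pack each group's bits into a single integer code index.
    weights = 2 ** np.arange(bits_per_group)
    indices = (bits * weights).sum(axis=-1)           # shape: (batch, num_groups)
    # Dequantized latent in {-1, +1}, as fed to the decoder.
    z_hat = np.where(bits > 0, 1.0, -1.0).reshape(batch, channels)
    return indices, z_hat
```

Grouping keeps each integer index small (2^bits_per_group values per group) while the effective vocabulary across all groups remains 2^channels.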

📖 Implementations

🛠️ Installation

  • Dependencies:
bash env.sh

Evaluation

  • Evaluation on ImageNet 50K Validation Set

The dataset should be organized as follows:

imagenet
└── val/
    ├── ...

Run the 256×256 resolution evaluation script:

bash scripts/evaluation/imagenet_evaluation_256_dist.sh

Run the original resolution evaluation script:

bash scripts/evaluation/imagenet_evaluation_original_dist.sh
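The rFID metric reported by these scripts is the Fréchet distance between Gaussian statistics of Inception features from the reference images and their reconstructions. A minimal NumPy sketch of the distance itself (feature extraction omitted):

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2).

    rFID applies this to Inception feature statistics of real vs.
    reconstructed images; this sketch covers only the distance formula.
    """
    diff = mu1 - mu2
    # Tr((sigma1 @ sigma2)^{1/2}) via eigenvalues, which are real and
    # non-negative when both covariances are positive semi-definite.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2).real
    tr_covmean = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)
```

Identical statistics give a distance of zero, so lower rFID means reconstructions that are statistically closer to the originals.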
  • Evaluation on MS-COCO Val2017

The dataset should be organized as follows:

MSCOCO2017
└── val2017/
    ├── ...

Run the 256×256 resolution evaluation script:

bash scripts/evaluation/mscocoval_evaluation_256_dist.sh

Run the original resolution evaluation script:

bash scripts/evaluation/mscoco_evaluation_original_dist.sh

Inference

To quickly test each model's reconstruction quality, run:

bash scripts/inference/reconstruct_image.sh
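To inspect a reconstruction quantitatively, a standard per-image check is PSNR between the original and reconstructed arrays. This helper is a generic sketch, not part of the WeTok scripts:

```python
import numpy as np

def psnr(original, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two uint8 image arrays."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher PSNR indicates a closer pixel-level match; it complements perceptual metrics like rFID rather than replacing them.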


Qualitative comparison of 512 × 512 image reconstruction on TokBench.


WeTok-AR-XL generated samples at 256 × 256 resolution.

❤️ Acknowledgement

Our work builds upon the foundations laid by many excellent projects in the field. We would like to thank the authors of Open-MAGVIT2. We also drew inspiration from the methodologies presented in LFQ and BSQ. We are grateful for their contributions to the community.
