🚀 WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction


This project introduces WeTok, a powerful discrete visual tokenizer designed to resolve the long-standing conflict between compression efficiency and reconstruction fidelity. WeTok achieves state-of-the-art reconstruction quality, surpassing previous leading discrete and continuous tokenizers.

WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
Shaobin Zhuang, Yiwei Guo, Canmiao Fu, Zhipeng Huang, Zeyue Tian, Fangyikang Wang, Ying Zhang, Chen Li, Yali Wang
Shanghai Jiao Tong University, WeChat Vision (Tencent Inc.), Shenzhen Institutes of Advanced Technology (Chinese Academy of Sciences), Hong Kong University of Science and Technology, Zhejiang University, Shanghai AI Laboratory

@article{zhuang2025wetok,
  title={WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction},
  author={Zhuang, Shaobin and Guo, Yiwei and Fu, Canmiao and Huang, Zhipeng and Tian, Zeyue and Wang, Fangyikang and Zhang, Ying and Li, Chen and Wang, Yali},
  journal={arXiv preprint arXiv:2508.05599},
  year={2025}
}


WeTok achieves a new state-of-the-art in reconstruction fidelity, surpassing both discrete and continuous tokenizers, while offering high compression ratios.
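To make the trade-off concrete, a discrete tokenizer's compression ratio can be computed as raw image bits divided by token-stream bits. A minimal sketch follows; the downsample factor and bits-per-token values are illustrative assumptions, not WeTok's actual configuration.

```python
def compression_ratio(height, width, downsample, bits_per_token, bits_per_pixel=24):
    """Raw image bits divided by token-stream bits.

    The downsample factor and bits per token are hypothetical example
    values, not taken from the WeTok paper.
    """
    raw_bits = height * width * bits_per_pixel            # 8-bit RGB input
    num_tokens = (height // downsample) * (width // downsample)
    return raw_bits / (num_tokens * bits_per_token)

# Example: a 256x256 RGB image, 16x spatial downsampling, 18-bit token indices.
ratio = compression_ratio(256, 256, downsample=16, bits_per_token=18)
```

Higher downsampling or fewer bits per token raise the ratio but make faithful reconstruction harder, which is the conflict WeTok targets.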

📰 News

  • [2025.08.31] 🚀🚀🚀 We released a series of LlamaGen models that use WeTok as the tokenizer, achieving an FID of 2.31 on ImageNet and surpassing LlamaGen with Open-MAGVIT2 as its visual tokenizer.
  • [2025.08.12] 🔥🔥🔥 We released a series of WeTok models, achieving a record-low zero-shot rFID of 0.12 on ImageNet and surpassing top continuous tokenizers such as FLUX-VAE and SD-VAE 3.5.
  • [2025.08.08] 🚀🚀🚀 We are excited to release WeTok, a powerful discrete tokenizer featuring our novel Group-Wise Lookup-Free Quantization (GQ) and a Generative Decoder (GD). Code and pretrained models are now available!
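The core idea behind lookup-free quantization (from the LFQ line of work WeTok builds on) is to binarize each latent channel by its sign, so no codebook lookup is needed; a group-wise variant packs the channels of each group into its own integer token. The sketch below is an illustrative NumPy rendering of that idea, not WeTok's actual GQ implementation.

```python
import numpy as np

def gq_quantize(z, num_groups):
    """Illustrative group-wise lookup-free quantization.

    Splits the channel dimension into groups; each channel is binarized
    by its sign (no codebook lookup), and each group's bits are packed
    into one integer token index. Hypothetical sketch, not WeTok's code.
    """
    batch, channels = z.shape
    assert channels % num_groups == 0
    bits_per_group = channels // num_groups
    # Sign binarization: positive -> 1, non-positive -> 0.
    bits = (z > 0).astype(np.int64).reshape(batch, num_groups, bits_per_group)
    # Pack each group's bits into a single integer code index.
    weights = 2 ** np.arange(bits_per_group)
    indices = (bits * weights).sum(axis=-1)           # shape: (batch, num_groups)
    # Dequantized latent in {-1, +1}, as fed to the decoder.
    z_hat = np.where(bits > 0, 1.0, -1.0).reshape(batch, channels)
    return indices, z_hat
```

Grouping keeps each integer index small (2^bits_per_group values per group) while the effective vocabulary across all groups remains 2^channels.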

📖 Implementations

🛠️ Installation

  • Dependencies:
bash env.sh

Evaluation

  • Evaluation on ImageNet 50K Validation Set

The dataset should be organized as follows:

imagenet
└── val/
    ├── ...

Run the 256×256 resolution evaluation script:

bash scripts/evaluation/imagenet_evaluation_256_dist.sh

Run the original resolution evaluation script:

bash scripts/evaluation/imagenet_evaluation_original_dist.sh
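The rFID metric reported by these scripts is the Fréchet distance between Gaussian statistics of Inception features from the reference images and their reconstructions. A minimal NumPy sketch of the distance itself (feature extraction omitted):

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2).

    rFID applies this to Inception feature statistics of real vs.
    reconstructed images; this sketch covers only the distance formula.
    """
    diff = mu1 - mu2
    # Tr((sigma1 @ sigma2)^{1/2}) via eigenvalues, which are real and
    # non-negative when both covariances are positive semi-definite.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2).real
    tr_covmean = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)
```

Identical statistics give a distance of zero, so lower rFID means reconstructions that are statistically closer to the originals.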
  • Evaluation on MS-COCO Val2017

The dataset should be organized as follows:

MSCOCO2017
└── val2017/
    ├── ...

Run the 256×256 resolution evaluation script:

bash scripts/evaluation/mscocoval_evaluation_256_dist.sh

Run the original resolution evaluation script:

bash scripts/evaluation/mscoco_evaluation_original_dist.sh

Inference

To quickly test each model's reconstruction quality, run:

bash scripts/inference/reconstruct_image.sh
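To inspect a reconstruction quantitatively, a standard per-image check is PSNR between the original and reconstructed arrays. This helper is a generic sketch, not part of the WeTok scripts:

```python
import numpy as np

def psnr(original, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two uint8 image arrays."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher PSNR indicates a closer pixel-level match; it complements perceptual metrics like rFID rather than replacing them.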


Qualitative comparison of 512 × 512 image reconstruction on TokBench.


WeTok-AR-XL generated samples at 256 × 256 resolution.

❤️ Acknowledgement

Our work builds upon the foundations laid by many excellent projects in the field. We would like to thank the authors of Open-MAGVIT2. We also drew inspiration from the methodologies presented in LFQ and BSQ. We are grateful for their contributions to the community.
