Stars
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
1.5−3.0× lossless training or pre-training speedup. An off-the-shelf, easy-to-implement algorithm for the efficient training of foundation visual backbones.
[ICML 2024] PyTorch implementation for "Diversified Batch Selection for Training Acceleration"
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
A PyTorch implementation of Fourier Analysis Networks (FAN)
[ICCV 2023] Code base for Revisiting Scene Text Recognition: A Data Perspective
Official repository of InLine attention (NeurIPS 2024)
[AAAI 2021] PyTorch implementation for "Tied Block Convolution: Leaner and Better CNNs with Shared Thinner Filters."
Locality-Aware Non-Maximum Suppression (C++ version)
This is an unofficial PyTorch re-implementation of the paper "Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network", published in ICCV 2019.
An unofficial PyTorch implementation of PAN (PSENet2): Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network