- Hangzhou
- https://kd-tao.github.io/
Stars
Elevate your AI research writing, no more tedious polishing ✨
Awesome streaming video understanding
[EMNLP 2025 Main] Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models
slime is an LLM post-training framework for RL Scaling.
Autonomous Agents (LLMs) research papers. Updated Daily.
OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding
[NeurIPS'25] FreqExit: Enabling Early-Exit Inference for Visual Autoregressive Models via Frequency-Aware Guidance
KD-TAO / KD-TAO.github.io
Forked from tangjyan/tangjyan.github.io
Academic Personal Homepage of Keda Tao
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
This repo integrates DyCoke's token compression method with VLMs such as Gemma3 and InternVL3
[TMLR 2026] Survey: https://arxiv.org/pdf/2507.20198
All-in-one training for vision models (YOLO, ViTs, RT-DETR, DINOv3): pretraining, fine-tuning, distillation.
[ICLR'25] Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
VidKV: Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
[CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
[ICLR 2025 Spotlight] Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model
Face attribute classification based on PyTorch
This is the source code for arcface-pytorch, which can be used to train your own model.