- Beijing, China
- https://zhengkuntian.github.io/
Stars
[TMLR 2025] Efficient Diffusion Models: A Survey
Awesome Large Vision-Language Model: A Curated List of Large Vision-Language Model
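Xiaohongshu notes | comment crawler, Douyin videos | comment crawler, Kuaishou videos | comment crawler, Bilibili videos | comment crawler, Weibo posts | comment crawler, Baidu Tieba posts | Baidu Tieba comment-reply crawler, Zhihu Q&A articles | comment crawler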
Multi-Character Story Generation with Dialogue Rendering
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
China Unicom's Yuanjing Wanwu Agent Platform is an enterprise-grade, multi-tenant AI agent development platform. It helps users build applications such as intelligent agents, workflows, and RAG, an…
Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini, and open-source models.
😎 Awesome lists about all kinds of interesting topics
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
Official Repository of "LLM × DATA" Survey Paper
PAFTS: A library that preprocesses audio for TTS.
A framework for building native applications using React
A latent text-to-image diffusion model
Latest Advances on System-2 Reasoning
🎥 Python and OpenCV-based scene cut/transition detection program & library.
DeepEP: an efficient expert-parallel communication library
A project for tri-modal LLM benchmarking and instruction tuning.
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
AI for Science paper walkthrough collection (continuously updated); papers, datasets, and tutorials available for download at hyper.ai
A visualization tool for deeper understanding and easier debugging of RLHF training.
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
SEED-Voken: A Series of Powerful Visual Tokenizers
Collection of AWESOME vision-language models for vision tasks
Talk to any LLM with hands-free voice interaction, voice interruption, and a Live2D talking face, running locally across platforms
A collection of benchmarks and datasets for evaluating LLMs.