Starred repositories
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
VietASR - Vietnamese Automatic Speech Recognition
[NeurIPS'22] Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
FSA/FST algorithms, differentiable, with PyTorch compatibility.
A configurable template for a FastAPI application, with authentication, user integration, admin pages, and a snappy CLI to control it all!
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…
Speech-to-text server framework with next-gen Kaldi
SGLang is a fast serving framework for large language models and vision language models.
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
Port of OpenAI's Whisper model in C/C++
An MCP server that lets LLMs gain context about shadcn ui component structure, usage, and installation; compatible with React, Svelte 5, and Vue
TTS Dia finetuning for Vietnamese
ViStreamASR - Real-Time Vietnamese Speech Recognition
The training program for libfacedetection for face detection and 5-landmark detection.
An open source library for face detection in images. The face detection speed can reach 1000FPS.
Expose your FastAPI endpoints as Model Context Protocol (MCP) tools, with Auth!
FULL Augment Code, Claude Code, Cluely, CodeBuddy, Comet, Cursor, Devin AI, Junie, Kiro, Leap.new, Lovable, Manus Agent Tools, NotionAI, Orchids.app, Perplexity, Poke, Qoder, Replit, Same.dev, Trae…
MCP server that interacts with Obsidian via the Obsidian REST API community plugin
[IJCV 2025] Smaller But Better: Unifying Layout Generation with Smaller Large Language Models
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
Get started with building Fullstack Agents using Gemini 2.5 and LangGraph
TradingAgents: Multi-Agents LLM Financial Trading Framework
Declaratively deploy your Kubernetes manifests, Kustomize configs, and Charts as Helm releases. Generate all-in-one manifests for use with ArgoCD.
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.