- Vietnam (UTC +07:00)
- in/ntdong
Starred repositories
AI prediction API of the MusicLang package
Flexible and powerful framework for managing multiple AI agents and handling complex conversations
Official Code of Memento: Fine-tuning LLM Agents without Fine-tuning LLMs
An open-source RAG-based tool for chatting with your documents.
🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
META‑AGENTIC α‑AGI 👁️✨ — Mission 🎯 End‑to‑end: Identify 🔍 → Out‑Learn 📚 → Out‑Think 🧠 → Out‑Design 🎨 → Out‑Strategise ♟️ → Out‑Execute ⚡
A paper and project list about the cutting edge Speech Synthesis, Text-to-Speech (TTS), Singing Voice Synthesis (SVS), Voice Conversion (VC), Singing Voice Conversion (SVC), and related interesting…
Dataset and code of GTSinger (NeurIPS 2024 Spotlight): A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching
Production-ready platform for agentic workflow development.
A curated list of awesome LLM agents frameworks.
A curated compilation of AI-driven generative music resources and projects. Explore the blend of machine learning algorithms and musical creativity.
A curated list of resources dedicated to the safety of Large Vision-Language Models. This repository aligns with our survey titled A Survey of Safety on Large Vision-Language Models: Attacks, Defen…
A Next-Generation Training Engine Built for Ultra-Large MoE Models
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
This is the official implementation of the EMNLP 2024 paper: Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Two conversational AI agents switching from English to sound-level protocol after confirming they are both AI agents
✨✨ Latest Advances on Multimodal Large Language Models
Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.