Stars
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
A Pioneering Vietnamese Multimodal Large Language Model
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
A Framework for Speech, Language, Audio, and Music Processing with Large Language Models
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
OpenChat: Advancing Open-source Language Models with Imperfect Data
Easily use and train state-of-the-art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease of use, backed by research.
PyTorch implementation of VALL-E (Zero-Shot Text-to-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
Crawl a site to generate knowledge files to create your own custom GPT from a URL
PhoGPT: Generative Pre-training for Vietnamese (2023)
State-of-the-Art Text Embeddings
Anti-DreamBooth: Protecting users from personalized text-to-image synthesis (ICCV 2023)
Official PyTorch implementation of the paper: Wavelet Diffusion Models are Fast and Scalable Image Generators (CVPR'23)
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)
PhoNLP: A BERT-based multi-task learning model for part-of-speech tagging, named entity recognition and dependency parsing (NAACL 2021)
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
An unofficial PyTorch implementation of the audio LM VALL-E
Code and documentation to train Stanford's Alpaca models, and generate the data.
A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation (INTERSPEECH 2022)
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Source Code for IJCAI 2020 paper "On the Importance of Word and Sentence Representation Learning in Implicit Discourse Relation Classification"
PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation (EMNLP 2021)