😈
Focus; diligence is rewarded

Organizations

@OpenMOSS


Post-training with Tinker

Python 1,275 86 Updated Oct 28, 2025

FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.

Python 183 17 Updated Aug 14, 2025

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 315 26 Updated Oct 28, 2025

A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.

Python 410 11 Updated Oct 29, 2025

Tarsier: a family of large-scale video-language models designed to generate high-quality video descriptions, with good general video understanding capability.

Python 497 28 Updated Aug 14, 2025

Sparser Block-Sparse Attention via Token Permutation

Python 22 Updated Oct 27, 2025

Official code for "DiaMoE-TTS: A Unified IPA-based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation"

Python 157 12 Updated Oct 20, 2025

Code for the blog "Neural audio codecs: how to get audio into LLMs"

Python 106 2 Updated Oct 20, 2025

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

Python 342 35 Updated Oct 27, 2025

Finetune Sesame AI's conversational speech model on new languages and voices. Blog post: https://blog.speechmatics.com/sesame-finetune

Python 89 9 Updated Sep 27, 2025

Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping" by Zhiheng Xi et al.

Python 71 2 Updated Oct 25, 2025

Training, inference, and testing of the SAC speech codec model.

Python 76 4 Updated Oct 24, 2025

Music Benchmark

Python 2 Updated Oct 23, 2025

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 2,061 117 Updated Jul 29, 2024
Python 136 7 Updated Jul 31, 2025
Python 19 3 Updated Sep 26, 2025

The TTSDS benchmark evaluates synthetic speech quality by considering prosody, speaker identity, and intelligibility, comparing these factors with real speech and noise datasets.

Python 66 5 Updated Sep 29, 2025

[TPAMI2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

Python 304 18 Updated Dec 25, 2024

NEO Series: Native Vision-Language Models from First Principles

Python 205 11 Updated Oct 21, 2025

PyTorch-native post-training at scale

Python 451 45 Updated Oct 29, 2025
Python 162 15 Updated Oct 28, 2025

PyTorch tutorials.

Python 8,860 4,284 Updated Oct 26, 2025

Public repository for Skills

Python 14,324 1,130 Updated Oct 18, 2025

Declarative Intent Driven Platform Orchestrator for Internal Developer Platform (IDP).

Go 1,181 94 Updated Aug 28, 2025

This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Lan…

Python 49 1 Updated Sep 21, 2025

Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.

Python 165 3 Updated Oct 12, 2025

LongCat Audio Tokenizer and Detokenizer

Python 184 11 Updated Oct 20, 2025

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)

Python 295 37 Updated Oct 14, 2025