Qianxin
Starred repositories
Multimodal-Composite-Editing-and-Retrieval-update
Accelerating AI Training and Inference from Storage Perspective (Must-read Papers on Storage for AI)
System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…
On the Theoretical Limitations of Embedding-Based Retrieval
This repository contains implementations and illustrative code to accompany DeepMind publications
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
A modern GUI client based on Tauri, designed to run on Windows, macOS and Linux for a tailored proxy experience
An open-source AI agent that brings the power of Gemini directly into your terminal.
shuoranliu / curvine
Forked from CurvineIO/curvine. High-performance distributed cache system, built in Rust.
magic-trace collects and displays high-resolution traces of what a process is doing
cxl-micron-reskit / famfs
Forked from jagalactic/famfs. This is the user space repo for famfs, the fabric-attached memory file system.
Rule Snippet & Rule Set for Surge / Mihomo (Clash.Meta) / Clash Premium (Dreamacro) / sing-box / Surfboard for Android / Stash
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Ongoing research training transformer models at scale
A high-throughput and memory-efficient inference and serving engine for LLMs
[EMNLP 2025] Circuit-Aware Editing Enables Generalizable Knowledge Learners
FireFlyer Record file format, writer and reader for DL training samples.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient Multi-head Latent Attention Kernels
From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓