yynj26
  • Bay area, CA

41 results for source starred repositories

Algorithm Engineer: Summary of Machine Learning Interview Questions

1,613 216 Updated Sep 26, 2019

Simple, scalable AI model deployment on GPU clusters

Python 3,945 397 Updated Nov 4, 2025

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…

3,370 344 Updated Jul 25, 2025

A curated list of awesome Recommender System (Books, Conferences, Researchers, Papers, Github Repositories, Useful Sites, Youtube Videos)

1,402 208 Updated Feb 13, 2022

Sources for the book "Machine Learning in Production"

CSS 136 37 Updated Jul 19, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 1,933 148 Updated Nov 4, 2025

⚡ Serverless Framework – Effortlessly build apps that auto-scale, incur zero costs when idle, and require minimal maintenance using AWS Lambda and other managed cloud services.

JavaScript 46,890 5,744 Updated Oct 22, 2025

TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight)

Python 396 22 Updated Sep 23, 2025

LLM training in simple, raw C/CUDA

Cuda 28,043 3,259 Updated Jun 26, 2025

NanoGPT (124M) in 3 minutes

Python 3,764 488 Updated Oct 29, 2025

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 48,902 8,185 Updated Dec 9, 2024

Nano vLLM

Python 8,142 1,004 Updated Nov 3, 2025

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 3,383 227 Updated Nov 2, 2025

Machine Learning Engineering Open Book

Python 15,597 956 Updated Oct 27, 2025

Achieve state of the art inference performance with modern accelerators on Kubernetes

Shell 1,968 217 Updated Nov 4, 2025

An AI-powered task-management system you can drop into Cursor, Lovable, Windsurf, Roo, and others.

JavaScript 23,454 2,285 Updated Nov 4, 2025

📄 Configuration files that enhance Cursor AI editor experience with custom rules and behaviors

MDX 35,118 2,979 Updated Oct 24, 2025

Postman & Chatbot Arena for inference benchmarking.

Python 14 Updated Jun 19, 2025

Analyze computation-communication overlap in V3/R1.

1,112 143 Updated Mar 21, 2025

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 39,211 4,766 Updated Jun 2, 2025

Run the latest LLMs and VLMs across GPU, NPU, and CPU with PC (Python/C++) & mobile (Android & iOS) support, running quickly with OpenAI gpt-oss, Granite4, Qwen3VL, Gemma 3n and more.

Go 5,592 721 Updated Nov 4, 2025

Dria SDK is for building and executing synthetic data generation pipelines on Dria Knowledge Network.

Python 29 7 Updated Apr 3, 2025

Maid is a cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama and OpenAI models remotely.

Dart 2,187 224 Updated Jul 28, 2025

LLM inference in C/C++

C++ 88,723 13,513 Updated Nov 4, 2025

Jan is an open source alternative to ChatGPT that runs 100% offline on your computer.

TypeScript 39,113 2,361 Updated Nov 4, 2025

[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.

Cuda 1,409 177 Updated Feb 24, 2025

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 3,331 277 Updated Jul 17, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 12,031 1,834 Updated Nov 4, 2025

GLM Series Edge Models

Python 152 14 Updated Jun 12, 2025