Stars
Simple, scalable AI model deployment on GPU clusters
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…
A curated list of awesome Recommender System resources (Books, Conferences, Researchers, Papers, GitHub Repositories, Useful Sites, YouTube Videos)
Sources for the book "Machine Learning in Production"
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
⚡ Serverless Framework – Effortlessly build apps that auto-scale, incur zero costs when idle, and require minimal maintenance using AWS Lambda and other managed cloud services.
TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight)
The simplest, fastest repository for training/finetuning medium-sized GPTs.
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
Machine Learning Engineering Open Book
Achieve state-of-the-art inference performance with modern accelerators on Kubernetes
An AI-powered task-management system you can drop into Cursor, Lovable, Windsurf, Roo, and others.
📄 Configuration files that enhance Cursor AI editor experience with custom rules and behaviors
Postman & Chatbot Arena for inference benchmarking.
Analyze computation-communication overlap in V3/R1.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Run the latest LLMs and VLMs across GPU, NPU, and CPU, with PC (Python/C++) and mobile (Android & iOS) support; runs OpenAI gpt-oss, Granite4, Qwen3VL, Gemma 3n, and more.
Dria SDK is for building and executing synthetic data generation pipelines on the Dria Knowledge Network.
Maid is a cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama and OpenAI models remotely.
Jan is an open source alternative to ChatGPT that runs 100% offline on your computer.
[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
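The AWQ entry above names a concrete technique: activation-aware weight quantization, which protects the weight channels that see the largest activations by scaling them up before quantization and folding the scale back afterward. A minimal NumPy sketch of that core idea (not AWQ's actual implementation — the `alpha` exponent and the per-channel scaling rule here are illustrative assumptions; real AWQ searches for the scales and uses grouped low-bit quantization):

```python
import numpy as np

def quantize_int4(w):
    # Symmetric per-tensor 4-bit quantization: round to integers in [-8, 7],
    # then dequantize so the result is directly comparable to the original.
    scale = np.abs(w).max() / 7.0
    return np.clip(np.round(w / scale), -8, 7) * scale

def awq_style_quantize(W, act_mag, alpha=0.5):
    # W: weight matrix of shape (d_in, d_out), used as y = x @ W.
    # act_mag: per-input-channel activation magnitudes, shape (d_in,).
    # Scale salient input channels up before quantizing so they get finer
    # resolution, then fold the inverse scale back into the weights,
    # leaving the dequantized layer mathematically equivalent up to
    # quantization error.
    s = act_mag ** alpha                  # per-channel scale (illustrative rule)
    Wq = quantize_int4(W * s[:, None])    # quantize the scaled weights
    return Wq / s[:, None]                # fold the scale back
```

With skewed activation statistics (a few channels dominating), quantizing the scaled weights typically preserves `x @ W` better than quantizing `W` directly, which is the intuition behind the "activation-aware" part of the name.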
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…