Starred repositories
A programmer's guide to cooking at home (Simplified Chinese only).
Perplexity's open-source garden for inference technology
pizlonator / fil-c
Forked from llvm/llvm-project. Fil-C: completely compatible memory safety for C and C++
⚡ Clash for Lab is a proxy tool designed for lab environments: it requires no sudo privileges and installs via an elegant one-click script.
A programmer's guide to living longer
A Throughput-Optimized Pipeline Parallel Inference System for Large Language Models
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
Easy data preparation with the latest LLM-based operators and pipelines.
DeepEP: an efficient expert-parallel communication library
Practical GPU Sharing Without Memory Size Constraints
Optimized primitives for collective multi-GPU communication
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
ArcticInference: vLLM plugin for high-throughput, low-latency inference
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
A lightweight design for computation-communication overlap.
Tile primitives for speedy kernels
Unified Communication X (mailing list: https://elist.ornl.gov/mailman/listinfo/ucx-group)
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
A Datacenter Scale Distributed Inference Serving Framework
Development repository for the Triton language and compiler