kairos-yu

Kairos Yu kairos-yu

2 followers · 29 following

20:39 (UTC -12:00)

Stars

futz12 / ncnn_llm

A repo for llm on ncnn

C++ 178 21 Updated Jan 2, 2026

openvla / openvla

Forked from TRI-ML/prismatic-vlms

OpenVLA: An open-source vision-language-action model for robotic manipulation.

Python 5,025 609 Updated Mar 23, 2025

jeho-lee / Awesome-On-Device-AI-Systems

110 3 Updated Dec 31, 2025

enyac-group / Elana

Elana: A Simple Energy & Latency Analyzer for LLMs

Python 13 1 Updated Dec 12, 2025

Yangxiaoz / GGML-Tutorial

To better understand the ggml library

C++ 26 7 Updated Jun 13, 2025

chenjun2hao / qualcomm.ai

run AI models on qualcomm chips, sd, llm, vlm

C++ 8 Updated Feb 21, 2025

aikitoria / nanotrace

Low overhead tracing library and trace visualizer for pipelined CUDA kernels

C 129 5 Updated Nov 26, 2025

cfregly / ai-performance-engineering

Python 934 126 Updated Jan 16, 2026

Starmys / TritonStudyGroup

Python 114 8 Updated Sep 22, 2025

yassa9 / qwen600

Static suckless single batch CUDA-only qwen3-0.6B mini inference engine

Cuda 539 48 Updated Sep 8, 2025

NexaAI / nexa-sdk

Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Su…

Go 7,520 940 Updated Jan 16, 2026

python / cpython

The Python programming language

Python 71,106 33,911 Updated Jan 17, 2026

JT-Ushio / AI-Infra-Seminar

Python 22 7 Updated Jul 20, 2025

vipshop / cache-dit

🤗A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.

Python 900 51 Updated Jan 17, 2026

ggml-org / llama.cpp

LLM inference in C/C++

C++ 93,121 14,507 Updated Jan 17, 2026

openai / gpt-oss

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,597 2,020 Updated Jan 13, 2026

xlite-dev / ffpa-attn

🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.

Cuda 245 12 Updated Nov 18, 2025

Liu-xiandong / How_to_optimize_in_GPU

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 1,221 178 Updated Jul 29, 2023

daixiangzi / Awesome-Token-Compress

A paper list of some recent works about Token Compress for Vit and VLM

810 38 Updated Dec 24, 2025

bloomberg / memray

Memray is a memory profiler for Python

Python 14,764 431 Updated Jan 6, 2026

sgl-project / SpecForge

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 635 136 Updated Jan 17, 2026

KrishKrosh / TrackWeight

Use your Mac trackpad as a weighing scale

Swift 8,542 367 Updated Jul 27, 2025

sgl-project / sgl-kernel-npu

SGLang kernel library for NPU

C++ 93 73 Updated Jan 17, 2026

apache / tvm

Open Machine Learning Compiler Framework

Python 13,034 3,763 Updated Jan 17, 2026

mlc-ai / relax

Python 171 89 Updated Dec 31, 2025

nunchaku-ai / nunchaku

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Python 3,603 215 Updated Jan 14, 2026

MoonshotAI / Kimina-Prover-Preview

Technical report of Kimina-Prover Preview.

Python 349 17 Updated Jul 10, 2025

JackonYang / paper-reading

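Understand engineering deployment better than algorithm researchers do, and understand algorithms and models better than engineers do.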

Jupyter Notebook 255 39 Updated May 26, 2025

knemik97 / Manifesto-against-the-Plagiarist-Yunhe-Wang

A manifesto denouncing Yunhe Wang

1,102 113 Updated Jul 8, 2025

fzyzcjy / torch_memory_saver

Allow torch tensor memory to be released and resumed later

Python 202 34 Updated Jan 13, 2026