kairos-yu


A repo for llm on ncnn

C++ 178 21 Updated Jan 2, 2026

OpenVLA: An open-source vision-language-action model for robotic manipulation.

Python 5,025 609 Updated Mar 23, 2025

Elana: A Simple Energy & Latency Analyzer for LLMs

Python 13 1 Updated Dec 12, 2025

To better understand the ggml library

C++ 26 7 Updated Jun 13, 2025

run AI models on qualcomm chips, sd, llm, vlm

C++ 8 Updated Feb 21, 2025

Low overhead tracing library and trace visualizer for pipelined CUDA kernels

C 129 5 Updated Nov 26, 2025
Python 114 8 Updated Sep 22, 2025

Static suckless single batch CUDA-only qwen3-0.6B mini inference engine

Cuda 539 48 Updated Sep 8, 2025

Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Su…

Go 7,520 940 Updated Jan 16, 2026

The Python programming language

Python 71,106 33,911 Updated Jan 17, 2026
Python 22 7 Updated Jul 20, 2025

🤗A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.

Python 900 51 Updated Jan 17, 2026

LLM inference in C/C++

C++ 93,121 14,507 Updated Jan 17, 2026

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,597 2,020 Updated Jan 13, 2026

🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.

Cuda 245 12 Updated Nov 18, 2025

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 1,221 178 Updated Jul 29, 2023

A paper list of some recent works about Token Compress for Vit and VLM

810 38 Updated Dec 24, 2025

Memray is a memory profiler for Python

Python 14,764 431 Updated Jan 6, 2026

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 635 136 Updated Jan 17, 2026

Use your Mac trackpad as a weighing scale

Swift 8,542 367 Updated Jul 27, 2025

SGLang kernel library for NPU

C++ 93 73 Updated Jan 17, 2026

Open Machine Learning Compiler Framework

Python 13,034 3,763 Updated Jan 17, 2026
Python 171 89 Updated Dec 31, 2025

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Python 3,603 215 Updated Jan 14, 2026

Technical report of Kimina-Prover Preview.

Python 349 17 Updated Jul 10, 2025

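Understands engineering deployment better than algorithm people, and understands algorithms and models better than engineering people.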

Jupyter Notebook 255 39 Updated May 26, 2025

Allow torch tensor memory to be released and resumed later

Python 202 34 Updated Jan 13, 2026