Stars
MiroThinker is an open-source search agent model, built for tool-augmented reasoning and real-world information seeking, aiming to match the deep research experience of OpenAI Deep Research and Gem…
buyukakyuz / rustmm
Forked from rust-lang/rust
Rust without the borrow checker
Z80-μLM is a 2-bit quantized language model small enough to run on an 8-bit Z80 processor. Train conversational models in Python, export them as CP/M .COM binaries, and chat with your vintage compu…
Fast integer compression in C using the StreamVByte codec
GIT utilities -- repo summary, repl, changelog population, author commit percentages and more
TouchGAL is a one-stop Galgame culture community built around sharing joy, offering a haven for Gal enthusiasts!
This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai
A stand-alone implementation of several NumPy dtype extensions used in machine learning.
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
A fast JSON serializing & deserializing library, accelerated by SIMD.
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Exploring the scalable matrix extension of the Apple M4 processor
Free and Open Source, Distributed, RESTful Search Engine
Cross-platform, customizable multimedia/video processing framework with strong GPU acceleration, heterogeneous design, multi-language support, ease of use, multi-framework compatibility, and high perf…
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
SGLang is a high-performance serving framework for large language models and multimodal models.