AMD, MooreThreads
Shanghai

Stars
Efficient Triton Kernels for LLM Training
FlagGems is an operator library for large language models implemented in the Triton Language.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
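The phrase "fine-grained scaling" refers to keeping one scale factor per small block of values rather than one per tensor, so a low-precision format can track the local dynamic range. A minimal sketch of the idea, simulated here with int8 in pure Python (block size, vectors, and function names are illustrative, not DeepGEMM's actual kernels):

```python
# Illustrative sketch (not DeepGEMM's actual kernels): fine-grained scaling
# keeps one scale per small block along K instead of one per tensor.
BLOCK = 4  # hypothetical block size for the demo

def quantize_blocks(row, block=BLOCK):
    """Quantize a row to int8 with one scale per contiguous block."""
    qs, scales = [], []
    for i in range(0, len(row), block):
        chunk = row[i:i + block]
        scale = max(abs(x) for x in chunk) / 127 or 1.0
        qs.append([round(x / scale) for x in chunk])
        scales.append(scale)
    return qs, scales

def scaled_dot(qa, sa, qb, sb):
    """Dot product of two block-quantized rows, rescaled per block."""
    total = 0.0
    for blk_a, s_a, blk_b, s_b in zip(qa, sa, qb, sb):
        acc = sum(x * y for x, y in zip(blk_a, blk_b))  # integer accumulate
        total += acc * s_a * s_b                         # rescale per block
    return total

# Mixed magnitudes per block: per-block scales keep small values from
# being flushed to zero by the large ones in the other block.
a = [0.01, 0.02, -0.015, 0.03, 5.0, -4.0, 3.0, 2.0]
b = [1.0, -1.0, 0.5, 0.25, 0.1, 0.2, -0.1, 0.3]
exact = sum(x * y for x, y in zip(a, b))
approx = scaled_dot(*quantize_blocks(a), *quantize_blocks(b))
```

With a single per-tensor scale, the first block of `a` would quantize to all zeros; per-block scales preserve it.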
ChatGPT CLI is a versatile tool for interacting with LLMs through OpenAI, Azure, and other popular providers like Perplexity AI and Llama. It supports prompt files, history tracking, and live data …
A client implementation for ChatGPT and Bing AI. Available as a Node.js module, REST API server, and CLI app.
User-friendly Desktop Client App for AI Models/LLMs (GPT, Claude, Gemini, Ollama...)
A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment…
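One of the techniques that description names, speculative decoding, is easy to sketch: a cheap draft model proposes a run of tokens and the expensive target model verifies them, keeping the longest prefix it agrees with. A toy greedy version, where both "models" are stand-in functions (real implementations verify all draft positions in a single batched target forward pass):

```python
# Toy greedy speculative decoding. The "models" are hypothetical next-token
# functions over digit sequences, standing in for a small draft LLM and a
# large target LLM.
def draft_model(prefix):           # cheap draft: guesses the next token
    return (prefix[-1] + 1) % 10

def target_model(prefix):          # target: mostly agrees, diverges after 4
    return (prefix[-1] + 1) % 10 if prefix[-1] != 4 else 0

def speculative_step(prefix, k=4):
    """Propose k draft tokens, accept the verified run plus one correction."""
    proposal = list(prefix)
    for _ in range(k):
        proposal.append(draft_model(proposal))
    out = list(prefix)
    for tok in proposal[len(prefix):]:
        expected = target_model(out)  # what the target would emit here
        out.append(expected)          # target's token is always kept
        if expected != tok:           # mismatch: stop accepting drafts
            break
    return out
```

Starting from `[1]`, the draft proposes 2, 3, 4, 5; the target accepts 2, 3, 4, then corrects the last token to 0, so one step yields several tokens at the cost of one verification pass.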
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (V…
Development repository for the Triton language and compiler
Open-Sora: Democratizing Efficient Video Production for All
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
C implementation of the L-Mul f32/f16 multiplications from paper: https://arxiv.org/html/2410.00907
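The trick behind linear-complexity multiplication is replacing a floating-point multiply with integer addition on the bit patterns. A hedged sketch of the classic Mitchell-style approximation in the same family (this is not the paper's exact L-Mul algorithm, which refines the mantissa-addition offset), in Python rather than C for brevity:

```python
# Mitchell-style logarithmic multiplication for positive f32 values:
# adding the raw bit patterns approximately adds the logarithms, i.e.
# multiplies. Not the exact L-Mul algorithm from arXiv:2410.00907,
# just the classic approximation it builds on.
import struct

F32_BIAS_BITS = 0x3F800000  # bit pattern of 1.0f; cancels the doubled exponent bias

def f2i(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def i2f(n: int) -> float:
    return struct.unpack("<f", struct.pack("<I", n & 0xFFFFFFFF))[0]

def approx_mul(a: float, b: float) -> float:
    """Approximate a*b for positive finite floats with one integer add."""
    return i2f(f2i(a) + f2i(b) - F32_BIAS_BITS)
```

When both mantissas are zero the result is exact (e.g. `approx_mul(2.0, 3.0) == 6.0`); otherwise the relative error is bounded at roughly 11%, which is the gap refinements like L-Mul aim to close.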
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
An open-source computer vision framework to build and deploy apps in minutes
Data manipulation and transformation for audio signal processing, powered by PyTorch
Datasets, Transforms and Models specific to Computer Vision
PyTorch native quantization and sparsity for training and inference
MooreThreads / vllm_musa
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs.
NVIDIA curated collection of educational resources related to general purpose GPU programming.
Samples for CUDA Developers which demonstrates features in CUDA Toolkit
MooreThreads / muAlg
Forked from NVIDIA/cub. Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
Diffusion model (SD, Flux, Wan, Qwen Image, ...) inference in pure C/C++
Wan: Open and Advanced Large-Scale Video Generative Models
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations