ccint - a C/C++ interpreter, built on top of Clang and LLVM compiler infrastructure

C++ 67 14 Updated Jul 10, 2023

A novel, highly optimized CUDA implementation of the k-means algorithm.

Cuda 36 9 Updated Mar 3, 2022

Cache library and distributed caching server. Memcached compatible.

C++ 369 35 Updated Aug 7, 2024

Puzzles for learning Triton

Jupyter Notebook 2,089 170 Updated Nov 18, 2024

This project aims to collect the latest "call for reviewers" links from various top CS/ML/AI conferences/journals

1,015 42 Updated Oct 15, 2025

The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.

Python 10,485 786 Updated Nov 1, 2025

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.

Python 47,707 3,899 Updated Oct 31, 2025

Train transformer language models with reinforcement learning.

Python 16,099 2,261 Updated Nov 1, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 40,573 4,606 Updated Oct 31, 2025

Fully open data curation for reasoning models

Python 2,130 176 Updated Sep 3, 2025

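Provides a practical interactive interface for GPT/GLM and other large language models (LLMs), specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons & function plugins, project analysis & self-explanation for Python, C++, and other codebases, PDF/LaTeX paper translation & summarization, parallel querying of multiple LLMs, and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…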

Python 69,467 8,384 Updated Sep 20, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,929 285 Updated May 15, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,431 957 Updated Oct 24, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,671 971 Updated Oct 30, 2025

The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.

C++ 1,663 608 Updated Oct 31, 2025

Inference Llama 2 in one file of pure C

C 18,899 2,397 Updated Aug 6, 2024

LLM inference in C/C++

C++ 88,554 13,473 Updated Nov 1, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,841 896 Updated Sep 30, 2025

Efficient Deep Learning Systems course materials (HSE, YSDA)

Jupyter Notebook 916 139 Updated Apr 23, 2025

This is a list of useful libraries and resources for CUDA development.

588 46 Updated Oct 8, 2017

A self-learning tutorial for CUDA High Performance Programming.

JavaScript 760 75 Updated Jun 30, 2025

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

C++ 4,983 764 Updated Feb 8, 2024

Building blocks for foundation models.

567 28 Updated Jan 3, 2024

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,705 1,506 Updated Oct 29, 2025

Solve puzzles. Learn CUDA.

Jupyter Notebook 11,597 888 Updated Sep 1, 2024

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 4,652 317 Updated Aug 19, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,258 820 Updated Oct 17, 2025

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Python 9,962 1,659 Updated Oct 31, 2025

LLVM (Low Level Virtual Machine) Guide. Learn all about the compiler infrastructure, which is designed for compile-time, link-time, run-time, and "idle-time" optimization of programs. Originally im…

C++ 186 10 Updated Jan 4, 2024