ydtydr

Autocomp: AI Code Optimizer for Tensor Accelerators

Python 56 Updated Dec 21, 2025

The best ChatGPT that $100 can buy.

Python 38,986 4,935 Updated Dec 9, 2025

GPT-Prompt-Hub is an open-source community-driven repository dedicated to the collection, sharing, and refinement of custom GPT prompts

2,255 391 Updated Aug 11, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 17,661 2,861 Updated Dec 21, 2025

Ongoing research training transformer models at scale

Python 34 32 Updated Dec 12, 2025

Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.

Python 140 29 Updated Dec 19, 2025
Python 58 32 Updated Dec 18, 2025

Unison file synchronizer

OCaml 5,003 259 Updated Dec 1, 2025

The tool facilitates debugging convergence issues and testing new algorithms and recipes for training LLMs using Nvidia libraries such as Transformer Engine, Megatron-LM, and NeMo.

Python 16 6 Updated Sep 17, 2025

Kimi K2 is the large language model series developed by Moonshot AI team

9,742 705 Updated Nov 7, 2025

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python 961 82 Updated Sep 4, 2024

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,132 105 Updated Dec 21, 2025

Official implementation for Training LLMs with MXFP4

Python 115 16 Updated Apr 25, 2025
Python 15 6 Updated Dec 15, 2025

Muon is Scalable for LLM Training

1,386 78 Updated Aug 3, 2025

KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)

Jupyter Notebook 718 102 Updated Dec 19, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,210 85 Updated Aug 28, 2025

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.

Python 1,482 216 Updated Dec 15, 2025

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

Python 2,208 402 Updated Dec 15, 2025

Minimalistic large language model 3D-parallelism training

Python 2,374 260 Updated Dec 11, 2025

Video+code lecture on building nanoGPT from scratch

Python 4,620 725 Updated Aug 13, 2024

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,927 918 Updated Dec 15, 2025

Official inference framework for 1-bit LLMs

Python 24,461 1,914 Updated Jun 3, 2025

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Python 4,747 271 Updated Jul 18, 2025

The best OSS video generation models, created by Genmo

Python 3,538 468 Updated Nov 14, 2025

Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 20+ clouds, or on-prem).

Python 9,126 878 Updated Dec 21, 2025

Real-time face swap and one-click video deepfake with only a single image

Python 76,360 11,137 Updated Dec 15, 2025